Blog: The Next Big Data Gold Rush - Why Vertical AI Needs Specialized Data Providers Now
“When everyone is looking for gold, it's a good time to be in the pick and shovel business.” (or selling jeans, like Levi Strauss)
The data industry, once a hot topic, may seem less exciting now amid the buzz around large language models (LLMs) and NVIDIA chips. However, AI companies face a significant challenge: they’re running out of public data to train their models. The industry’s focus will soon shift away from broad, horizontal AI applications toward more specialized, vertical use cases tailored to specific industries or business needs. As AI continues to narrow its focus, the demand for high-quality datasets to support these targeted use cases will only increase. This shift presents a considerable opportunity for both investors and innovators.
The Rise of Specialized Data Providers
While Silicon Valley chases after the next shiny AI breakthrough, the unsung heroes in data are quietly building the empires of tomorrow. This trend is particularly evident in financial markets. Financial market intelligence represents a $40 billion market, with Bloomberg leading primarily in public company data. However, private market data is gaining importance as private equity and venture capital occupy a larger share of asset allocation. These markets have traditionally been opaque, with trading driven by personal networks and unstructured data. AI is set to change this, enhancing precision in decision-making for sourcing, evaluating, and tracking investment opportunities. A new wave of specialized alternative data providers is emerging to serve these and other segments, for instance:
Venture Capital: Dealroom.co and CB Insights provide insights into startup ecosystems.
Private Equity: Gain.pro offers specialized data and analysis for private equity markets.
Debt Markets: 9fin focuses on credit and debt markets.
Commodities: Kpler, Vesper, Sparta, and Vortexa offer insights into commodity flows and pricing.
Energy: Yes Energy and Dexter Energy deliver specialized data solutions for power markets.
Hedge Funds: YipitData scrapes platforms and uses receipt data to estimate demand for online platforms (such as Booking and Airbnb), then sells those estimates to hedge funds so they can anticipate quarterly earnings movements.
Below is a more comprehensive overview of the growing number of alternative data providers across different verticals:
Source: Dealroom.co. This landscape can be accessed here. Feel free to suggest edits if you think we missed or misplaced a company.
Identifying the Winners in This Space
With new players entering the market, how can we identify the companies poised to become leaders in this rapidly changing landscape? The answer lies in several key differentiators:
Serving High-Stakes Use Cases: The most valuable data providers are those that support critical business decisions. Data that can influence the outcome of a deal commands premium pricing and ensures customer loyalty.
Comprehensive Coverage: Real-time access to deeper data, including niche or less common data points (the "long tail" of leading and lagging indicators), adds significant value. The depth of curation, diligence-grade quality, and comprehensive coverage can set a provider apart from competitors.
AI-Enablement: Providers that use AI to speed up decision-making in sourcing, evaluating, and tracking top investment opportunities give their customers a clear edge.
Workflow Integration: Data that integrates smoothly into customer workflows—through robust APIs, advanced analytics, user-friendly interfaces, and transparent pricing—boosts usability and stickiness.
Proprietary Data: Control over unique datasets creates a strong competitive advantage. More on this below.
The Power of Proprietary Data
While it’s easy and affordable to build large datasets from public sources these days, such data offers no competitive edge. Proprietary data is the cornerstone of the next generation of market intelligence companies. Here’s how providers are building and safeguarding these valuable assets:
Exclusive Agreements: Forming exclusive partnerships with data sources ensures a steady stream of unique insights that competitors cannot replicate.
Costly Data Collection: Collecting comprehensive data, particularly in niche areas or from the "long tail", can be prohibitively expensive, creating a barrier to entry. Companies that excel at this can become the sole providers of certain types of data.
Give-to-Get Data Models: Platforms like Glassdoor use this model, where users exchange their data (e.g., salaries) for access to benchmarks and analytics, creating a cycle of data acquisition and user engagement.
User-Generated Data: Encouraging users to volunteer data, such as reviews or feedback, can build a unique and highly valuable dataset over time.
Proprietary Exhaust Data: Leveraging data generated as a by-product of other business functions—like Google’s search trends or Amazon’s pricing insights—provides unique, high-value datasets.
In Closing: Opportunities for Venture Capital
Looking ahead to the next decade of AI, we anticipate the rise of numerous billion-dollar market intelligence companies. Not all startups will achieve this level of success, but those that do will distinguish themselves through strong network effects, deep integration, and unique datasets. Based on our proprietary ranking, we identified some of the best-performing data companies:
Source: Dealroom.co; Coalition Capital ranking. Please follow this link for a full list of the best-performing data companies.
Venture capital investors are just starting to recognize the immense potential in this space. As the demand for high-quality, clean, and actionable data grows, there’s a substantial opportunity to build multiple $100M ARR businesses.
The recent acquisitions of Preqin by BlackRock and Tegus by AlphaSense are examples of the big outcomes that are possible in this space. They also signal the potential for industry consolidation. While the number of players in this space is growing, we believe the end state will likely be oligopolistic rather than ‘winner-takes-all’ (with Bloomberg, CapIQ, FactSet, and Refinitiv serving as a case in point). Client demands are heterogeneous and entry barriers are high: it takes a long time to build a scaled data platform.
As AI's thirst for data turns into a drought, the savviest data miners might be today's most undervalued asset in tech.
Feel free to contact us if you’re building the next big market intelligence platform or looking to invest in this space. We’d love to connect and explore how we can shape the future of this exciting industry together.
Note: A big thank you to Yoram Wijngaarde (founder and CEO of Dealroom.co), Frister Haveman and Nicola Ebmeyer (co-founders and co-CEOs of Gain.pro), Felipe Elink Schuurman (co-founder and CEO of Sparta) and Stefan Tan (co-founder of Dashmote) for reviewing early drafts and for their expert input.
The original of this article can be found here.