I have often been asked to write about how data gets valued. The reality is that a lot depends on being in the right place at the right time and having data the markets want, so scale is a definite factor. So while this article provides a fundamental analysis of data’s intrinsic value, the categories have had to be simplified to fit into an article rather than produce ‘War and Peace’.

For instance, I place Indices into the highest category of data value. This is definitely true for the S&P 500, MSCI World Index and FTSE 100, which are wildly successful cash kings, whereas the majority of indices have nowhere near their value, though usually more than datasets in other categories. The same vertical spread can be seen across all categories of data, and indeed sources; for instance, ASX data is far more in demand than data from its peer, the Nigerian Stock Exchange. So the reader needs to take these factors into account.


Financial markets participants spend almost US$50 billion per annum on data, and the trend in spend has only ever been upward. Understanding data’s underlying worth to a business is critical in ensuring the most appropriate data is obtained to make investment decisions, as well as to value the assets to be bought, sold or held. On the other side, the data owner needs to know what their data is worth and to whom.

There are two questions the data consumer needs to address when determining the most appropriate data required to make the best investment decisions. These are:


  1. What categories of data do I need for my business? The many types of available datasets have vastly different values.
  2. Where do I source that data from? What, if any, options are available to obtain the data needed? Intellectual Property Rights play a very important role here.

Naturally the two are not mutually exclusive: if a certain type of data is only available from a single source, then the only option left is to decide whether I want, or really need, to subscribe at the offered price.

The degree to which data consumers are either price takers or price makers varies significantly according to need and the perceived value of the data required. Small data owners can be forced into under-valuing their data because they lack distribution reach or their product is too generic. Interestingly, the majority of monopolistic sources are remarkably sensitive to the wider influence of their clients, and to impacts on retail consumers that might incur a political or regulatory cost.

The question that both data sources and data consumers must constantly ask is what the current and expected value of the data is. This validation is necessary because different types of data change value over time according to need and perception. The advent of Environmental, Social & Governance data is an example of existing data generating new and higher values, just in a different form, while historical time series data, something once thrown in with real time pricing, is becoming an ever more important component for use in analytical tools and investment modelling.



What categories of data do I need for my business?

All types of data have value; it is just that certain data tends to have more value than others. Even within a type of data, the sub-types can be weighted differently. For instance, tradable prices are more prized than traded prices, because they show the market level now, not after a transaction has taken place, while both are considered more valuable than Indications of Interest, despite the latter’s role as liquidity generators.

Per the diagram above, as of 2021 the data owner is able to command a higher premium at each level than at the tiers below it, with a valuation structure that can be broken down into four parts:


  1. Historical data is traditionally viewed as low premium, yet it is gaining in importance as users incorporate more time series to populate the analytics and trading models that form the basis for investment decision making, and so is slowly migrating from Level 1 to Level 3. Contributed or ‘shop window’ data consists of the prices banks and brokers put out to create liquidity. Personally I believe this to be of great value, especially for discovering market levels; however, regulators seem somewhat averse to these prices. Level 1 provides background to the real time price market.
  2. Once the premier type, Real Time Prices are used for current and executable market levels: the data upon which an investment action is taken, rather than an investment decision. This data is still a necessity, but its value is limited by market saturation. Once saturation has been achieved the dollar worth plateaus, because the only options available to the data owner are to segment the datasets and/or increase fees, neither of which is ever popular. Level 2 puts data to work tactically.
  3. Adding value to data is the big business of the 2020s, as data sources and third party providers offer their services for use in analytical tools to manufacture decisions, while the data consumers themselves derive new data for such uses as financial product creation and synthetic market valuations. Level 3 puts data to work strategically.
  4. Indices and Credit Ratings are all about benchmarking performance (and IP transfer), and have become an absolute necessity across wide swathes of the financial markets, including tradable products and asset allocation modelling. Level 4 is about data becoming a brand built around flagship benchmarks (e.g. the S&P 500) that pull the index creators’ other products along in their slipstream, enabling the charging of hefty premiums.



Where do I source that data from?

Data ownership, which ranges from publicly available data through different levels of availability depending upon how competitive the markets are, can be a major factor in establishing the value of the datasets belonging to sources and, to a lesser extent, vendors. It is surprising how many people have yet to realise that the data displayed on a Bloomberg or Refinitiv terminal/feed does not belong to either vendor. Again, from a high level perspective we discern four levels of value, with the fourth, ‘specialist data’, being partly about source but very definitely about demand. These four levels can be categorised as:


  1. Publicly available data is usually free from sources, who may retain residual IP over ownership if not usage. However, given the number of sources and the need to set up and maintain this type of data, consumers find it far easier and cheaper to obtain it from specialist aggregators and the large vendors. Level 1 is noted for being competitive and low margin in comparison to the other levels.
  2. OTC Markets are known for fragmentation, activity ranging from highly liquid to illiquid, and the desire for quality price discovery. The data is proprietary. There is a limited number of Interdealer Brokers, but they offer depth and breadth others cannot match, while the far more numerous financial institutions that now seek to commercialise their data find they have depth but lack breadth. Level 2 is about data ownership with competition limited either horizontally or vertically.
  3. Single sources (exchanges) may appear monopolistic, but the reality is that I can choose which exchange to trade on; only if I need to invest on a particular exchange do I have no choice. Exchanges do compete for business, and in most cases are sensitive to their investors, which means they can generate revenue but are capped as to how far they can go in raising prices. In comparison, the global Index Creators and Credit Ratings Agencies function to all intents and purposes as an oligopoly, where choice is constrained and the profit margins are eye-watering. Level 3 is about specific data only being available from certain sources.
  4. This level does not fit in easily with the others and is broken down into two distinct categories: firstly data identifiers, the glue that allows data to be used, and secondly value added forms of data which have a particular purpose. I have deliberately split the latter into Environment and Social Data on one side and Governance Data on the other, because Governance Data does not fit well with the other two. Level 4 is partly about knowing what individual data points are and partly about adding value to data for new purposes.



In data, as with everything else, what is valuable to one person can have little to no value to someone else, and vice versa. There is little point in selling NYSE stock prices to a Singapore bond trader. However, each dataset has significance to someone, and from a core ‘must have’ dataset more datasets radiate out, with proximity related to relevance, directly impacting the consumers’ willingness to put up money for them.

Each of the two diagrams displays data value in two dimensions, while a more accurate depiction would show a multi-layered matrix linking all the items in both, and much more. The diagrams are therefore obviously simplistic, because the data types are inextricably intertwined with data sources. They are designed to present a high level view of the myriad types of data and sources that actually exist, while flows are more often multi-lateral than linear.

For the data owners, understanding and then realising the true value of the data depends upon key factors:


  1. The data inventory and the data’s attributes
  2. Uniqueness relative to peers
  3. Availability and the ability to access the data, allowing it to be put to work
  4. As a ‘Brand’. In data brand names count, especially for indices and credit ratings, creating the opportunity to command premiums
  5. The data’s place in the decision making and execution process
  6. How the value of data by type and source can and does evolve, with winners and losers

All data has value; some simply has more value than others, and for some users but not necessarily for others. Working out the permutations is the challenge both sources and consumers must meet.

Keiren Harris 03/06/2021

Please email for a pdf or information about our consulting services.