The Data Wars

Building monopolies off public goods.

Jul 18, 2023

Hello there!

This is issue three (3/8) of the beta version of our paid newsletter. Do send in feedback if you think there is something we can improve on.

Last weekend, I grimly realised that building monopolies off goods that should be public is no longer possible. For instance, the monopolies on packaged water have already reached scale, and you can't compete with them anymore. I'm not sure I want to cross the ethical lines of selling packaged air. But one asset that is accessible to everyone and generates wealth is blockchain data. You can query it off Etherscan, run your own node, or print copies of every transaction on the blockchain ledger and keep it at home.

I recommend you not try the last one for environmental reasons, but here's the point. Blockchain data is free, publicly accessible, and constantly being produced, yet there's a billion-dollar ecosystem built around it – $10 billion if you include publicly traded tokens. Nansen alone was valued at $750 million in their latest round. So, how can firms take a public asset and profit from it?

Data products in Web3 have three levers they can tinker with. It is rare to see a product that combines all three extensively as each requires a different specialisation.

Data vendors like Nansen, Arkham and Santiment are visual layers atop blockchain data. They index (store), query and display public data. When you subscribe to these products, you effectively pay for their ability to visualise on-chain situations. Nothing's stopping people from visualising the same things as their competitors, so consumer prices for data products in Web3 are usually a race to the bottom. In this regard, blockchain data differs from privately held data at monopolies like Alphabet or Meta.

Only the guys at Facebook can peer into user data and uncover insights like 'you did this cringe thing 12 years back' in your notifications. With blockchain data, everyone can do that. (Imagine notifications for all the regrettable purchases you have made through Uniswap or OpenSea.) Jokes aside, such data is weaponised when it feeds product insights that retain users for longer, often leaving them more miserable. The algorithmic feeds we see on YouTube or Twitter come from such data.

During bull markets, the case for spending $1,000 on a data subscription is easy to make. If a person has a portfolio of $100K, a slight edge that generates a 1% outperformance (worth $1,000) pays for the subscription. In a bear market, when these traders are (possibly) liquidated, that subscription is the first to go out the window. Therefore, consumer-facing data products are challenged by a lack of moats and by revenue that sways with market volatility.

This is part of why we see many VCs rush to back API vendors focused on blockchain data. The revenue these firms have is more predictable and sticky when they focus on selling to other businesses. Often, those contracts can be yearly, so the effort and time taken to make a sale is justified for the period during which it brings in revenue. What, then, is the plight of a data product being sold to consumers?

We have been thinking of this internally. The following is a brief mental model on how data products have been differentiating, the opportunities we are seeing and the challenges that lie ahead.

Labels as a moat

How do you differentiate when everyone is building visualisation layers atop the same data? You bring your own elements to interpreting the data. Nansen pioneered the model in 2020 when they launched labels atop their product. Instead of noticing that somebody moved one million of your favourite DeFi tokens to an exchange, you could now see that the person moving them is a seed-stage investor in the token. And since Nansen was the only player with that data, users flocked to Nansen at scale.

This was a strong enough moat for a good two years. Nobody had a similar offering until Arkham released its product in the middle of last year. With its token due for listing on Binance in the coming weeks, Arkham has solved token distribution. But the company has also changed the model quite a bit. Instead of using ML/AI (or whatever new voodoo magic the data scientists are into) to label the wallets, they simply crowd-source it and incentivise users with a token. The stranger part is that Arkham has no expenses here for rewarding users. The token is a made-up asset that finds value from speculators on exchanges.

Arkham’s genius was in clubbing together multiple wallets held by the same fund and allowing users to see aggregate data on what a fund may be doing.

Since Arkham does not have a paid subscription product (like Nansen), the bulk of the value generated (for investors and founders) will come from the token. Presuming Arkham sells the token at a meaningful valuation, the dollar figure of those sales may far exceed the cash flows a subscription product sees.

So what's next? Since labels may no longer be the strong moat they once were, firms may expand to offering context for a given transaction. Let me explain.

Context as a differentiator

The primary consumers of blockchain data products in subscription models are traders or funds investing in digital assets. Soon enough, that subset may change into gamers, musicians and, well, real estate developers. Last year, during the metaverse boom, I came across data platforms using AI to predict possible land price surges in Decentraland using user activity metrics. I would presume a large chunk of those businesses have now had to wrap up, given the lack of activity. But the tech was real — and well ahead of its time.

Wemeta.world was one of the startups offering metaverse analytics during the boom phase of the last few years.

As sectors like gaming and music take off on-chain, we may see a new class of analytical products that surface insights that are highly contextual and relevant for a new audience subset. One place we have seen this (in our deal flow) is with products focused on giving retention and consumer data about dApps. Say you have a dApp that is a competitor. You can realistically map out every wallet address that uses it, filter it to show the top 1,000 users and see how engaged they are in a 30-day period. All that data is there on-chain.

But as a founder, you likely don't want to spend 10–15 hours cleaning up all that information. A new class of analytic tools may soon allow teams to map out wallet addresses to Twitter handles, making it easier for marketing departments at B2C dApps to target customers considerably better. In each of these instances, the moat comes from a firm's ability to interpret on-chain data and provide it in a context-relevant way for the teams consuming it.
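To make the "top 1,000 users over 30 days" idea concrete, here is a minimal Python sketch. It assumes you have already pulled a dApp's transactions into a list of (wallet, timestamp) pairs; the `top_users` function, the sample wallets and the use of transaction count as a stand-in for engagement are all my own illustrative assumptions, not how any particular vendor does it.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def top_users(transactions, as_of, window_days=30, limit=1000):
    """Return (wallet, tx_count) pairs for the most active wallets in the window."""
    window_start = as_of - timedelta(days=window_days)
    counts = Counter(
        wallet
        for wallet, timestamp in transactions
        if window_start <= timestamp <= as_of
    )
    # most_common sorts by count, highest first
    return counts.most_common(limit)


# Hypothetical sample data: (wallet address, time of interaction with the dApp)
as_of = datetime(2023, 7, 18, tzinfo=timezone.utc)
sample = [
    ("0xaaa", as_of - timedelta(days=1)),
    ("0xaaa", as_of - timedelta(days=5)),
    ("0xbbb", as_of - timedelta(days=10)),
    ("0xccc", as_of - timedelta(days=45)),  # falls outside the 30-day window
]

print(top_users(sample, as_of, limit=2))
```

The hard part in practice is not this filter but everything before it: indexing the contract's transactions, removing bots and deciding what "engaged" should mean for your product.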

The asset is not the data but the IP that enables interpreting these on-chain events. This makes me believe the next unicorn in on-chain data won't be focused on trading but will be an enabler of B2C use cases.

On-chain feeds and perspectives

We have spoken to at least three teams building algorithms-as-a-service for Web3 content. The premise is simple. As on-chain content explodes, users will need services that curate and surface what is trendy and interesting, the way we see on Lens or Mirror today. There are multiple ways to do it. You can check whose content is being collected on Lens the most, create a social graph that verifies a wallet's social signal based on the other wallets interacting with it or check the smart contract interactions it has had.
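One way to think about a wallet's social signal is to treat interactions as links and let reputation flow along them. The sketch below uses a simplified PageRank for that; this is a technique I am swapping in for illustration, not something the teams we spoke to have confirmed they use. The `social_signal` function, the damping value and the sample wallets are all assumptions.

```python
from collections import defaultdict


def social_signal(interactions, iterations=20, damping=0.85):
    """Score wallets with a simplified PageRank over (source, target) interactions."""
    wallets = sorted({w for pair in interactions for w in pair})
    outgoing = defaultdict(set)
    for source, target in interactions:
        if source != target:  # ignore self-interactions
            outgoing[source].add(target)

    n = len(wallets)
    scores = {w: 1.0 / n for w in wallets}
    for _ in range(iterations):
        new_scores = {w: (1.0 - damping) / n for w in wallets}
        for source in wallets:
            # A wallet with no outgoing interactions spreads its score evenly
            targets = outgoing[source] or wallets
            share = damping * scores[source] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical sample: two fans collect from an artist, who interacts with a fund
sample_interactions = [
    ("0xfan1", "0xartist"),
    ("0xfan2", "0xartist"),
    ("0xartist", "0xfund"),
]
scores = social_signal(sample_interactions)
for wallet, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(wallet, round(score, 3))
```

A wallet that many others interact with ends up with a higher score than one nobody touches, and being touched by a high-scoring wallet counts for more. A real product would also need to weigh the type of interaction and defend against wallets created only to inflate scores.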

Would you rather follow Sequoia's wallet or be the first person to have interacted with YFI's wallet? (I don't quite know the answer to that. But here's a bit of random trivia: as I write this, there's a $750 bounty on Arkham for sharing all of Sequoia's wallets publicly.)

There are multiple ways to go about it, but in my view, services that help users tweak, iterate or develop their own algorithms to consume on-chain data would be key contenders in the next market cycle. These products would enable users to see when their favourite artists or writers have issued content they can collect or use for access to events.

The number of writing-related NFTs on Mirror has crossed a million in the past few months.

There are two ways this can evolve: users may log into their content clients and pick and choose algorithms best suited for them, or users may want to tweak the algorithms to their own preference from multiple platforms. Think of getting the best content from Twitter, Instagram, YouTube and so on in a custom client that curates content based on your preference.

It is a far-fetched, utopian vision for now, and we are still working out how the firms in our deal flow will solve distribution. However, there's a different way to look at data products, and that is through perspective.

Let me explain what that means with an example from Saurabh. A user trading at a decentralised exchange needs exact data on the price and time at which they sold (or acquired) an asset. All such data is needed for the taxman; however, none of it is available today with precision on subscription products like Santiment. The reason is quite simple. All such products are built from the perspective of a trader looking to gather data on the usage of a decentralised exchange. They are not built from the perspective of a consumer looking to export their own data, as you can from an exchange like Binance. (Saurabh’s note: Some of these products exist today but are restricted to enterprise clients.)

It is not only what is already on-chain that matters. Data that is waiting (and competing) in memory pools of different node networks is also critical from a sophisticated trader’s perspective. Data providers that run nodes across various providers and help aggregate their mempools are of immense value to those trading in size. We have not yet seen firms specialising in that kind of data scale to size.
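As a rough illustration of what aggregating mempools involves, the Python sketch below merges pending transactions seen by several nodes into one view. It assumes each node's pending transactions have already been fetched into plain dictionaries; the `aggregate_mempools` function, the node names and the `hash` and `seen_at` fields are hypothetical and do not come from any specific provider's API.

```python
def aggregate_mempools(snapshots):
    """Merge per-node pending transactions into one view keyed by transaction hash."""
    merged = {}
    for node, pending in snapshots.items():
        for tx in pending:
            entry = merged.setdefault(
                tx["hash"],
                {"first_seen": tx["seen_at"], "seen_by": set()},
            )
            # Keep the earliest sighting and record every node that saw the tx
            entry["first_seen"] = min(entry["first_seen"], tx["seen_at"])
            entry["seen_by"].add(node)
    return merged


# Hypothetical snapshots: seen_at is seconds since some shared reference time
snapshots = {
    "node-eu": [{"hash": "0x01", "seen_at": 10.2}, {"hash": "0x02", "seen_at": 11.0}],
    "node-us": [{"hash": "0x01", "seen_at": 10.0}],
}

view = aggregate_mempools(snapshots)
for tx_hash, entry in sorted(view.items()):
    print(tx_hash, entry["first_seen"], sorted(entry["seen_by"]))
```

The value for a trader sits in the differences between nodes: a transaction that only one node has seen, or that one node saw meaningfully earlier than the rest, is information nobody gets from the confirmed chain.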

There is a new crop of data products solving for perspective when it comes to DeFi. But if you extrapolate the concept to more nascent themes like gaming or music, you will see that the bulk of them barely consider what a user who is not looking to speculate would want. It is quite possible that creating discovery graphs of on-chain music is not a profitable endeavor, and the TAM for that is less than a thousand users as of today.

But until that tooling evolves, we may be running around in circles with data in a few ways:

API providers will compete with one another on pricing and lose B2B clients to one another as they race to the bottom on price;
B2C-focused (primarily traders) products will struggle to generate subscriptions in a bear market and see the opportunity that emerges with issuing tokens; and
Those focused on emergent themes (metaverse, gaming, music) will struggle with a small TAM.

These are all battles worth picking if you have a long-term bullish thesis on Web3 as an industry. In the last cycle, one of my favourite analytical teams closed shop in March. Had they stuck around a quarter more, it is quite possible they would have been worth a few hundred million.

As with most things in life, the outcomes for on-chain data products are neither predictable nor consistent. All you can do is turn up and build.

Signing out to turn up and build (err, write) our long-form for this week.

Joel John

P.S. I could not delve into the dynamics of data networks with tokens (like Covalent), but that will be for a long form in the coming weeks.

Disclosures

I am an early stage investor in Nansen.
I was an early stage advisor to Covalent.
We are looking at multiple data protocols as Decentralised.co for both commercial partnerships and active investments.
