The Bull Case for Owning Your Data

And not relying on platforms to always be there

Feb 05, 2024

On 19 March 2019, it was announced that Myspace had “lost” every piece of content uploaded to it prior to 2016. In fact, the loss had happened at least a full year earlier, with no admission until that point. Obviously, Myspace in 2019 was something very different to the first digital home that many of us found back in the mid-2000s, but it’s still gutting to realise that all the shitty demos and EPs that I released with friends in our bands at school were lost to the winds of time.

In August 2023, Spotify announced that it was closing off access to historic data for artists, defined as more than two years old. Artists were given notice ahead of time, if they saw the message, and were able to download that data before the shutoff date. But, no doubt, many missed the memo and lost all of that crucial historical information about their fans and songs.

In December 2023, just before Christmas, Spotify announced that it was making 17% of its workforce, around 1,500 people, redundant. Our analysis of representative data (c. 40% of the leavers) estimates that c. 40% were from Engineering & Data departments and c. 20% from Product. Following that, this year, it appears that Spotify is positioning itself to shut down open API access for developers.

For every successful Creator-Tech startup, of which there are few, there are hundreds of others littering the graveyard of failed expectations, whose sole crime was to rely on platform data and APIs being maintained as the foundation for their functionality.

Platform risk is real, as games developers building on Facebook found in 2017 and social app developers building on Twitter found after Elon Musk’s takeover.

Historical context

Things were simpler back in the day. You’d listen to a band on the radio, read about them in a magazine, buy a CD, and go to a gig. The relationship between you and the band that you were interested in was relatively one-way, and the band had hard numbers on their popularity through sales and concert attendance.

The internet changed everything, as it did in most areas of life. Suddenly, the potential market for an artist was not local or national, but global from day dot. Instant connectivity that defied physical boundaries created entirely new ways of enjoying and sharing content and up-ended the model that had been canon until that point.

The social internet (commonly known as “web2”) further accelerated this development. Not only were the products that an artist was making available globally, but it also became increasingly friction-free for artists to be constantly present in the lives of their fans.

Of course, this had second-order effects on the “value of music”, on the level of competition that artists faced, and on the work-life balance of creators. It also centralised power over the ears and thoughts of a fan, which had previously been spread across a relatively decentralised supply chain. In this period we saw the rise of the “super aggregators”, which provided a home to practically all of the activity that would previously have occurred across a range of broadcasters, record stores, and venues.

These platforms, exploding to market dominance through zero-marginal-cost growth and built on big data, became the new gatekeepers between a creator and a consumer, holding all of the information about their relationships.

Examples of the risks associated with this new model for content are highlighted in the opening passage. However, these threats, which are generally understood, are on the whole balanced out by the value that platforms provide through practically free distribution and connectivity to the world.

That said, throughout all the post-internet years, there has been one single most important channel for an artist: email. By all rights, email should have been replaced years ago by one or many of the more viral and networked communication tools. And yet, ask a professional manager whether they would prefer an Instagram follow or an email address and there will be a very clear response.

This is curious, is it not? But also completely logical when considering that the relationship between a creator and a fan has a lifespan far exceeding that of digital platforms. Email is a simple protocol that isn’t owned by any one party and can’t be killed. It also provides inspiration for one of the great narratives of blockchains and content.

Self-sovereign identity and data

First, an admission: I haven’t actually read The Sovereign Individual. Mainly because I despise the author’s son and the politics surrounding it, and lament the impact that this school of thought has had on current-day society.

With that said, the essence of self-sovereignty with respect to one’s identity and data is a far more interesting prospect. Despite being less heavily memed than other narratives, such as decentralised finance, it has spawned many interesting experiments in the years since Ethereum’s launch in 2015.

Here are some examples from the first wave of Ethereum application innovation. One of the first projects that I worked with was Mycelia, set up by Imogen Heap, which focused on the idea of a “Creative Passport”, a tool for artists to own the reference data that the internet would read and display. There was (is) a project called Streamr that wanted to enable users to own their data and set a price against it. Within the Consensys hub-and-spoke structure there were several projects, such as uPort, focussed on self-sovereign identity and giving users control over their data on the internet.

More recently we have seen projects such as Disco continue development along this narrative, along with wider exploration of NFTs, both “traditional” and “soulbound” (non-transferable).

There is promise. However, significant challenges exist.

A common stumbling block is that the value of an individual’s data is actually not that high; it is the aggregation and comparison of data that provided the foundation for super-aggregator growth. Furthermore, the reason for a user to go down this path has not been sufficiently communicated, and the user experience has not been sufficiently developed. I guess a case in point is this blog being published on Substack rather than Ghost or Mirror.

Data related to content

Before focussing on opportunities in the content industries, and especially music, it’s worth taking a second to discuss the problem space and the data that self-sovereign solutions can be built around.

As a recovering, and occasionally relapsing, person who works with copyright data, it seems like a good place to start. Copyright data, as many of you will know, is the information that enables creators to state their ownership of their music, get paid for the use of it, state requirements for permission to work with it, etc.

The current state of play is not good: there is no single view of “truth”, no central database to reference, and plenty of friction in the form of unresolved conflicts and disputes. That makes it a common problem space for startups to target. Why this is generally a bad idea is worthy of a standalone blog post (we have the scars). It is enough to say for now that there is a significant set of challenges surrounding copyright data, that these challenges are impossible for a single party to “fix”, and that they are on the whole “people problems” rather than “technology problems”.

Far more interesting in the context of this piece are data points relating to transactions and engagement.

Within the streaming, download, social, and everything-in-between platforms that we consume media through online, there are social and action graphs that shape our experiences, be that our usage, preferences, identity, connections, message history.

In contrast to copyright data, transaction & engagement data is consumer-generated, directly related to and influencing user experience, and generally more open for individual companies to generate or aggregate. Generating new data points, as the opening examples highlighted, is the most important aspect to consider, as any project built on top of a closed database through APIs will at some point in the future very likely have the rug pulled from under it.

Opportunity with blockchains

As previously mentioned, there has already been significant development in the world of self-sovereign identity and data in crypto. However, there is still a lot more to be done before this becomes something that is ready for the general consumer.

A useful framing is to think about tokens and wallets as vehicles for digital access control lists, opt-in user-owned cookies, which are ex-platform, that is, living separately from the platforms on which they provide context and impact a user’s experience.
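To make that framing a little more concrete, here is a minimal sketch in Python of a wallet acting as an opt-in access control list. Everything in it (the `Token` and `Wallet` classes, the `shared` set, the `has_access` function and the collection names) is hypothetical and made up for illustration; it is not how any particular wallet or chain works, and a real version would read token ownership from a blockchain rather than from memory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """A marker the user has picked up somewhere (hypothetical structure)."""
    collection: str  # illustrative collection name, e.g. "catalog-cosign"
    token_id: int


@dataclass
class Wallet:
    """Holds tokens, plus the owner's opt-in list of collections to expose."""
    address: str
    tokens: set[Token] = field(default_factory=set)
    shared: set[str] = field(default_factory=set)

    def visible_tokens(self) -> set[Token]:
        # A platform only sees tokens from collections the owner opted in to.
        return {t for t in self.tokens if t.collection in self.shared}


def has_access(wallet: Wallet, required_collection: str) -> bool:
    """Platform-side check: does the wallet expose a token from this collection?"""
    return any(
        t.collection == required_collection for t in wallet.visible_tokens()
    )
```

The design choice worth noticing is that the opt-in list lives with the wallet, not with the platform. A platform can ask a question, but the user decides which markers are visible when it does.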

Rather than the current status quo where platforms and search engines harvest our online activity as a basis for their business, we could be much more selective about what we want to expose to the world and shape our own experience on the internet accordingly. Imagine a “for you” view of the web that you control through the tokens and markers that you decide to pick up along the way. This cannot be done without “web3” technologies as it requires a degree of composability and permissionlessness that current internet platforms are not incentivised to provide.

Of course, there is a cold start problem here: it’s far easier and more profitable to build a platform business and to generate and keep that data in-house, closed source. That is largely what we find when analysing most “web3” projects to date. We hit a significant free-rider issue, which is a major roadblock to the development of composable systems in the “fat protocols, thin applications” design.

This is an incentive design problem, but incentive design problems are what crypto does best. For example, the growth of L2 and similar protocols, such as Base, Optimism, Lens, and Farcaster, can promote building in this way through funding and infrastructure development.

With that said, there are some very promising signs, and a very recent example that we can point towards: Catalog’s “cosigns”. A cosign is a “soulbound” token that a user can mint when listening to a track within Catalog Radio. Of course, as it’s crypto, tokens can be minted at the contract level, skipping the front-end. Even so, it sets up a very interesting marker of time and place in a wonderfully simple way. I would love to see further development whereby additional contextual data is minted into the cosign’s metadata, such as the radio show that the user was listening to at the time of mint, who that radio show was compiled by, and who else was in the audience at the time. This would make cosigns even more interesting as tokens and as a basis to build on top of, while also pushing more traffic toward Catalog Radio itself. Furthermore, there’s an angle to allow for comments to be included alongside cosigns, which could function as ex-platform “Soundcloud comments”.
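As a rough illustration of what that extra context could look like, here is a small Python sketch that packs it into a JSON metadata document. To be clear, this is not Catalog’s actual schema or contract. The `CosignContext` class, the `to_metadata` function and every field name are assumptions of mine, loosely following the common NFT habit of a `name` plus a bag of attributes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CosignContext:
    """Hypothetical context captured at the moment of mint."""
    track: str
    listener: str                 # wallet address of the person minting
    minted_at: str                # ISO 8601 timestamp
    radio_show: str               # the show being listened to at mint
    curated_by: str               # who compiled that show
    co_listeners: tuple[str, ...] # who else was in the audience
    comment: str = ""             # optional ex-platform comment


def to_metadata(ctx: CosignContext) -> str:
    """Serialise the context into a JSON metadata string (illustrative only)."""
    document = {
        "name": f"Cosign: {ctx.track}",
        "attributes": asdict(ctx),
    }
    return json.dumps(document, sort_keys=True)
```

Because the result is plain JSON attached to the token rather than a row in somebody’s database, any other tool could read it later without asking the original platform for permission.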

Think about this concept expanded across a wider industry. Simple pointers that users pick up on their journey around the internet, and can collect and show. Tools that aggregate and serve this open information to interested parties, for example, artists. Services, built on top of this portable data and these tools, that enable creators to hold a greater number of relationships with their fans without being at risk of a single mega-platform’s decay and deletion.

This is the objective that some teams now, and likely many teams in the near future, are building towards. And there’s a high likelihood they will be successful… as long as they remain independent of the very same platform APIs that they are seeking to disrupt.
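For a sense of how simple the aggregation side could be once the markers are open, here is a toy Python sketch. The `fans_by_artist` function and the marker fields (`artist`, `holder`, `source`) are hypothetical. A real tool would read these records from public token data instead of a hand-written list.

```python
from collections import defaultdict


def fans_by_artist(markers):
    """Group open markers by artist, returning each artist's unique holders.

    `markers` is an iterable of dicts with 'artist', 'holder' and 'source'
    keys (an illustrative shape, not a real standard).
    """
    fans = defaultdict(set)
    for marker in markers:
        # The same holder may appear via several sources; count them once.
        fans[marker["artist"]].add(marker["holder"])
    return {artist: sorted(holders) for artist, holders in fans.items()}
```

Note that the `source` field is ignored when grouping. That is the point: the artist’s view of their fans does not depend on which platform a marker was picked up on, or on whether that platform still exists.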
