Sweating the operator's data assets
Given how we wrote about environmental causes recently, we're going to engage in a little of our own greenery and recycle some old material on user identity whose time appears to have come. I'd like to show you a diagram I first drew up on a whiteboard over four years ago, which prompted me to believe that the real opportunity for network operators was to exploit their data assets.
The thesis is quite simple at heart: it's not just what you know about the customer that matters. It's what you know about what you "know" about the customer that matters. In other words, is this really your name, your address, and are these really your friends?
We've not been blogging recently because we've filled every moment of each day with client work. (You're welcome to join the queue, however.) One of our assignments involves understanding the real assets of operators that can be used to resist competition from Internet players, or embrace co-operation with them. Indeed, this seems to be a recurring theme, and we've covered it in several client engagements.
So to make up for our silence, here's a diagram that summarises the competitive picture of identity and data assets of network operators vs. other players in the value chain:
Let's understand the model first, and then dig into the implications. You can find some of the credits and sources in the article linked to earlier.
A taxonomy for our metadata
Firstly, we're going to provide a classification of the data into four buckets, from highly personal and individualised to impersonal and collective:
- tier 0, the biometric and unchangeable you. Your fingerprint, iris pattern, DNA profile, height.
- tier 1, "mydentity", the named person (e.g. "George W. Bush"), chosen by you. You can go to a lawyer and change your name.
- tier 2, "ourdentity", the assigned identity (e.g. "The current resident of 1600 Pennsylvania Avenue, Washington, DC"), with the label or name chosen by others (such as the post office and city in this example) but assigned to you. This is a form of shared identity. The canonical telco example is a phone number.
- tier 3, "theirdentity", the inferred person (e.g. "living US presidents"), described as a class. Other examples might be segments you're assigned to for marketing or service purposes, or credit scores.
The lower tier data tends to be more expensive to collect or create. Some think that tier 3 should be called the "marketing identity", but that seems too narrow and prejudicial a definition to me.
And a taxonomy for our meta-metadata (yes, it's hard)
We then look at how much we know about the data in these four buckets, with increasing levels of certitude:
- Level 0: Anonymous data -- incomplete or partial profile of the user that can't be used to identify you. Furthermore, multiple data points supplied by the user can't be correlated in space or over time. That means it cannot be tracked back to me now; and I can supply different data at different times without the fact being detected. So the fact that I am 36 and male can be revealed to any 3rd party, and as long as they can't see me in person or hear me then I can equally claim to be a 23 year old woman tomorrow. There are no consequences of lying repeatedly. Privacy is assured. Consider a location-based service which gets a user ID and returns a map, and doesn't get to see your IP address (as you're hiding behind a telco proxy or firewall). In this case the location data is completely anonymous.
- Level 1: Pseudonymous data -- incomplete but traceable in time. Here, you have acquired a pseudonym. Cookies in a web browser are the classic example of this. A web site like Priceline that wants to prevent you from submitting multiple identical bids (themselves level 0 data) uses a cookie to track you. You can lie, just not over and over. A location service which sees my IP address can effectively use that as a pseudonym for me and track my movements.
- Level 2: Asserted data. This is a complete data field that is tied to a particular person, such as your name, address or government ID number. But we know nothing about the quality of the data we've been given. If you gave a real name, address or number and subsequently commit fraud or abuse, there are consequences. Equally, you could lie about who you are without any consequence. You might also not be lying, but could unintentionally give inaccurate data, simply through a typographical mistake.
- Level 3: Validated data -- complete and suitable for repeatable use. Now we're getting into the realms of "data quality". This is subtly different from level 2, and moving identity data from level 2 to level 3 is already big business. If I store my data in a profile, and keep giving it out, then it is more likely to be correct than if I have to enter it fresh each time. If someone checks my postal code against an address database, the data may not change one iota, but the metadata now starts to take on value.
- Level 4: Verified data -- non-repudiable and the user has staked some collateral on it. This is the ultimate level, where we add in verification. A trusted third party asserts that the data is true to some degree, in the context of some liability relationship for that assertion. The American Express logo on my credit card acts as an assurance to a merchant that they will be paid in return for entering my credit card number into their system. The assertion is relatively weak, since traditional magnetic stripe cards are easily forged. Still, the credit card number has a check digit and is printed on a card, which at least makes for strong level 3 data.
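To make the two taxonomies a little more concrete, here is a minimal sketch in Python of a data point tagged with both a tier and a level, plus one way an asserted card number might be promoted to validated using its check digit. The class and function names (Tier, Level, DataPoint, luhn_ok, validate_card) are our own illustrative inventions, and filing a card number under tier 2 is our assumption, on the grounds that the issuer assigns it to you. The check-digit scheme shown is the Luhn algorithm used on payment cards. Treat it as a sketch rather than a recipe.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Tier(IntEnum):
    """What the data is about (hypothetical names for the four buckets)."""
    BIOMETRIC = 0      # the unchangeable you
    MYDENTITY = 1      # the named person, chosen by you
    OURDENTITY = 2     # the assigned identity, e.g. a phone number
    THEIRDENTITY = 3   # the inferred class, e.g. a marketing segment


class Level(IntEnum):
    """How much we know about the data itself."""
    ANONYMOUS = 0
    PSEUDONYMOUS = 1
    ASSERTED = 2
    VALIDATED = 3
    VERIFIED = 4


@dataclass(frozen=True)
class DataPoint:
    name: str
    value: str
    tier: Tier
    level: Level


def luhn_ok(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def validate_card(point: DataPoint) -> DataPoint:
    """Promote an asserted card number to validated if the check digit holds.

    The value does not change at all; only the metadata (the level) does.
    """
    if point.level == Level.ASSERTED and luhn_ok(point.value):
        return replace(point, level=Level.VALIDATED)
    return point


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    card = DataPoint("card_number", "4111 1111 1111 1111",
                     Tier.OURDENTITY, Level.ASSERTED)
    print(validate_card(card).level.name)
```

Note that nothing here gets the data to level 4. A check digit catches typing mistakes; it takes a third party with some liability for the assertion to verify.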
The telcos have the most valuable data hostage
The obvious conclusion from the diagram is that different players have complementary data sets. Fixed network operators put physical lines into physical homes, and know the location of the network end points. (OK, there are some seriously embarrassing stories to be told here, but still... the big picture is right). They credit check you, know who you call and interact with, and know where you travel when mobile. With multi-person households, they will often know the family structure. They could probably even work out if you're married or divorced with kids based on collective weekday and weekend patterns.
Now the diagram above is, inevitably, an over-simplification. Internet players expend a lot of effort validating your email address by sending you a welcome email, and verifying it by getting you to click on a unique link in the text. But their identity assets are generally very weak and narrow.
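For the curious, the click-a-unique-link step might look something like the sketch below. This is our own illustration, not any particular site's implementation: the EmailVerifier class, its method names and the example.invalid URL are all hypothetical, and a real service would also expire tokens and keep them somewhere more durable than a dictionary.

```python
import hmac
import secrets
from urllib.parse import urlencode


class EmailVerifier:
    """Hypothetical in-memory tracker of emails awaiting a link click."""

    def __init__(self):
        self._pending = {}  # email address -> outstanding token

    def issue_token(self, email: str) -> str:
        """Create an unguessable token to embed in the welcome email."""
        token = secrets.token_urlsafe(32)
        self._pending[email] = token
        return token

    def build_link(self, email: str, token: str,
                   base_url: str = "https://example.invalid/verify") -> str:
        """Build the unique link; base_url is a placeholder."""
        return base_url + "?" + urlencode({"email": email, "token": token})

    def confirm(self, email: str, token: str) -> bool:
        """Return True once, when the right token comes back for the email."""
        expected = self._pending.get(email)
        if expected is None:
            return False
        # Constant-time comparison, so timing reveals nothing about the token.
        if not hmac.compare_digest(expected.encode(), token.encode()):
            return False
        del self._pending[email]
        return True


if __name__ == "__main__":
    verifier = EmailVerifier()
    tok = verifier.issue_token("someone@example.com")
    print(verifier.build_link("someone@example.com", tok))
    print(verifier.confirm("someone@example.com", tok))
```

All this proves is that whoever signed up can read mail sent to that address. It says nothing about their name, their home or their friends, which is why the asset is so narrow.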
For companies like eBay who are involved in reputation and facilitating transactions, the telco or ISP at the end of the line could potentially be an ally in resisting fraud. It's a leak in the "end to end" model which forgets about the physical anchors of the network and any form of payment.
It's a business you'd like to be in
As a network operator, wouldn't you rather have the profits and valuations of some of these businesses:
- Experian, who CAPTURE credit and personal data via a unique network
- Google, who CORRELATE data between web pages and advert keywords
- Verisign, who DISTRIBUTE data such as digital certificates into browsers
- Amdocs, who MANAGE data on behalf of operators
- Oracle, who MANIPULATE data on behalf of everyone
The opportunity for operators is to be able to federate, project or exchange this data in a win-win-win for them, the partner and (most importantly) the user. This means protecting the privacy of the user to the maximum degree possible.
If Google can spin $165bn of market cap out of some matrix algebra from scraped hyperlinks off web pages, how much are those call detail records really worth?
We'll be discussing how operators should go about partnering, surviving and thriving in a data-centric world at our next Telco 2.0 Executive Brainstorm in London on 16-18 October.