Understanding the value of Web3 from the perspective of the "data market"
By Frederik
Web3, built on blockchain technology, is an innovation on the traditional Internet. The hope is that it can solve many problems through the combination of "algorithm + incentive mechanism" and make a data market possible.


How important is data?

1. Changes in Production Modes and Migration of Organizational Forms

Over the course of human history, the productive forces have changed several times. Changes in productivity bring changes in production methods, which in turn reshape how production is organized, because production organizations exist, after all, to fit production activities.

Production alone leaves people bartering to meet their needs, which is often inefficient and cumbersome. In response to the need for greater efficiency, money emerged as a general equivalent for exchanging commodities. Markets for circulating goods were gradually established, and the commercial activity built on them prospered.

I think human beings have so far gone through three changes in production methods:

The first was marked by the appearance of tools, which took humanity from primitive society into farming society. Using stone, bronze, and then iron tools, human beings began to transform nature according to their own needs: growing rice and wheat, raising poultry, and settling down. This period was dominated by self-sufficient production (agriculture and cottage industry). As civilization developed, some commercial activity gradually emerged (the Chinese word for "merchant" is said to date back to the Shang Dynasty).

As society developed, products became more complex, self-sufficient production found it harder to meet individual needs, and the share of commercial activity grew. Even so, this stage lasted for thousands of years, and many commercial institutions of modern society, such as banks and customs, began to take shape during it.

The second was marked by the invention of the steam engine, which took handicraft industry into machine industry. Coal and steel solved the energy and material problems of this change in productivity, respectively, while the steam engine addressed labor efficiency. After all, physical strength is not where human beings excel, and repetitive, inefficient production eventually runs into upper limits and bottlenecks. The emergence of machinery (including the later electrical revolution) freed human hands and improved production efficiency.

As a result, the mode of production began to evolve toward a professional division of labor. By harnessing machinery, human beings gained more time to develop science and technology, and culture and civilization could move forward on a more complex and diverse path. The liberation of the production side created prosperity on the circulation side: commercial activity exploded, and the modern enterprise system began to take shape.

The third was marked by the emergence of the Internet, which took machine production into information production. As its name implies, the Internet is a network of interconnected computers. The development of civilization generates a great deal of information. Take accounting as an example: human beings first recorded the quantities involved in economic activity by knotting ropes and carving marks. Only after the invention of papermaking were they recorded on paper.

As civilization evolved and production activities became more complex, the need for a clear and understandable set of accounting rules grew stronger. These rules slowly developed into the double-entry bookkeeping we are familiar with today.

However, the information produced rarely had the chance to realize greater value. Over the long course of history, it was either recorded and left in a corner where no one consulted it, or forgotten and scattered in the mists of the past.

Only when computers (in a broad sense, computing chips) replaced paper and pencil as the tool for carrying information could human beings record and share information more efficiently and more widely. In the context of the Internet, production and commercial activities have rediscovered the value of information, so that information is not only a product but can also serve as a means of production.

Of course, information could also serve as a means of production before Internet products emerged, but only at a high cost. The Internet digitizes information, which gives it a very important feature: zero marginal cost. (Put simply, marginal cost is how much total cost rises for each additional unit of product.)
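To make the idea of zero marginal cost concrete, here is a minimal Python sketch. The function name `average_cost` and all the numbers are hypothetical and chosen only for illustration; the sketch assumes a one-off fixed cost plus a constant cost per extra unit.

```python
def average_cost(fixed_cost: float, marginal_cost: float, units: int) -> float:
    """Average cost per unit, assuming a one-off fixed cost and a constant marginal cost."""
    if units <= 0:
        raise ValueError("units must be positive")
    return (fixed_cost + marginal_cost * units) / units


# Hypothetical numbers: a physical good and a digital good with the same up-front cost.
for units in (1, 100, 10_000):
    physical = average_cost(fixed_cost=10_000, marginal_cost=5.0, units=units)
    digital = average_cost(fixed_cost=10_000, marginal_cost=0.0, units=units)
    print(f"{units:>6} units: physical {physical:10.2f}  digital {digital:10.2f}")
```

With zero marginal cost, the average cost per unit keeps falling toward zero as output grows, while the physical good can never cost less per unit than its marginal cost.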

In fact, information production has another great advantage over machine production: network externalities (network effects). Network effects mean that every addition to the network has a positive effect on the existing nodes. This still comes, in essence, from the zero marginal cost of information: each newly added node shares some new information with all the nodes in the network at zero cost (this is where the positive utility comes from).
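One rough way to picture this is to count the possible pairwise connections in a network. The sketch below assumes that a network's usefulness tracks the number of node pairs that can exchange information; this is a Metcalfe-style simplification that the article itself does not state. The function names are hypothetical.

```python
def possible_links(nodes: int) -> int:
    """Number of distinct node pairs in a network of `nodes` participants: n * (n - 1) / 2."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


def links_added_by_next_node(nodes: int) -> int:
    """New pairwise links created when one more node joins a network of `nodes` participants."""
    return possible_links(nodes + 1) - possible_links(nodes)


for n in (10, 100, 1_000):
    print(n, possible_links(n), links_added_by_next_node(n))
```

Under this assumption, each newcomer creates one new link per existing node, so the larger the network already is, the more every new node adds.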

Zero marginal cost and network externalities give information production some formidable properties, such as rapid expansion and natural monopoly. Once you understand these two points, it becomes easy to see why Internet companies can create more value than traditional manufacturing in just a few years, why startups in the Internet industry always like to burn money, and why Chinese Internet companies have recently started to go downhill.

The Internet-based transformation of production methods has also affected the corresponding organizational forms. According to Coase, the master of institutional economics, enterprises exist because organizing transactions inside a firm costs less than carrying them out on the market.

In the Internet-based market, information has zero marginal cost. That is to say, enterprises must lower their internal transaction costs to adapt, and the original vertical form of management must begin to give way to horizontal coordination. Management systems that focus more on internal collaboration, such as OKR (Objectives and Key Results), have also begun to replace the original KPI (Key Performance Indicator) system.

2. The Internet's reconstruction of traditional business models

As production methods keep changing, the focus of human economic activity has also begun to shift. Compared with material production, information production has received more attention because of its broader prospects. Therefore, beyond the business activities native to the Internet, applying the Internet to transform traditional industries has become imperative.

The existing transformations run in two directions. One starts from the production process, with the goal of improving production efficiency: for example, Industry 4.0, a term already worn out long ago (Germany, 2013), improves existing production systems, the industrial division of labor, logistics management, and so on through "interconnection + intelligence". The other reconstructs business models, as with the sharing economy, information platforms, online shopping, and social networking.

Traditional business models are linear. Suppose you want to buy a thermos cup (why is a thermos cup the first thing I think of?). Your first thought is to go to a retailer such as a supermarket or shopping mall. You would not think of going to the manufacturer to pick up the goods yourself, and the manufacturer usually would not sell to you anyway; still less would you go further upstream to a steel mill because you wish your thermos were made of titanium steel. The complete chain from upstream material suppliers to midstream manufacturers and on to downstream retailers is the industry chain.

Manufacturers also produce relatively blindly. Why? Because a manufacturer keeps its own ledger, with cost at one end and profit at the other. Profit comes from downstream orders, and the manufacturer usually takes whichever orders come on better terms. Consumer needs cannot be communicated directly to manufacturers. More broadly, no node in the industry chain can exchange information and value with non-adjacent nodes directly and at low cost.

The reconstruction of the Internet is to turn the "chain" into a "network".

In a network, any node can connect to any other (unless whoever is in charge does not allow it). Consumers can bypass retailers and go straight to manufacturers to buy wholesale or to customize products. The former means that the boundaries between traditional roles begin to blur, and consumers can become retailers if they want. The latter means that each node of the industrial chain has more choices, which helps break vertical monopoly and improve efficiency.

It may seem that the role of retailers is deliberately eliminated, but it is not. The Internet actually emphasizes the retailer's role as an information intermediary: going directly to manufacturers costs consumers something, so retailers who integrate and match information well can still make a profit.

However, we know that distributed systems bring a lot of redundant information. If the Internet merely turned "chains" into "networks", information congestion and interference would follow, and information could not be matched efficiently and accurately. So the second part of the Internet's reconstruction of business models is the emergence of platforms.

What a platform does is, in essence, information matching. Once the Internet has broken the linear traditional industrial chain into nodes, something has to do what the chain used to do, namely match supply and demand information. Manufacturers go to the B-side (business), and consumers go to the C-side (customer).

Consumer demand for a certain type of product can be captured by the producer, and when enough of the same demand appears across the entire platform, the producer's production becomes profitable (diminishing marginal cost).

As mentioned earlier, the Internet gives production two characteristics: zero marginal cost and network externalities. As more and more nodes connect through the platform, they gradually become path-dependent on it, which means the platform has a stronger and stronger say in production and commercial activities. Having the say means having the right to set prices, and zero marginal cost means that serving each additional node costs the platform almost nothing.

Pricing power therefore means a higher profit margin on each node, while network externalities accelerate the entry of new nodes onto the platform. With both profit factors growing at an alarming rate, it is easy to imagine how much a successful platform stands to gain.

Let us now use this to explain the three questions raised earlier:

Why can Internet companies create value beyond traditional manufacturing in just a few years? Why do startups in the Internet industry always like to burn money? Why have Chinese internet companies started to go downhill recently?

The first question has already been answered. As for the second, platforms in competition face an unstable say over the market and new nodes that have multiple choices. Even against a similar opponent, the outcome is uncertain (the bike-sharing war is a typical case). Raising funds and burning money to compete for users is meant to leave users with no choice in the future, so that the platform can then use its say over the market to seek profits (DiDi is the case in point).

The essence of the Internet platform business model is "winner-take-all."

But in fact, a platform can do more than this. Interfering with the normal development of the market merely by exploiting the characteristics of the platform itself is short-sighted and unsustainable. A platform that wins by burning money will later have to "tax" its nodes to make up for the money it burned.

At that point, a capable new platform can appear and easily attract traffic through better service and lower prices. The newcomer carries no debt at that point. And you? (The case is Hello after the bike-sharing war.)

Network externalities are not a moat in themselves. Rather, "good service = an extremely strong moat" and "bad service = the building collapses". This unhealthy business model will not hold for long.

Back to what the platform can do.

As mentioned earlier, the Internet reconstructs the industrial chain by changing the "chain" into a "network", and platforms fight to grab these nodes. But they overlook that network externality presupposes the nodes' path dependence on the platform, and they also overlook the differences between nodes. Take online car-hailing as an example: drivers and passengers are two different kinds of node, and passengers' consumption behavior is the more random of the two.

Passengers care more about the result of "getting a car to the destination"; which platform offers the discount comes second. Believe me, passengers will download every app during a car-hailing war, and they will not pass up a freebie. Drivers are different. The relationship between driver and platform is more like a new kind of freelance employment. Although drivers use several apps at the same time, they are well aware of how each app treats them.

That is to say, drivers are more likely to develop loyalty, and they play the more important role in a ride (drivers are the service providers; a driver does not blame the platform for a bad passenger, but a passenger who meets a bad driver will inevitably blame the platform). So the goal is to use incentives to align the interests of drivers and the platform as closely as possible.

Whether it is a subsidy or any other measure, it should lean toward the driver as much as possible. Someone will ask: what about the passengers? Do not forget that, given network externalities, of the passenger's two choices (taxi or ride-hailing) the latter is still the better one, only with a slightly smaller discount.

Therefore, balancing interests over the whole life cycle, the more reasonable and healthier approach is to pour more resources into long-term incentives for drivers, so that their interests are consistent with the platform's, while on the passenger side the priority is a more convenient and comfortable experience than taxis (delivered by the driver), with economic incentives coming second.

Another point: vertical extension serves a platform better than horizontal competition with other platforms. If the platform can use the network externalities it has gained to benefit its upstream and downstream, how could users not stick with it? If it does not, and outside incentives lead users to break their path dependence, the network externalities of the existing platform are threatened.

All of the above concerns the Internet, which connects computers (people) with computers (people). What if the Internet of Things joins in? Connecting computers (things) to computers (things), and computers (things) to computers (people), will make the network grow by orders of magnitude. Think about how many things each of us owns on average, and you can see how much complexity each new node adds to the network.

The reconstruction of traditional business models by the Internet/Internet of Things is far from over. The "information production" of the Internet is essentially the reuse of data generated by nodes in the network. From a certain perspective, data is to the Internet what energy is to modern industry.

3. The data island of web2

As mentioned earlier, Internet companies collect and match information by establishing platforms, and make large profits by exploiting the zero marginal cost and network externalities of information production. With the continuing development of the Internet of Things, big data, cloud computing, artificial intelligence, and other technologies, human life will become more and more "digital": digitization is used to handle payments, workflows, social connections, and financial business needs... In this digital migration, the time people spend "online" will keep increasing, and more human activity will be recorded as data stored on the Internet.

Think about it: today, sleep monitors can get your sleep data, smart homes can get your daily-life data, smart travel tools can get your movement trajectory, and ubiquitous surveillance can get all your body and behavior data... In the future, the Internet of Things will only enrich the database about you. Big data and cloud computing will let algorithms draw your digital portrait from this data, and search will accurately tie the data back to the individual...

The data ecology of web2 is obviously difficult to meet the increasingly complex data production and demand activities.

Big Internet companies make money by monopolizing user data, but they do not inherently own that data; they merely obtain it by offering free services. They also lack a well-established mechanism to protect it (and obviously have no incentive to build one), so privacy leaks become the norm. The data is stored on their central servers, and they do not bother to record the details of every copy.

Most importantly, different institutions keep their own databases, built through wasteful and repetitive collection; data is stored and managed unsystematically and with a lot of distortion; data islands form between institutions, which lack interoperability; illegal data transactions occur frequently; and the cost of trust is extremely high.

When web3 and the Internet of Things arrive, data will grow exponentially. If the above problems are still unsolved, how many inefficient market transactions will be born? The application value of new technologies will be greatly reduced.

Data silos don't work. Humans are social animals, and so is data. For data to take advantage of the two characteristics of information production, it must be open and interconnected. The arrival of various new technologies opens up some possibilities for the application of data. In the second part of this article, I will elaborate on the current difficulties in using data.

What problems exist in the use of data?

Modern commercial activities are built on the market mechanism. According to the different exchange objects, the market is usually divided into the commodity market, service market, technology market, financial market, labor market, and information market.

Of these, the technology market can be split into technical goods and technical services, so it can be dropped as a separate category; and services can, in essence, also be packaged as commodities. From my point of view, therefore, markets generally divide into the commodity market, labor market, financial market, and information market. (The labor market is singled out because there are people behind it, and human behavior is complex and unpredictable and cannot simply be defined as a commodity.)

The first three are markets we often come into contact with, but the concept of the information market is relatively abstract. As the name suggests, what is exchanged in the information market is information, such as business information, financial information, and talent information. Most of the information exchanged in the known information markets, such as real estate agencies, headhunters, HowNet, and user information trading, passes through specialized information intermediaries. Users have to pay for this kind of information; otherwise, finding it themselves costs them a great deal.

As mentioned above, the information currently available for trading only accounts for a tiny part of the data produced by the Internet, and it is basically in a gray area. In order for data to drive the digital economy like energy drives modern industries, it must have prevailing industry standards, compliant markets, and appropriate transaction rules. And this is difficult.

1. Privacy Boundaries and Privacy Protection

The first issue that needs to be mentioned is privacy protection. I mentioned earlier that a lot of data will be logged:

Sleep monitors can get your sleep data, smart homes can get your life data, smart travel tools can get your movement trajectory, ubiquitous monitoring can get all your body and behavior data...

This data is valuable to the companies that provide the corresponding services. For example, if a smart air conditioner detects that you like to turn on the air conditioner in winter, that piece of data might be bought by the manufacturer of a "Barabala ion heater", who then pushes you an advertisement claiming its product is "healthier and more energy-efficient than an air conditioner"... For the manufacturer, buying 1,000 pieces of such targeted data may cost far less than advertising on the homepage of some website. Ideally, of course, the money is paid to you; after all, you are the owner of this data.

Here's the question: what if you don't want anyone to know that you like to turn on the air conditioner?

The crudest way, of course, is to remove the smart air conditioner and replace it with an ordinary one. But what if the chip in the ordinary air conditioner can also collect data? It may be more reliable to go to the second-hand market and find an old-fashioned electric fan.

The same goes for smart refrigerators; best to replace them with ice stored in a cellar. You cannot take high-speed trains or pass through toll booths, so to travel anywhere you have to walk through uninhabited villages... After all that, you find that your quality of life has dropped sharply: technology is clearly progressing, yet you have regressed into a primitive.

Rejecting new products and rejecting data collection altogether is clearly unrealistic. The point is that individuals should have the right to choose for themselves which kinds of data are collected and which are not. But is this really realistic?

Anyone who has studied economics knows the concept of "moral hazard", which arises from information asymmetry after the fact. That is, if users choose what data is collected, they can choose to provide no data at all, or to provide false data in order to profit from it, because nobody wants others to know real data about their own life.

If things turned out like this, there would be no point in talking about data, and the digital economy would cease to exist. Nobody is going to go through all that hard work only to learn that your name is a string of nonsense that amounts to "I won't give you my real name; guess it yourself, but I've already taken the money".

Therefore, data collection must be objective and happen by default, which requires a degree of privacy protection that users themselves accept as sufficient. On this point, current cryptography offers some directions.

But the real question is often philosophical: how should the boundaries of privacy be defined? Should they be chosen by individuals or by groups? How should regulation be balanced against individual rights? How should the externalities of privacy be dealt with?

For example, suppose data is collected by default and the user chooses whether to encrypt what has been collected. In a critical event, the government could unlock the data the user chose to encrypt, while in ordinary times the user decides which part of the data takes part in business and receives the proceeds. That seems like a good solution.

But in reality, what if the person is a terrorist, and the data he chooses not to release contains information that could locate him? Some people say: let the government unlock it! The problem is that the government does not know who the terrorist is before unlocking. To find out, it can only unlock everything, which affects other innocent users (their privacy is leaked); meanwhile, the terrorist's crimes impose negative externalities on others. How should these externalities be dealt with?

Privacy is like literature: different people may understand it differently. I don't think showing the neck is a big deal, but some people may find it deeply offensive. As a result, if a general standard is implemented, the "privacy" of some people will always be violated. Such a general standard can only be made as broad as possible, but if it is too broad, it can no longer be called a "standard".

2. Data externality and establishment of property rights

Talking about the externality of data, we must first introduce two concepts: non-rivalry and non-exclusivity. These two concepts are used to define public goods, and externalities exist in the problem of public goods.

**Non-rival means that when one person consumes a product, it does not reduce or restrict the consumption of that product by others.** Generally speaking, this means zero or low marginal cost (so Internet products are usually non-rival).

Most of the data we see can be reused; it does not self-destruct or change its content just because it has been used once. By contrast, in college admissions, if I squeeze in above the score line, someone else is squeezed out, so the college entrance examination is rival.

**Non-excludability means that when one person is consuming a product, other people cannot be excluded from consuming it too (or the cost of excluding them is high).** What does that mean? For example, if you go fishing in a fish pond, you have to let others fish as well (unless the pond is yours). Or you go for a walk down the road in the middle of the night and see another person walking there too; you cannot drive him off unless you pay him a lot of money to leave, and if he leaves and someone else comes along to walk the road, you still cannot drive that person off, because everyone has a share in the road.

Goods that satisfy both non-rivalry and non-excludability are public goods. There is a famous game in the problem of public goods, the "tragedy of the commons": everyone wants to use public resources as much as possible for personal gain, which eventually leads to the collapse of those resources.

This is because each person's use of a common resource creates a "negative externality" for others. We know that on the Internet, externalities are positive. This stems from the zero marginal cost of information production, and public resources obviously do not have this advantage.

Whether the externality is positive or negative, the existence of the externality means that property rights are not clear enough. The market cannot make reasonable prices for commodities whose property rights are not clear enough. How to treat the externality of data?

First, we need to classify data by non-rivalry and non-excludability. Data that is both non-rival and non-excludable should obviously be provided by the government or a public organization, with the proceeds going to them. Examples are weather forecasts and macroeconomic data. One characteristic of this type of public data is that it has nothing to do with individuals. This is the clearest case.

For rival or excludable data, the rights holders cannot be clearly separated in the production process, so the public content and the private content of the data cannot be separated either. For example, a company wants to find investment opportunities in city X through the daily-life data of ordinary people there. A total of 100,000 people in city X are willing to provide such data, but the company needs only 10,000 records. This type of data has externalities: because part of the content is shared, the adoption of any one record imposes a "negative externality" on the other records and makes them depreciate.

For another example, I am not the only one who knows the data about the songs I listen to; the software that records the data necessarily knows it too, because I use that software to listen. Apart from the part that is my own behavior, the rest is essentially produced by the software. Does this mean that the software also owns part of the property rights to my listening data?

Whatever people do must ultimately interact with the external world, whether that interaction is physical or shows up through their living conditions. As a result, the objects you interact with, whether things or people, are usually present in your data. How can we establish clear property rights for data when externalities seem inevitable?
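The classification by rivalry and excludability used in this section can be sketched in code. The minimal Python sketch below uses the textbook four-way taxonomy of goods, which is finer than the two cases discussed above; the function name `classify_good`, the category labels, and the example flags are illustrative assumptions rather than the article's own scheme.

```python
def classify_good(rival: bool, excludable: bool) -> str:
    """Map the two properties onto the textbook four-way taxonomy of goods."""
    if rival and excludable:
        return "private good"
    if rival:
        return "common-pool resource"
    if excludable:
        return "club good"
    return "public good"


# Hypothetical examples; the flags are assumptions made for illustration.
examples = {
    "weather forecast": (False, False),
    "open fish pond": (True, False),
    "college admission slot": (True, True),
}
for name, (rival, excludable) in examples.items():
    print(f"{name}: {classify_good(rival, excludable)}")
```

Data that lands in the "public good" cell is the clear case described above; the open question in this section is how to assign property rights to data in the other three cells.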

3. IoT and data collection

The first two points are both more or less related to data collection. For example, should data collection happen by default but under selective individual control? How can individually controlled collection ensure authenticity? How can default collection ensure that privacy is not violated? What should the scope, method, and scale of data collection be?

Today, data collection mainly happens while people are "surfing the Internet". For example, shopping habits and movement trajectories can be obtained from payment and consumption records; individual thoughts and perceptions can be inferred from online speech; personal preferences can be obtained from browsing records, application download records, and so on. However, behind smart homes, autonomous driving, surveillance, and the like may lie another means of data collection with wider coverage: the Internet of Things.

The Internet of Things will fill the lives of individuals with machines equipped with high-speed computing chips. The daily work of these machines will accumulate a large amount of data, which will be matched into the database through calculation and processing. These richer details will make the big data's portrait of the individual clearer, from simple behavior habits to thinking cognition, spiritual characteristics, etc.

On the one hand, this is of great significance to the digital economy and social governance; on the other hand, it triggers an Orwellian personal privacy dilemma, not only because of the anxiety of being constantly monitored, but also because once this important data is leaked, it can practically spell the "death" of a citizen in the digital age.

Therefore, the degree to which the Internet of Things should be achieved in the data collection process, what rules should be followed, the reliability of the equipment, the identity verification of the equipment, the accounting system of the equipment, etc., must be agreed in advance and strictly followed.

4. Data value matching

When it comes to the data market, one problem that has to be raised is the value matching of data.

What does that mean? Compare the commodity market: we know very clearly what each commodity can do, and on that basis we arrive at an expected price in light of our own needs. For example, I am a farmer. I can chop ten catties of firewood a day, and a catty of firewood sells for twenty yuan. I want to go to the market to buy an axe. Chopping firewood is tiring work, and I figure the axe should earn me 3,000 yuan, so the price I expect to pay for the axe is less than 3,000.

But data markets are different. There is a paradox in the discussion of the value of data: that is, if I don't know the content of a piece of data, I can't determine the value of it; but once I know the content of this piece of data, this piece of data has no value to me. This feature makes it very difficult for the data market to naturally complete value matching.

Fortunately, big data technology makes value discovery possible for data whose content cannot be seen at a glance. Data demanders can search or mine for the data they want, but then they face a difficult problem: how can the "correctness" of the data content be determined?

That is: if low-value data is disguised as high-value data, how can data demanders who cannot view the content in advance quickly filter to meet their needs?

In cryptography, a technique that "makes the verifier believe that a certain assertion is correct without providing any useful information to the verifier" is called a "zero-knowledge proof". However, how can we be sure that the provider of a zero-knowledge proof keeps his motive to make correct assertions when large interests are at stake? Designing an ex-ante incentive mechanism is a good idea, but if the exact value of the data cannot be known, how can the size of the incentive be set?
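To give a feel for what such a proof looks like, here is a toy Python sketch of the Schnorr identification protocol, a textbook example in which a prover convinces a verifier that he knows a secret number without revealing it. It is a swapped-in illustration, not a mechanism this article proposes for data markets: the parameters `P`, `Q`, and `G` are deliberately tiny and insecure, all function names are hypothetical, and the sketch says nothing about the incentive question raised above.

```python
import secrets

# Toy public parameters (far too small to be secure): G has prime order Q modulo P.
P, Q, G = 23, 11, 2


def commit(nonce: int) -> int:
    """Prover's first message: a commitment to a one-time random nonce."""
    return pow(G, nonce, P)


def respond(secret: int, nonce: int, challenge: int) -> int:
    """Prover's answer; the random nonce masks the secret."""
    return (nonce + challenge * secret) % Q


def verify(public_key: int, commitment: int, challenge: int, response: int) -> bool:
    """Verifier's check: G^response == commitment * public_key^challenge (mod P)."""
    return pow(G, response, P) == (commitment * pow(public_key, challenge, P)) % P


def run_round(secret: int, public_key: int) -> bool:
    """One honest round: commit, receive a random challenge, respond, verify."""
    nonce = secrets.randbelow(Q)
    commitment = commit(nonce)
    challenge = secrets.randbelow(Q)
    return verify(public_key, commitment, challenge, respond(secret, nonce, challenge))


secret = 7                       # known only to the prover
public_key = pow(G, secret, P)   # published in advance
print(run_round(secret, public_key))
```

The verifier only ever sees the commitment, the challenge, and the response. A prover who does not know the secret fails the check for any non-zero challenge.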

Even if the "correctness" of matching data content and data title is solved, in the face of massive transaction demands, what is obviously needed is a system with high concurrency, high performance, and automatic execution of transactions. Fortunately, the blockchain is already on the way to solving the problem.

5. Data Valuation

There is another point that is easily overlooked: data valuation. Since a transaction is to be made, there must be a generally accepted valuation system, otherwise, the market will be chaotic. Current data valuation methods include:

The cost approach uses the cost of collecting, storing, and analyzing data as a benchmark for data valuation. An obvious problem is that most of the data is not specially produced, but an appendage of other activities; most of the data is collected, stored, etc. at the same time; the property rights of most of the data are still difficult to define. This makes their cost difficult to divide.

The income method forecasts the future cash flow of the data and discounts it. However, the utility generated by the data is difficult to model. Taking the value matching just mentioned as an example, the data may be worthless if the matching is wrong. Should this probability be factored into the expected value? In addition, the utility of the same data to different users is entirely different, and it is difficult to formulate a common standard.

The market method values data by analogy with the transaction prices of similar data in the market. This requires a relatively complete market mechanism, with a large number of transactions and accumulated data. I personally think the market method is the most reasonable, but it still has many problems.

For example, because of the value-matching problem, data transactions are not stable: sometimes buying is like opening a blind box and finding garbage. This will be reflected in the market and affect valuation (data may be valued low because of a high matching error rate rather than because of its content). For another example, data is non-standardized, so defining what counts as similar data will also be a big problem. If the definition is too fine, it will reduce the depth of accumulated transactions; if it is too broad, it will be useless...
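The income method described above, together with the question of whether a matching error should be priced in, can be sketched as a simple discounted cash flow. This is a minimal sketch under stated assumptions, not an established valuation standard: the function names, the parameter `match_probability`, and all numbers are hypothetical, and it assumes that mismatched data is worth nothing to the buyer.

```python
def present_value(cash_flows: list[float], discount_rate: float) -> float:
    """Discount cash flows received at the end of periods 1, 2, ... back to today."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1))


def expected_data_value(cash_flows: list[float], discount_rate: float,
                        match_probability: float) -> float:
    """Weight the discounted value by the chance that the data matches the buyer's need.

    Assumption for illustration: mismatched data is worth nothing to the buyer.
    """
    if not 0.0 <= match_probability <= 1.0:
        raise ValueError("match_probability must be between 0 and 1")
    return match_probability * present_value(cash_flows, discount_rate)


# Hypothetical inputs: three yearly cash flows, a 10% discount rate, a 60% match chance.
flows = [1_000.0, 1_000.0, 1_000.0]
print(round(present_value(flows, 0.10), 2))
print(round(expected_data_value(flows, 0.10, 0.60), 2))
```

Even this simple form shows the difficulty: the result depends on a match probability and on cash flows that differ from one buyer to the next, so two buyers can reasonably arrive at very different values for the same data.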

