AI + Web3 Collaboration: Unlocking a New Landscape of Data and Computing Power
AI+Web3: Towers and Squares
Key Points
Web3 projects built around AI concepts have become magnets for capital in both the primary and secondary markets.
Web3's opportunities in the AI industry lie in using distributed incentives to coordinate long-tail supply across data, storage, and computation, and in building open-source models and a decentralized market for AI Agents.
AI's main applications in the Web3 industry are on-chain finance (crypto payments, trading, data analysis) and development assistance.
The utility of AI + Web3 is reflected in the complementarity of the two: Web3 is expected to counteract the centralization of AI, while AI is expected to help Web3 break out of its niche.
Introduction
Over the past two years, AI development has visibly accelerated. The wave of generative AI set off by ChatGPT has also sent huge ripples through the Web3 field.
Riding the AI narrative, fundraising in the cryptocurrency market has been significantly boosted. According to statistics, 64 Web3+AI projects completed financing in the first half of 2024 alone, with the AI-based operating system Zyber365 raising the largest round: a $100 million Series A.
The secondary market is even hotter. According to the crypto aggregator CoinGecko, in just over a year the AI sector's total market capitalization has reached $48.5 billion, with 24-hour trading volume close to $8.6 billion. Progress in mainstream AI technology delivers clear tailwinds: after OpenAI released its Sora text-to-video model, the average price across the AI sector jumped 151%. The AI effect has also spread to Meme coins, one of crypto's fundraising niches: GOAT, the first AI Agent-themed MemeCoin, quickly went viral and reached a $1.4 billion valuation, kicking off an AI Meme craze.
Research and discussion around AI+Web3 are just as hot. From AI+DePIN to AI Memecoins, and now AI Agents and AI DAOs, the narrative rotates so quickly that even FOMO can barely keep up.
AI+Web3, a pairing flush with hot money, hype, and future fantasies, is inevitably seen as a marriage arranged by capital. It is hard to tell whether, beneath the glamorous exterior, this is a playground for speculators or the eve of a genuine breakthrough.
To answer that question, the key is to ask: does each side get better with the other in the picture? Can each benefit from the other's model? This article examines the pattern by building on earlier analyses: how can Web3 play a role at each layer of the AI technology stack, and what new vitality can AI bring to Web3?
Opportunities for Web3 in the AI Stack
Before delving into this topic, we need to understand the technology stack of large AI models:
Large models are like the human brain: in their early stages they resemble a newborn baby that must observe and absorb vast amounts of external information to make sense of the world. This is the "data collection" phase. Since computers lack humans' multi-sensory capabilities, unlabeled information must first be converted through "preprocessing" into a format computers can understand before training.
After the data is fed in, the AI builds a model with understanding and predictive capability through "training", much like a baby gradually learning to comprehend the outside world; the model's parameters are like the language abilities the baby continually adjusts. When the learned content is specialized by subject, or refined through feedback gathered from interacting with people, the model enters the "fine-tuning" phase.
Once children grow up and learn to speak, they can understand meaning and express feelings and ideas in new conversations. This resembles the "inference" of large AI models, which perform predictive analysis on new language and text inputs. Infants use language to express feelings, describe objects, and solve problems, just as trained large models are applied during the inference phase to specific tasks such as image classification and speech recognition.
An AI Agent is closer to the next form of the large model: one capable of independently executing tasks and pursuing complex goals, possessing not only the ability to think but also to remember, plan, and interact with the world using tools.
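To make this lifecycle concrete, here is a toy walk-through in Python. The function names (collect, preprocess, train, fine_tune, infer) and the bigram "model" are illustrative placeholders invented for this sketch, not any real framework's API:

```python
# A toy walk-through of the large-model lifecycle described above.
# All names and the "model" itself are illustrative placeholders,
# not a real training framework.
from collections import Counter

def collect() -> list[str]:
    # "Data collection": gather raw text from the outside world.
    return ["the cat sat", "the dog sat", "a cat ran"]

def preprocess(corpus: list[str]) -> list[list[str]]:
    # "Preprocessing": turn raw text into a machine-readable format.
    return [line.lower().split() for line in corpus]

def train(samples: list[list[str]]) -> Counter:
    # "Training": fit parameters (here, just bigram counts).
    model = Counter()
    for tokens in samples:
        model.update(zip(tokens, tokens[1:]))
    return model

def fine_tune(model: Counter, feedback: list[tuple[str, str]]) -> Counter:
    # "Fine-tuning": adjust parameters with targeted feedback.
    model.update(feedback)
    return model

def infer(model: Counter, word: str) -> str:
    # "Inference": predict on new input using learned parameters.
    candidates = {b: n for (a, b), n in model.items() if a == word}
    return max(candidates, key=candidates.get) if candidates else "<unknown>"

model = fine_tune(train(preprocess(collect())), [("cat", "sat")])
print(infer(model, "cat"))  # -> "sat"
```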
In response to these pain points across the AI stack, Web3 has so far formed a multi-layered, interconnected ecosystem covering every stage of the AI model pipeline.
Base Layer: The Airbnb of Computing Power and Data
Computing Power
Currently, one of the highest costs of AI is the computing power and energy required to train models and perform inference.
For example, Meta's Llama 3 required 16,000 NVIDIA H100 GPUs running for 30 days to complete training. The 80GB version of the H100 sells for $30,000 to $40,000 per unit, implying a $400-700 million investment in computing hardware (GPUs plus network chips); meanwhile, a month of training consumes 1.6 billion kilowatt-hours, with energy costs approaching $20 million.
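As a back-of-envelope check, a few lines of Python reproduce the arithmetic behind these figures (which are the article's own, not independently verified); the implied electricity price is derived from those numbers rather than quoted from any source:

```python
# Back-of-envelope check of the training-cost figures quoted above.
# All inputs are the article's claims, not independently verified.
NUM_GPUS = 16_000                # H100s used for Llama 3 (per the article)
UNIT_PRICE = (30_000, 40_000)    # USD per H100 80GB
ENERGY_KWH = 1.6e9               # kWh per month of training (per the article)
ENERGY_COST = 20e6               # USD per month (per the article)

capex_low, capex_high = (NUM_GPUS * p for p in UNIT_PRICE)
print(f"GPU capex: ${capex_low/1e6:.0f}M - ${capex_high/1e6:.0f}M")
# -> $480M - $640M, within the article's $400-700M (GPUs plus network chips)

implied_rate = ENERGY_COST / ENERGY_KWH
print(f"Implied electricity price: ${implied_rate:.4f}/kWh")
# -> $0.0125/kWh
```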
Unlocking AI computing power is also one of the earliest intersections of Web3 and AI: DePIN (Decentralized Physical Infrastructure Networks). The DePIN Ninja data site has listed over 1,400 projects, with representative GPU compute-sharing projects including io.net, Aethir, Akash, Render Network, and more.
The main logic is this: the platform lets owners of idle GPU resources contribute computing power in a permissionless, decentralized way, raising the utilization of underused GPUs through an online marketplace akin to Uber or Airbnb, so that end users obtain compute at lower cost; meanwhile, a staking mechanism ensures resource providers are penalized if they violate quality controls or drop off the network (a minimal sketch of this matching-and-slashing logic follows the feature list below).
Features include:
Aggregating idle GPU resources: suppliers are mainly third-party independent small and medium-sized data centers, surplus capacity from operators such as crypto mining farms, and mining hardware from PoS-consensus networks such as Filecoin and ETH miners. Some projects work on lowering the entry barrier, such as exolab, which uses local devices like MacBooks, iPhones, and iPads to build a compute network for running large-model inference.
A long-tail market for AI computing power:
a. Technical side: the decentralized compute market is better suited to inference. Training depends on the data-processing throughput of extremely large GPU clusters, while inference places relatively lower demands on GPU performance; Aethir, for example, focuses on low-latency rendering work and AI inference applications.
b. Demand side: small and medium compute buyers do not train their own large models from scratch; they only optimize and fine-tune around a handful of leading models, and such workloads are naturally suited to distributed idle compute.
Decentralized ownership: the significance of blockchain here is that resource owners always retain control of their resources, can adjust supply flexibly with demand, and earn income at the same time.
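To make the marketplace-plus-staking logic above concrete, here is a minimal, hypothetical sketch in Python. The Provider structure, stake sizes, and the 50% slashing rule are invented for illustration; real networks such as io.net or Aethir implement far richer scheduling, verification, and penalty schemes:

```python
# Minimal sketch of a decentralized GPU marketplace with staking.
# All names, stake sizes, and the slashing rule are illustrative
# assumptions, not any real network's protocol.
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    gpus: int          # idle GPUs offered
    price: float       # USD per GPU-hour asked
    stake: float       # collateral posted, slashed on failure

def match(providers: list[Provider], gpus_needed: int) -> Provider | None:
    # Like Uber/Airbnb: pick the cheapest provider able to serve the job.
    eligible = [p for p in providers if p.gpus >= gpus_needed and p.stake > 0]
    return min(eligible, key=lambda p: p.price) if eligible else None

def settle(p: Provider, completed_ok: bool, slash_rate: float = 0.5) -> None:
    # Staking keeps providers honest: failing quality checks burns collateral.
    if not completed_ok:
        p.stake *= (1 - slash_rate)

providers = [
    Provider("small-dc", gpus=64, price=1.20, stake=5_000.0),
    Provider("mining-farm", gpus=256, price=0.90, stake=10_000.0),
]
p = match(providers, gpus_needed=32)
print(p.name)   # -> "mining-farm" (cheapest eligible)
settle(p, completed_ok=False)
print(p.stake)  # -> 5000.0 (half the stake slashed)
```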
Data
Data is the foundation of AI. Without data, computation is as rootless as drifting duckweed. The relationship between data and models echoes the saying "garbage in, garbage out": the quantity and quality of the input data determine the quality of the model's final output. For current AI model training, data determines the model's language ability, comprehension, and even its values and human-like behavior. Today, AI's data-demand dilemma centers on four aspects:
Data hunger: AI model training relies on massive data inputs. Public information shows that OpenAI trained GPT-4 to a parameter count on the order of trillions.
Data quality: as AI integrates with various industries, new demands arise for data timeliness, diversity, and vertical-domain specialization, as well as for emerging sources such as social media sentiment.
Privacy and compliance: countries and companies are gradually waking up to the value of high-quality datasets and are imposing restrictions on data scraping.
High data-processing costs: data volumes are large and processing is complex. Public information shows that AI companies spend more than 30% of their R&D budgets on basic data collection and processing.
Currently, Web3's solutions are reflected in the following four aspects:
The vision of Web3 is to let the users who actually contribute share in the value their data creates, and to obtain more private, more valuable data from users at low cost through distributed networks and incentive mechanisms.
Grass is a decentralized data layer and network that allows users to run Grass nodes to contribute idle bandwidth and relay traffic in order to capture real-time data from across the internet and earn token rewards.
Vana introduces a unique Data Liquidity Pool (DLP) concept, allowing users to upload private data (such as shopping records, browsing habits, social media activities, etc.) to a specific DLP and flexibly choose whether to authorize specific third parties to use it.
In PublicAI, users can post on X with #AI or #Web3 as a category tag and @PublicAI to contribute to data collection.
Grass and OpenLayer are both considering incorporating data annotation as a key component.
Synesis proposed the "Train2earn" concept, emphasizing data quality, where users can earn rewards by providing annotated data, comments, or other forms of input.
The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.
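A minimal sketch of the stake-to-earn labeling loop that projects like Sapien and Synesis gamify might look as follows; the point values and the majority-vote quality check are invented for illustration, as real systems use far more robust quality scoring:

```python
# Toy "stake points to earn points" labeling loop, as gamified by
# projects like Sapien. Point values and the majority-vote quality
# check are invented for illustration.
from collections import Counter

def consensus(labels: list[str]) -> str:
    # Simple quality signal: the majority label among annotators.
    return Counter(labels).most_common(1)[0][0]

def settle_task(stakes: dict[str, int], labels: dict[str, str],
                reward: int = 10) -> dict[str, int]:
    truth = consensus(list(labels.values()))
    balances = {}
    for user, stake in stakes.items():
        if labels[user] == truth:
            balances[user] = stake + reward   # correct: stake back + reward
        else:
            balances[user] = 0                # wrong: stake forfeited
    return balances

stakes = {"alice": 5, "bob": 5, "carol": 5}
labels = {"alice": "cat", "bob": "cat", "carol": "dog"}
print(settle_task(stakes, labels))
# -> {'alice': 15, 'bob': 15, 'carol': 0}
```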
Common privacy technologies in Web3 currently include:
Trusted Execution Environments (TEE), such as Super Protocol.
Fully Homomorphic Encryption (FHE), such as BasedAI, Fhenix.io, or Inco Network.
Zero-knowledge technology (ZK), such as Reclaim Protocol, which uses zkTLS to generate zero-knowledge proofs of HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.
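The toy Python sketch below conveys only the general flavor of proving a claim about data without revealing all of it, using a hash commitment with selective disclosure. It is not zero-knowledge and bears no resemblance to Reclaim's actual zkTLS construction:

```python
# Toy selective-disclosure sketch: commit to a record, later reveal one
# field and prove it belongs to the commitment. This is NOT zero-knowledge
# and is unrelated to Reclaim's real zkTLS; it only conveys the flavor of
# "prove a property without exposing everything".
import hashlib, os

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit(record: dict[str, str]):
    # Hash each field with a random salt, then commit to all field hashes.
    salts = {k: os.urandom(16) for k in record}
    leaves = {k: h(salts[k] + record[k].encode()) for k in record}
    root = h(b"".join(leaves[k] for k in sorted(leaves)))
    return root, {"salts": salts, "leaves": leaves}

def disclose(record, opening, field):
    # Reveal one field (value + salt) and only the hashes of the others.
    return {"field": field, "value": record[field],
            "salt": opening["salts"][field],
            "other_leaves": {k: v for k, v in opening["leaves"].items()
                             if k != field}}

def verify(root: bytes, d) -> bool:
    leaf = h(d["salt"] + d["value"].encode())
    leaves = dict(d["other_leaves"], **{d["field"]: leaf})
    return h(b"".join(leaves[k] for k in sorted(leaves))) == root

record = {"site": "example.com", "reputation": "gold", "email": "a@b.c"}
root, opening = commit(record)
proof = disclose(record, opening, "reputation")
print(verify(root, proof))  # -> True, without revealing "email"
```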
However, the field is still in its early stages and most projects remain exploratory. The current bottleneck is that computation costs are far too high. For example:
The zkML framework EZKL takes about 80 minutes to generate a proof for a 1M-parameter nanoGPT model.
According to Modulus Labs data, the overhead of zkML is more than 1000 times higher than pure computation.
Middleware: Model Training and Inference
A Decentralized Marketplace for Open-Source Models
The debate over whether AI models should be closed- or open-source has never gone away. The collective innovation that open source enables is an advantage closed-source models cannot match; yet without a profit model, how can open-source projects keep developers motivated? This is a direction worth pondering. Baidu founder Robin Li asserted in April of this year that "open-source models will increasingly fall behind."
In response, Web3 proposes the possibility of a decentralized marketplace for open-source models: tokenize the model itself, reserve a certain proportion of tokens for the team, and direct part of the model's future revenue stream to token holders.
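A minimal sketch of such a tokenized revenue split follows; the 20% team reservation and the pro-rata payout rule are invented for illustration, not any project's actual tokenomics:

```python
# Toy tokenized-model revenue share. The 20% team reservation and
# pro-rata payout rule are invented for illustration, not any
# project's actual tokenomics.
TOTAL_SUPPLY = 1_000_000
TEAM_SHARE = 0.20  # tokens reserved for the model's team

def distribute(revenue: float, holdings: dict[str, int]) -> dict[str, float]:
    # Route the model's income to token holders pro rata.
    assert sum(holdings.values()) <= TOTAL_SUPPLY
    return {who: revenue * n / TOTAL_SUPPLY for who, n in holdings.items()}

holdings = {
    "team": int(TOTAL_SUPPLY * TEAM_SHARE),  # 200,000 tokens
    "holder_a": 500_000,
    "holder_b": 300_000,
}
print(distribute(10_000.0, holdings))
# -> {'team': 2000.0, 'holder_a': 5000.0, 'holder_b': 3000.0}
```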