AI's Cloudy Path to Zero Emissions, Amazon's Agent Builders, Claude's UI Advance, Training On Consumer GPUs
07-10-2024
Dear friends,
I continue to be alarmed at the progress of proposed California regulation SB 1047 and the attack it represents on open source and more broadly on AI innovation. As I wrote previously, this proposed law makes a fundamental mistake of regulating AI technology instead of AI applications, and thus would fail to make AI meaningfully safer. I’d like to explain why the specific mechanisms of SB 1047 are so pernicious to open source.
SB 1047’s purported goal is to ensure the safety of AI models. It puts in place complex reporting requirements for developers who fine-tune models or develop models that cost more than $100 million to train. It is a vague, ambiguous law that imposes significant penalties for violations, creating a huge gray zone in which developers can’t be sure how to avoid breaking the law. This will paralyze many teams.
You can read the latest draft of the law here. I’ve read through it carefully, and I find it ambiguous and very hard to follow.
Developers who try to navigate the law’s complex requirements face what feels like a huge personal risk. It requires that developers submit, under penalty of perjury, a certification of compliance with the requirements of the law. But when the requirements are complex, hard to understand, and can even shift according to the whims of an unelected body (more on this below), how do we ensure we are in compliance?
This creates a scary situation for developers. Committing perjury could lead to fines and even jail time. Some developers will have to hire expensive lawyers or consultants to advise them on how to comply with these requirements. (I am not a lawyer and am not giving legal advice, but one way to try to avoid perjury is to show that you are relying on expert advice, to demonstrate that you had no intent to lie.) Others will simply refrain from releasing cutting-edge AI products.
If this law passes, the fear of a trial by a jury — leading to a verdict that can be very unpredictable with significant penalties in the event of a conviction — will be very real. What if someone releases a model today after taking what they genuinely felt were reasonable safeguards, but a few years later, when views on AI technology might have shifted, some aggressive prosecutor manages to convince a jury that whatever they did was not, in hindsight, “reasonable”? Reasonableness is ambiguous, and its legal interpretation can depend on case law, jury instructions, and common practice, among other things. This makes it very hard to ensure that what a developer does today will be deemed reasonable by a future jury. (For more on this, see Context Fund’s analysis of SB 1047.)
One highly placed lawyer in the California government who studied this law carefully told me they found it hard to understand. I invite you to read it and judge for yourself — if you find the requirements clear, you might have a brilliant future as a lawyer!
Adding to the ambiguity, the bill would create a Frontier Model Division (FMD) with a five-person board that has the power to dictate standards to developers. This small board would be a great target for lobbying and regulatory capture. (Bill Gurley has a great video on regulatory capture.) The unelected FMD can levy fees on developers to cover its costs. It can arbitrarily change the computation threshold at which fine-tuning a model becomes subject to its oversight. This can lead to even small teams being required to hire an auditor to check for compliance with an ambiguous safety standard.
Keep learning!
Andrew
A MESSAGE FROM DEEPLEARNING.AI

In our new course “Prompt Compression and Query Optimization,” you’ll learn how to use MongoDB’s features to build efficient retrieval augmented generation (RAG) systems and address challenges to scaling, performance, and security. Enroll for free
News

Claude Advances the LLM Interface

Claude 3.5 Sonnet lets users work on generated outputs as though they were independent files — a step forward in large language model user interfaces.

What’s new: Anthropic introduced Artifacts, a feature that displays outputs in a separate window of Claude 3.5 Sonnet’s web interface, outside the stream of conversation that creates and modifies them. Artifacts can include documents, code snippets, HTML pages, vector graphics, or visualizations built using JavaScript.

How it works: Users can enable Artifacts from the “feature preview” dropdown in their profile menu at Claude.ai. Then, when asked to generate an output that’s likely to act as standalone content and undergo further work, Claude opens an artifact window next to the chat frame, populates it with an initial output, and updates it according to subsequent prompts.
Why it matters: Artifacts make working with a large language model more fluidly interactive. Large language models (LLMs) have long been able to generate code, but outside of AI-assisted development environments like GitHub Copilot, executing generated code typically requires further steps such as copy-pasting it into a development environment. The additional steps add friction for developers and confusion for non-developers. Keeping and running the code in a separate window makes for a convenient, low-friction experience. The same goes for generating images and other kinds of visual output.

We’re thinking: It’s rare that a user interface update makes a tool more useful for casual and hardcore users alike. It’s even more exciting to see it happen to an LLM!
AI’s Path to Zero Emissions Is Cloudy

The boom in AI is jeopardizing big tech’s efforts to reach its targets for emissions of greenhouse gases.

What’s new: Google’s annual environmental report shows that the company’s total carbon dioxide emissions rose nearly 50 percent between 2019 and 2023 to 14.3 million tons. Google attributes the rise to its efforts to satisfy rising demand for AI.

How it works: Google’s carbon emissions increased 16.7 percent from 2021 to 2022 and another 13.5 percent from 2022 to 2023, bringing the total rise since 2019 to 48 percent. “As we further integrate AI into our products, reducing emissions may be challenging due to increasing energy demands from the greater intensity of AI compute, and the emissions associated with the expected increases in our technical infrastructure investment,” the report states.
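As a back-of-the-envelope check on how these figures fit together (a derivation from the percentages cited above, not additional reported data), the two annual increases compound to roughly a 32 percent rise over 2021 to 2023, implying that the remainder of the 48 percent growth occurred between 2019 and 2021:

```python
# Back-of-the-envelope check using only the percentages cited above;
# the intermediate values are derived, not reported by Google.
rise_2021_to_2023 = 1.167 * 1.135 - 1                   # ~0.325, i.e. ~32% over two years
implied_rise_2019_to_2021 = 1.48 / (1.167 * 1.135) - 1  # ~0.117, growth implied before 2021
print(f"2021-2023: {rise_2021_to_2023:.1%}; implied 2019-2021: {implied_rise_2019_to_2021:.1%}")
```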
Countering the trend: Google is working to reduce its greenhouse gas emissions on several fronts. Its effort to purchase electricity from low-emissions sources cut its net carbon footprint by around 30 percent in 2023. It claims that its owned-and-operated data centers are 1.8 times more energy-efficient than a typical enterprise data center, and its sixth-generation tensor processing units (TPUs) are 67 percent more efficient than the prior generation. Google has asked its largest hardware partners to match 100 percent of their energy consumption with renewable energy by 2029. The company is pursuing several AI-based initiatives to mitigate climate change, from weather prediction to fuel-efficient vehicle routing. It says that AI has the potential to mitigate 5 to 10 percent of global greenhouse gas emissions by 2030.

Behind the news: In 2020, after five years of successfully reducing its carbon footprint, Google set an ambitious target to reach net-zero greenhouse gas emissions by 2030. But its total emissions have risen each year since then. Google’s experience mirrors that of Amazon and Microsoft, which aim to reach net-zero carbon emissions by 2030 and 2040 respectively. Amazon’s emissions increased 39 percent from 2019 to 2022, while Microsoft’s emissions rose 29 percent between 2020 and 2023. (Amazon’s and Microsoft’s cloud computing revenues were roughly triple Google’s in 2023, so their AI-related greenhouse gas emissions presumably were larger.)

Why it matters: Growing use of AI means greater consumption of energy. The tech giants’ ambitious emissions goals predate the rapid growth of generative AI, and their latest reports show that it’s time to rethink them. This adds urgency to already critical efforts to develop renewable and other low-emissions energy sources.

We’re thinking: We applaud Google’s efforts to cut its carbon emissions and its transparency in issuing annual environmental reports. We’re somewhat relieved to note that, for now, data centers and cloud computing are responsible for 1 percent of the world’s energy-related greenhouse gas emissions, a drop in the bucket compared to transportation, construction, or agriculture. Moreover, we believe that AI stands to create huge benefits relative to the climate impact of its emissions, and it is one of the most powerful tools we have to develop low-carbon energy sources and boost energy efficiency throughout society. Continuing to improve the technology will help us find efficient ways to harness those sources.
Amazon Onboards Adept

Amazon hired most of the staff of agentic-AI specialist Adept AI in a move that echoes Microsoft’s absorption of Inflection in March.

What’s new: Amazon onboarded most of the leadership and staff of Adept AI, which has been training models to operate software applications running on local hardware, GeekWire reported. Amazon licensed Adept’s models, datasets, and other technology non-exclusively. The companies did not disclose the financial terms of the deal. (Disclosure: Andrew Ng serves on Amazon’s board of directors.)

How it works: Amazon hired two thirds of Adept’s former employees. Those who remain at Adept will “focus entirely on solutions that enable agentic AI” based on proprietary models, custom infrastructure, and other technology.
Behind the news: Amazon’s agreement with Adept is one of several moves to compete in AI for both businesses and consumers. In March, the company completed a $4 billion investment in Anthropic in exchange for a minority share in the startup. It’s reportedly developing new models and overhauling its longstanding Alexa voice assistant.

Why it matters: Adept co-founder David Luan and his team say they’re aiming to automate corporate software workflows, a potentially lucrative market. Although Amazon Web Services’ Bedrock platform already enables users to build AI agents, Adept’s talent may bring expanded agentic and interactive capabilities.
Like LoRA, But for Pretraining

Low-rank adaptation (LoRA) reduces memory requirements when fine-tuning large language models, but it isn’t as conducive to pretraining. Researchers devised a method that achieves similar memory savings yet works well for both fine-tuning and pretraining.

What’s new: Jiawei Zhao and colleagues at California Institute of Technology, Meta, University of Texas at Austin, and Carnegie Mellon proposed Gradient Low-Rank Projection (GaLore), an optimizer modification that saves memory during training by reducing the sizes of optimizer states. They used this approach to pretrain a 7B-parameter transformer using a consumer-grade Nvidia RTX 4090 GPU.

Key insight: LoRA saves memory during training by learning to approximate the change in each layer’s weight matrix as the product of two smaller matrices. This approximation yields good performance when fine-tuning (though not quite as good as fine-tuning all weights) but worse performance when pretraining from a random initialization. The authors proved theoretically that updating weights according to an approximate gradient matrix — which reduces the memory required to store optimizer states — can yield the same performance as using the exact gradient matrix (at least for deep neural networks with ReLU activation functions and classification loss functions). Updating weights only once using an approximate gradient matrix is insufficient. However, updating weights repeatedly using gradient approximations that change with each training step (because the inputs change between training steps) achieves an effect similar to training the weights in the usual way.

How it works: GaLore approximates each layer’s gradient matrix separately. Given a layer’s gradient matrix G (size m x n), GaLore computes a smaller matrix P (size r x m). It uses PG, a smaller approximation of the gradient matrix (size r x n), to update optimizer states. To further save memory, it updates layers one at a time instead of all at once, following LOMO.
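To make the projection step concrete, here is a rough numpy sketch of a GaLore-style Adam update for a single layer (our own illustration based on the description above, not the authors’ code). Shapes follow the text: the projector P is r x m and the projected gradient PG is r x n. The rank, projection-refresh interval, and Adam hyperparameters are illustrative assumptions, and the sketch glosses over details such as how optimizer state is handled when the projector is refreshed and the layer-at-a-time updates borrowed from LOMO.

```python
# Rough sketch of a GaLore-style Adam update for one layer (illustrative only).
# The gradient G is (m, n); the projector P is (r, m); P @ G is (r, n).
import numpy as np

class GaLoreAdamLayer:
    def __init__(self, n, rank=4, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, update_proj_gap=200):
        self.rank, self.lr, self.betas, self.eps = rank, lr, betas, eps
        self.update_proj_gap = update_proj_gap   # how often to refresh P (assumed value)
        self.P = None                            # (r, m) projection matrix
        self.m1 = np.zeros((rank, n))            # Adam first moment, stored at size (r, n)
        self.m2 = np.zeros((rank, n))            # Adam second moment, also (r, n)
        self.t = 0

    def step(self, W, G):
        """Return updated weights W given the full gradient G for this layer."""
        self.t += 1
        # Periodically refresh P from the top-r left singular vectors of G so the
        # low-rank subspace tracks the gradients as they change during training.
        # (This sketch ignores how optimizer state should be treated at a refresh.)
        if self.P is None or self.t % self.update_proj_gap == 1:
            U, _, _ = np.linalg.svd(G, full_matrices=False)
            self.P = U[:, :self.rank].T          # (r, m)
        R = self.P @ G                           # projected gradient, (r, n)
        # Standard Adam moment updates, but on small (r, n) matrices; this is
        # where the savings over full-size optimizer states come from.
        b1, b2 = self.betas
        self.m1 = b1 * self.m1 + (1 - b1) * R
        self.m2 = b2 * self.m2 + (1 - b2) * R ** 2
        m1_hat = self.m1 / (1 - b1 ** self.t)
        m2_hat = self.m2 / (1 - b2 ** self.t)
        low_rank_update = m1_hat / (np.sqrt(m2_hat) + self.eps)
        # Project the update back to full size and apply it to the weights.
        return W - self.lr * (self.P.T @ low_rank_update)   # (m, n)
```

The memory savings come from keeping Adam’s two moment matrices at size r x n (plus the small r x m projector) for each layer, rather than two full m x n matrices.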
Results: The authors tested GaLore in both pretraining and fine-tuning scenarios.
Why it matters: LoRA’s ability to fine-tune large models using far less memory makes it a very popular fine-tuning method. GaLore is a theoretically motivated approach to memory-efficient training that’s good for both pretraining and fine-tuning.

We're thinking: LoRA-style approximation has been unlocking data- and memory-efficient approaches in a variety of machine learning situations — an exciting trend as models grow and demand for compute resources intensifies.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.