AI Monopolies, Ancestor Avatars, Benchmarks for Agentic Behavior, Chatbot for Minority Languages
06-26-2024
Dear friends,
On Monday, a number of large music labels sued AI music makers Suno and Udio for copyright infringement. Their lawsuit echoes The New York Times’ lawsuit against OpenAI in December. The question of what’s fair when it comes to AI software remains a difficult one.
To be clear, just as humans aren’t allowed to reproduce large parts of copyrighted works verbatim (or nearly verbatim) without permission, AI shouldn’t be allowed to do so either. The lawsuit against Suno and Udio points out that, when prompted in a particular way, these services can nearly reproduce pieces of copyrighted music. But here, too, there are complex issues. If someone were to use a public cloud to distribute online content in violation of copyright, typically the person who did that would be at fault, not the cloud company (so long as the company took reasonable precautions and didn’t enable copyright infringement deliberately). The plaintiffs in the lawsuit against Suno and Udio managed to write prompts that caused the systems to reproduce copyrighted work. But is this like someone managing to get a public cloud to scrape and distribute content in a way that violates copyright, or is this — as OpenAI said — a rare bug that AI companies are working to eliminate? (Disclaimer: I’m not a lawyer and I’m not giving legal advice.)
To acknowledge a weakness of this kind of argument, which reasons from what humans are permitted to do to what machines should be permitted to do: just because humans are allowed to emit a few pounds of carbon dioxide per day simply by breathing doesn’t mean we should allow machines to emit massively more carbon dioxide without restrictions. Scale can change the nature of an act.
Keep learning!
Andrew
A MESSAGE FROM DEEPLEARNING.AI

Learn to reduce the carbon footprints of your AI projects in “Carbon Aware Computing for GenAI Developers,” a new course built in collaboration with Google Cloud. Perform model training and inference jobs with cleaner, low-carbon energy and make your AI development greener! Join today
News

U.S. to Probe AI Monopoly Concerns

U.S. antitrust regulators are preparing to investigate a trio of AI giants.

What’s new: Two government agencies responsible for enforcing United States anti-monopoly laws agreed to investigate Microsoft, Nvidia, and OpenAI, The New York Times reported.

How it works: The Department of Justice (DOJ) will investigate Nvidia, which dominates the market for chips that train and run neural networks. The Federal Trade Commission (FTC) will probe Microsoft and its relationship with OpenAI, which together control the distribution of OpenAI’s popular GPT-series models. In February, FTC chair Lina Khan said the agency would look for possible anti-competitive forces in the AI market.
Behind the news: Government attention to top AI companies is rising worldwide. Microsoft’s partnership with OpenAI faces additional scrutiny by European Union regulators, who are probing whether the relationship violates EU regulations that govern corporate mergers. U.K. regulators are investigating Amazon’s relationship with Anthropic and Microsoft’s relationships with Mistral and Inflection AI. Last year, French regulators raided an Nvidia office over suspected anti-competitive practices. In 2022, Nvidia withdrew a bid to acquire chip designer Arm Holdings after the proposal attracted international regulatory scrutiny, including an FTC lawsuit.

Why it matters: Microsoft, Nvidia, and OpenAI have put tens of billions of dollars each into the AI market, and lawsuits, settlements, judgments, or other interventions could shape the fate of those investments. The FTC and DOJ similarly divided their jurisdictions in 2019, resulting in investigations into, and ongoing lawsuits against, Amazon, Apple, Google, and Meta for alleged anti-competitive practices in search, social media, and consumer electronics. Their inquiries into the AI market could have similar impacts.

We’re thinking: Governments must limit unfair corporate behavior without stifling legitimate activities. Recently, in the U.S. and Europe, the pendulum has swung toward overly aggressive enforcement. For example, government opposition to Adobe’s purchase of Figma had a chilling effect on acquisitions that seems likely to hurt startups. The U.K. blocked Meta’s acquisition of Giphy, which didn’t seem especially anticompetitive. We appreciate antitrust regulators’ efforts to create a level playing field, and we hope they’ll take a balanced approach.
Chatbot for Minority Languages

An AI startup that aims to crack markets in southern Asia launched a multilingual competitor to GPT-4.

What’s new: The company known as Two AI offers SUTRA, a low-cost language model built to be proficient in more than 30 languages, including underserved South Asian languages like Gujarati, Marathi, Tamil, and Telugu. The company also launched ChatSUTRA, a free-to-use web chatbot based on the model.

How it works: SUTRA comprises two mixture-of-experts transformers: a concept model and an encoder-decoder for translation. A paper provides some technical details, but others, including how the two components fit together, are absent or ambiguous.
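Given that ambiguity, the following is only a toy sketch of one plausible reading of the architecture: a multilingual encoder maps input into a shared concept space, the mixture-of-experts concept model reasons over those representations, and a decoder renders the result in the target language. Every class and function below is an illustrative stand-in, not SUTRA’s actual components or API.

```python
# Toy sketch of a SUTRA-style pipeline as described above. All classes are
# placeholder assumptions; SUTRA's real components are not public.

class MultilingualEncoder:
    def encode(self, text: str) -> list[float]:
        # Placeholder: a real encoder would produce language-agnostic concept embeddings.
        return [float(ord(c)) for c in text]

class ConceptModel:
    def generate(self, concepts: list[float]) -> list[float]:
        # Placeholder: a real MoE transformer would route inputs to expert subnetworks.
        return concepts

class MultilingualDecoder:
    def decode(self, concepts: list[float], lang: str) -> str:
        # Placeholder: a real decoder would generate fluent text in `lang`.
        return f"[response rendered in {lang}]"

def chat(prompt: str, target_lang: str) -> str:
    encoder, concept_model, decoder = MultilingualEncoder(), ConceptModel(), MultilingualDecoder()
    concepts = encoder.encode(prompt)                  # surface form -> shared concept space
    response = concept_model.generate(concepts)        # reasoning in the concept space
    return decoder.decode(response, lang=target_lang)  # concept space -> target language

print(chat("भारत की राजधानी क्या है?", target_lang="Hindi"))
```

One apparent motivation for such a split is that the concept model’s reasoning capacity would be shared across languages, while the translation layers handle each language’s surface form.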
Results: On multilingual MMLU (a machine-translated version of multiple-choice questions that cover a wide variety of disciplines), SUTRA outperformed GPT-4 in four of the 11 languages for which the developer reported the results: Gujarati, Marathi, Tamil, and Telugu. Moreover, SUTRA’s tokenizer is highly efficient, making the model fast and cost-effective. In key languages, it compares favorably to the tokenizer used with GPT-3.5 and GPT-4, and even narrowly outperforms GPT-4o’s improved tokenizer, according to Two AI’s tokenizer comparison space on HuggingFace. In languages such as Hindi and Korean that are written in non-Latin scripts and for which GPT-4 performs better on MMLU, SUTRA’s tokenizer generates less than half as many tokens as the one used with GPT-3.5 and GPT-4, and slightly fewer than GPT-4o’s tokenizer.

Behind the news: Two AI was founded in 2021 by Pranav Mistry, former president and CEO of Samsung Technology & Advanced Research Labs. The startup has offices in California, South Korea, and India. In 2022, it raised $20 million in seed funding from Indian telecommunications firm Jio and South Korean internet firm Naver. Mistry aims to focus on predominantly non-English-speaking markets such as India, South Korea, Japan, and the Middle East, he told Analytics India.

Why it matters: Many top models work in a variety of languages, but from a practical standpoint, multilingual models remain a frontier in natural language processing. Although SUTRA doesn’t match GPT-4 in all the languages reported, its low price and comparatively high performance may make it appealing in South Asian markets, especially rural areas where people are less likely to speak English. The languages in which SUTRA excels are spoken by tens of millions of people, and they’re the most widely spoken languages in their respective regions. Users in these places have yet to experience GPT-4-level performance in their native tongues.
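To make the tokenizer-efficiency comparison above concrete, a sketch like the following counts tokens for the same sentence under different tokenizers; fewer tokens per sentence generally means lower cost and latency. The tiktoken encodings for the GPT models are real; the Hugging Face repo ID for the SUTRA tokenizer is an assumption and should be replaced with the actual ID from Two AI’s comparison space.

```python
# Hedged sketch: compare token counts for one Hindi sentence across tokenizers.
import tiktoken
from transformers import AutoTokenizer

text = "भारत एक विशाल और विविधता से भरा देश है।"  # "India is a vast and diverse country."

# GPT-3.5/GPT-4 use the cl100k_base encoding; GPT-4o uses o200k_base.
for name in ["cl100k_base", "o200k_base"]:
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")

# Hypothetical repo ID -- substitute the real SUTRA tokenizer artifact.
sutra_tok = AutoTokenizer.from_pretrained("TWO/sutra-tokenizer")
print(f"SUTRA: {len(sutra_tok.encode(text, add_special_tokens=False))} tokens")
```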
Conversing With the Departed

Advances in video generation have spawned a market for lifelike avatars of deceased loved ones.

What’s new: Several companies in China produce interactive videos that enable customers to chat with animated likenesses of dead friends and relatives, MIT Technology Review reported.

How it works: Super Brain and Silicon Intelligence have built such models for several thousand customers. They provide a modern equivalent of portrait photos of deceased relatives and a vivid way to commune with ancestors.
Behind the news: The desire to interact with the dead in the form of an AI-generated avatar is neither new nor limited to China. In the U.S., the startup HereAfter AI builds chatbots that mimic the deceased based on interviews conducted while they were alive. Another startup, StoryFile, markets similar capabilities to elders (pitched by 93-year-old Star Trek star William Shatner) to keep their memory alive for younger family members. The chatbot app Replika began as a project by founder Eugenia Kuyda to virtually resurrect a friend who perished in a car accident in 2015.

Yes, but: In China, language models struggle with the variety of dialects spoken by many elders.

Why it matters: Virtual newscasters and influencers are increasingly visible on the web, but the technology has more poignant uses. People long to feel close to loved ones who are no longer present. AI can foster that sense of closeness and rapport, helping to fulfill a deep need to remember, honor, and consult the dead.

We’re thinking: No doubt, virtual avatars of the dead can bring comfort to the bereaved. But they also bring the risk that providers might manipulate their customers’ emotional attachments for profit. We urge developers to focus on strengthening relationships among living family and friends.
Benchmarks for Agentic Behaviors

Tool use and planning are key behaviors in agentic workflows that enable large language models (LLMs) to execute complex sequences of steps. New benchmarks measure these capabilities in common workplace tasks.

What’s new: Recent benchmarks gauge the ability of an LLM to use external tools to manipulate corporate databases and to plan events such as travel and meetings.

Tool use: Olly Styles, Sam Miller, and colleagues at Mindsdb, University of Warwick, and University of Glasgow proposed WorkBench, which tests an LLM’s ability to use 26 software tools to operate on five simulated workplace databases: email, calendar, web analytics, projects, and customer relationship management. Tools include deleting emails, looking up calendar events, creating graphs, and looking up tasks in a to-do list.
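As a rough illustration of how a tool-use benchmark of this kind can score an agent, the sketch below exposes two toy tools over a simulated email database and marks the agent correct only if the final database state matches the expected state. The tool names, schema, and scoring are assumptions for illustration, not WorkBench’s actual code.

```python
# Illustrative sketch: score an agent's tool calls by the outcome they produce
# on a simulated database, not by the exact sequence of calls.
import copy

# Simulated "email" database.
emails = [
    {"id": 1, "sender": "alice@corp.com", "subject": "Q3 report", "deleted": False},
    {"id": 2, "sender": "spam@junk.com", "subject": "You won!!!", "deleted": False},
]

# Hypothetical tools the agent may call.
def delete_email(db, email_id: int) -> str:
    for msg in db:
        if msg["id"] == email_id:
            msg["deleted"] = True
            return f"Deleted email {email_id}."
    return f"No email with id {email_id}."

def search_emails(db, sender: str) -> list[dict]:
    return [m for m in db if m["sender"] == sender and not m["deleted"]]

TOOLS = {"delete_email": delete_email, "search_emails": search_emails}

def run_agent(tool_calls, db):
    """Apply a sequence of (tool_name, kwargs) calls produced by an LLM agent."""
    for name, kwargs in tool_calls:
        TOOLS[name](db, **kwargs)
    return db

# Task: "Delete the email from spam@junk.com." Expected outcome: that email is marked deleted.
expected_db = copy.deepcopy(emails)
expected_db[1]["deleted"] = True

agent_calls = [("delete_email", {"email_id": 2})]  # stand-in for real LLM output
final_db = run_agent(agent_calls, copy.deepcopy(emails))
print("correct" if final_db == expected_db else "incorrect")
```

Scoring by outcome rather than by a fixed call sequence lets an agent take any valid path to the goal and still be judged correct.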
Planning: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, and colleagues at Google published Natural Plan, a benchmark that evaluates an LLM’s ability to (i) plan trips, (ii) arrange a series of meeting times and locations, and (iii) schedule a group meeting. Each example has only one solution.
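Because each example has a single valid answer, scoring can reduce to parsing the model’s plan and checking it against the gold solution, as in the toy sketch below. The “City: N days” line format and the parsing rule are assumptions for illustration, not Natural Plan’s actual format.

```python
# Toy sketch of exact-match plan evaluation: parse the model's itinerary into
# (city, days) legs and compare against the single gold solution.
import re

def parse_itinerary(text: str) -> list[tuple[str, int]]:
    """Extract legs like 'Paris: 3 days' from free-form model output."""
    return [(city.strip(), int(days))
            for city, days in re.findall(r"([A-Za-z ]+):\s*(\d+)\s*day", text)]

gold = [("Rome", 2), ("Florence", 3), ("Venice", 2)]

model_output = """Here is a 7-day plan:
Rome: 2 days
Florence: 3 days
Venice: 2 days"""

predicted = parse_itinerary(model_output)
print("correct" if predicted == gold else "incorrect")  # order and durations must match exactly
```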
Why it matters: When building agentic workflows, developers must decide on LLM choices, prompting strategies, sequencing of steps to be carried out, tool designs, single- versus multi-agent architectures, and so on. Good benchmarks can reveal which approaches work best.

We're thinking: These tests have unambiguous right answers, so agent outputs can be evaluated automatically as correct or incorrect. We look forward to further work to evaluate agents that generate free-text output.
Work With Andrew Ng
Join the teams that are bringing AI to the world! Check out job openings at DeepLearning.AI, AI Fund, and Landing AI.
Subscribe and view previous issues here.
Thoughts, suggestions, feedback? Please send to thebatch@deeplearning.ai. Avoid our newsletter ending up in your spam folder by adding our email address to your contacts list.