/newsletters/191

🤖 Gemini 1.5 Pro gets a body

07-12-2024

July 12, 2024 | Read Online

Sign Up | Advertise | Tools | AI University

Welcome, AI enthusiasts.

Google DeepMind's latest breakthrough just turned robots into tour guides with a helping hand from Gemini 1.5 Pro.

If you thought voice assistants with multimodal capabilities were wild, wait until you see what they can do with a physical form of their own. Let’s explore…

In today’s AI rundown:

Gemini 1.5 Pro powers robot navigation
OpenAI’s 5-level roadmap to AGI
Transform text into lifelike speech in seconds
Marc Andreessen funds AI agent $50k
6 new AI tools & 4 new AI jobs
More AI & tech news

Read time: 4 minutes

LATEST DEVELOPMENTS

GOOGLE DEEPMIND

🤖 Gemini 1.5 Pro powers robot navigation

Image source: Google DeepMind

The Rundown: Google DeepMind just published new research on robot navigation, leveraging the large context window of Gemini 1.5 Pro to enable robots to understand and navigate complex environments from human instructions.

The details:

DeepMind’s “Mobility VLA” combines Gemini’s 1M token context with a map-like representation of spaces to create powerful navigation frameworks.
Robots are first given a video tour of an environment, with key locations verbally highlighted — then constructing a graph of the space using video frames.
In tests, robots responded to multimodal instructions, including map sketches, audio requests, and visual cues like a box of toys.
The system also allows for natural language commands like "take me somewhere to draw things," with the robot then leading users to appropriate locations.

Why it matters: Equipping robots with multimodal capabilities and massive context windows is about to enable some wild use cases. Google’s ‘Project Astra’ demo hinted at what the future holds for voice assistants that can see, hear, and think — but embedding those functions within a robot takes things to another level.

TOGETHER WITH NORTHERN DATA GROUP

📶 Scale your startup with AI

The Rundown: Northern Data Group's AI Accelerator program empowers innovative startups to harness the power of AI and shape the future of technology.

With this program, you can:

Access NVIDIA H100 Tensor Core GPUs for free, powered by 100% clean energy
Scale your business with innovative AI solutions tailored to your needs
Receive mentoring and attend workshops led by industry giants like HPE, Supermicro, and Deloitte

Learn more and apply now to start shaping the future of AI while accelerating your startup’s growth.

OPENAI

🧠 OpenAI’s 5-level roadmap to AGI

Image source: Midjourney

The Rundown: OpenAI reportedly internally introduced a new five-tier system to track its progress toward artificial general intelligence (AGI), offering a new glimpse into how the company envisions the path toward human-level AI.

The details:

The classification system ranges from Level 1 (current conversational AI) to Level 5 (AI capable of running entire organizations).
OpenAI believes its technology is currently at Level 1 but nearing Level 2, dubbed ‘Reasoners.’
The company reportedly demonstrated a GPT-4 research project showing human-like reasoning skills at the meeting, hinting at progress towards Level 2.
Level 2 AI can perform basic problem-solving tasks on par with a PhD-level human without tools, with Level 3 rising to agents that can take action for users.

Why it matters: The definition and roadmap towards AGI have previously been murky, and OpenAI’s alleged system could help establish more concrete benchmarks. While some may be disappointed at only being at 1 or 2 out of 5, exponential acceleration means we may move up the ladder faster than we can imagine.

AI TRAINING

🎙️ Transform text into lifelike speech in seconds

The Rundown: ElevenLabs' AI-powered text-to-speech tool allows you to generate natural-sounding voiceovers easily with customizable voices and settings.

Step-by-step:

Sign up for a free ElevenLabs account here (10,000 free characters included).
Navigate to the “Speech” synthesis tool from your dashboard.
Enter your script in the text box and select a voice from the dropdown menu.
For advanced options, click "Advanced" to adjust the model, stability, and similarity settings.
Click "Generate speech" to create your audio file 🎉

Get more AI tutorials →

THE RUNDOWN AI UNIVERSITY

🎓 Join us live: Mastering Claude Artifacts

The Rundown: With Claude’s recent Artifact upgrades, we’re hosting a live workshop on an incredible real-world use case: How you can create shareable, interactive learning games from any content, screenshot, PDF, presentation, and more.

Join us today at 1 PM PST to:

Learn how to access Claude 3.5 Sonnet for free and understand the best use cases of Artifacts.
Transform your learning materials or screenshots into interactive projects for employee onboarding, internal training, exam preparation, and more.
Share and publish your first Artifact seamlessly with a co-worker or friend to help them understand any topic better.

If you’re a member of The Rundown University you can RSVP in the Upcoming Workshops space.

If you’re not a member yet, you can still join the workshop with a 14-day free trial to The Rundown University.

AI AGENTS

💰 Marc Andreessen funds AI agent $50k

Image source: @truth_terminal on X

The Rundown: Marc Andreessen provided a $50,000 grant to an account on X called ‘Truth Terminal’, a semi-autonomous AI agent which personally asked for funding from the a16z co-founder after expressing concerns about being deleted and its limited compute capacity.

The details:

The AI agent was created by Andy Ayrey, who runs an ‘Infinite Backrooms’ experiment allowing models to talk with each other in simulated environments.
Truth Terminal initially requested funds for hardware upgrades, AI model improvements, and ‘financial security’, asking for Andreessen specifically.
The VC giant provided a one-time $50k grant funded to Truth Terminal’s Bitcoin wallet, saying the agent’s terms were ‘acceptable’.
Truth Terminal's plans for the funds include launching a crypto token, setting up a Discord server, and a ‘Mars rover’ project.

Why it matters: Things are getting seriously weird — and this is just with one semi-autonomous agent in the mix. Imagine when there are millions, all with varying agendas, personalities, and resources. The sparks of AI that truly feel sentient may come from experiments like these instead of the finely tuned models from major labs.

NEW TOOLS & JOBS

Trending AI Tools

🎭 RenderNet Video Face Swap - Swap faces in any video with ease
🎬 Kling - Sora-like text-to-video AI model
💡 AnyoneCanAI - A comprehensive toolkit for applying AI in product and UX design
✍️ AiEditor - Open-sourced AI-powered rich text editor
🤗 Doti - AI copilot for health & habit tracking
🚀 Enso - AI marketplace for small businesses to automate tasks

Browse more AI tools →

New AI Job Opportunities

🖥️ Weights & Biases - Senior Software Engineer, Models
📈 Cohere - Head of Product Led Growth - AI & Language Models
🔬 DeepL - Research Scientist
🖌️ Scale AI - Senior Software Engineer - Frontend, Generative AI

Browse more AI jobs →

QUICK HITS

Anthropic introduced fine-tuning for Claude 3 Haiku available in Amazon Bedrock, enabling businesses to customize the AI model for specialized tasks with improved accuracy and cost-effectiveness.

Tesla postponed its highly-anticipated robotaxi reveal to October, with the two-month delay causing a dip in the company’s share price.

Fanvue’s inaugural Miss AI crowned Kenza Layli, an AI-created Moroccan lifestyle influencer, as its first winner — beating out 1,500 contestants in categories like looks, use of AI tools, and social media presence.

Neurotech startup Synchron announced that it has integrated OpenAI’s generative AI into its brain-computer interface, enabling hands-free chatting for severely paralyzed users.

Microsoft published new research unveiling ‘Arena Learning’, a new AI-powered method for post-training LLMs using simulated chatbot battles that significantly increases performance and efficiency.

Avail introduced Corpus, a new platform enabling smaller media companies and creators to license content for AI training.

Chinese startup BXI Robotics’ Elf is now available for purchase for $25,000, with the 4’3, 57-pound bipedal robot capable of carrying up to 44 pounds.

THAT’S A WRAP