World's first AI software engineer "Kevin" unveiled. Plus, ChatGPT-powered humanoid robot Figure 01 revealed!

To subscribers who signed up via sayhi2.ai,

Starting this week, we have begun sending our newsletter in English as well as Japanese. If your display language on sayhi2.ai is set to a language other than Japanese, your newsletter has automatically been switched to English. If you wish to continue receiving the newsletter in Japanese, please subscribe to the Japanese version at https://newsletter.sayhi2.ai/subscribe and then unsubscribe from the English version using the link at the bottom of this email. We apologize for any inconvenience and appreciate your understanding.

Thank you for reading the sayhi2.ai Newsletter!

This week, we introduce three popular general-purpose humanoid robots, along with the most impactful news and noteworthy tools.

1. Top 3 recent news stories

① Announcement of "Kevin," the world's first AI software engineer

Autonomous AI agents are AI systems that can make their own judgments and solve problems, even when given vague tasks. In April of last year, right after the release of GPT-4, AI agents such as Godmode drew significant attention. However, although they could break tasks down into steps, they could hardly complete even simple tasks.

Amid this, Cognition, a startup founded just five months ago, announced its autonomous AI agent "Kevin," billing it as the "world's first AI software engineer." Given only a broad task, such as "Generate an image with the text 'Sara' based on this article," Kevin handles everything needed to implement the program, from setting up the environment to running scripts and deploying. The demo video below shows Kevin carrying tasks through to completion.

One of Kevin's standout features is that working with it feels like actually collaborating with a software engineer. Here are two specific examples:

  • While Kevin is executing a task, you can give it additional instructions to modify its work. And if it needs the user's help, for example with authentication, Kevin proactively asks the user to take the necessary action.

  • You can follow Kevin's actions in real time through its terminal, web browser, and code editor, and review its history.
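Cognition has not published how Kevin works internally. As a rough illustration of the two collaboration features above, here is a minimal Python sketch of a plan-and-execute agent loop that accepts new instructions mid-task and hands control back to the user when a step needs them. Everything here (`AgentLoop`, `Step`, the `needs_user` flag, the `ask_user` callback) is hypothetical, and step execution is simulated with log entries.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Step:
    description: str
    needs_user: bool = False  # e.g. a login only the human can complete


@dataclass
class AgentLoop:
    """Toy plan-and-execute loop with human-in-the-loop hooks (hypothetical)."""
    plan: Deque[Step]
    inbox: Deque[str] = field(default_factory=deque)  # instructions sent mid-task
    history: List[str] = field(default_factory=list)  # log the user can review

    def run(self, ask_user: Callable[[str], str]) -> List[str]:
        while self.plan or self.inbox:
            # 1. Before each step, turn the oldest new instruction into a step
            #    that runs next.
            if self.inbox:
                note = self.inbox.popleft()
                self.history.append(f"instruction: {note}")
                self.plan.appendleft(Step(f"apply: {note}"))
            step = self.plan.popleft()
            # 2. Hand control to the user when the agent cannot proceed alone.
            if step.needs_user:
                reply = ask_user(f"Please complete: {step.description}")
                self.history.append(f"user did: {step.description} ({reply})")
                continue
            # 3. Otherwise "execute" the step; a real agent would act through a
            #    shell, browser, or editor here.
            self.history.append(f"done: {step.description}")
        return self.history


if __name__ == "__main__":
    inbox: Deque[str] = deque()

    def ask_user(prompt: str) -> str:
        print(prompt)
        inbox.append("use port 8080")  # the user also adds an instruction
        return "ok"

    agent = AgentLoop(
        plan=deque([Step("set up environment"),
                    Step("log in to cloud account", needs_user=True),
                    Step("deploy app")]),
        inbox=inbox,
    )
    for line in agent.run(ask_user):
        print(line)
```

Checking an instruction queue before every step is one simple way to let the user redirect the agent without restarting it, and the `history` list stands in for the reviewable log.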

Of course, it's not perfect. For instance, when asked to create a game in which an LLM plays chess, Kevin did not finish the task (X). Slow execution is another drawback: the article-based image generation mentioned earlier took 40 minutes, and a human programmer could likely have implemented it faster.

Nevertheless, it is undeniable that Kevin is far superior to previous autonomous AI agents in terms of accuracy and user experience. This is an excellent opportunity to consider which aspects of programmers' work generative AI might take over and to what extent.

② Announcement of Figure 01, a ChatGPT-powered general-purpose humanoid robot

Figure AI, a company developing general-purpose humanoid AI robots, announced "Figure 01," a robot powered by ChatGPT. The company, founded in 2019, recently raised 100 billion yen from investors including OpenAI and NVIDIA.

The demo video below shows Figure 01 cleaning up a kitchen while conversing with a human. It's astonishing to see what's possible with the current combination of technologies. Be sure to watch it with the sound on!

In the demo video, Figure 01 performs the following tasks, all of which would be impossible without AI models:

  • Verbally describing its surroundings to a human

  • Responding to a human's request, "I want something to eat," by handing over an apple (vague instruction → specific action)

  • When asked, "What do you think I should do next?" answering, "Put the dishes back in the dish rack," and executing the action (common sense → next action)

  • At the end, reflecting from memory on the tasks it has performed

When people hear about combining ChatGPT with a robot, it's easy to focus on the conversational side. However, ChatGPT can also plan what to do next based on visual information, conversation history, and action history. According to an explanation by a Figure AI researcher (X), ChatGPT's output is used to decide which learned behavior to run, that is, which neural network policy weights to load.
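Figure AI has not released its code, so the following is only a hypothetical Python sketch of the division of labor described above: a language model (replaced here by a keyword-matching stub) looks at observations and names a behavior, and a dispatcher maps that name to the file of learned policy weights to load. `Observation`, `POLICY_WEIGHTS`, the file paths, and both functions are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Observation:
    """What the robot knows at decision time (hypothetical structure)."""
    scene: str                                           # stand-in for camera images
    transcript: List[str] = field(default_factory=list)  # conversation history
    actions: List[str] = field(default_factory=list)     # action history


# Hypothetical registry: behavior name -> file of learned policy weights.
POLICY_WEIGHTS: Dict[str, str] = {
    "hand_over_item": "policies/hand_over_item.pt",
    "put_dishes_in_rack": "policies/put_dishes_in_rack.pt",
    "idle": "policies/idle.pt",
}


def stub_language_model(obs: Observation) -> str:
    """Stand-in for the multimodal model: returns a behavior name."""
    last = obs.transcript[-1].lower() if obs.transcript else ""
    if "eat" in last:
        return "hand_over_item"
    if "next" in last and "dishes" in obs.scene:
        return "put_dishes_in_rack"
    return "idle"


def select_policy(obs: Observation, model: Callable[[Observation], str]) -> str:
    """Ask the model for a behavior and return the weights file to load."""
    behavior = model(obs)
    # Unknown behavior names fall back to a safe default instead of failing.
    return POLICY_WEIGHTS.get(behavior, POLICY_WEIGHTS["idle"])


if __name__ == "__main__":
    obs = Observation(scene="apple and dishes on table",
                      transcript=["I want something to eat"])
    print(select_policy(obs, stub_language_model))
```

Restricting the model's choice to names in a fixed registry, with a safe fallback, is one way to keep a free-form language model from triggering motions the robot has no policy for.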

In general-purpose humanoid AI robots, Tesla's Optimus Gen-2, unveiled at the end of last year, also drew significant attention. In Section 3, we introduce three humanoid robots currently in the spotlight, including Optimus!

③ OpenAI CTO answers questions about Sora's details and release date

OpenAI's video generation AI, Sora, has drawn significant attention since it was announced earlier this year, but many of its details remain a mystery. Last week, OpenAI CTO Mira Murati answered a wide range of questions about Sora for more than 10 minutes in an interview with The Wall Street Journal (WSJ).

The interview is comprehensive and packed with valuable information, both for those unfamiliar with Sora and for those following the latest trends. It's a must-watch.

In the interview, Mira said that Sora would be publicly released within a few months, and at the latest this year, and that generating a 20-second 720p video takes a few minutes.

For comparison, Pika, a well-known AI video generation service, takes 45 seconds to 1 minute to generate a 3-second 1280×720 (720p) video, excluding the wait before generation starts. Per second of output video, Sora's generation time is therefore about the same as or shorter than Pika's, far faster than most people expected.
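The comparison above comes down to generation time per second of output video. Here is a short Python check using Pika's figures from the text, and assuming that Sora's "a few minutes" means 2 to 5 minutes (an assumption, since no exact figure was given):

```python
def gen_time_per_video_second(gen_time_s: float, video_len_s: float) -> float:
    """Seconds of generation time needed per second of output video."""
    return gen_time_s / video_len_s


# Pika: 45-60 s to generate a 3 s clip (figures quoted in the newsletter).
pika = (gen_time_per_video_second(45, 3), gen_time_per_video_second(60, 3))

# Sora: "a few minutes" for a 20 s clip; ASSUMED here to mean 2-5 minutes.
sora = (gen_time_per_video_second(2 * 60, 20), gen_time_per_video_second(5 * 60, 20))

print(f"Pika: {pika[0]:.0f}-{pika[1]:.0f} s per video second")  # Pika: 15-20 s per video second
print(f"Sora: {sora[0]:.0f}-{sora[1]:.0f} s per video second")  # Sora: 6-15 s per video second
```

Even at the upper end of the assumed range, Sora's 15 seconds per second of video only matches Pika's fastest figure.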

Since Sora's announcement, AI video generation services such as Pika and Runway have released several features aimed at making video editing easier rather than at improving quality. For instance, Pika launched Lip Sync and a feature for adding sound effects to videos, while Runway added segmentation to its "Motion Brush," a tool that lets users specify which areas of an image to animate by brushing over them.

In image generation, a field closely tied to creative work, multiple services such as Midjourney, Stable Diffusion, and DALL·E coexist. Likewise, Sora is unlikely to dominate video generation, but there is no doubt that its release will be a game-changer. We will keep a close eye on how the field develops.

2. Trending AI tools on social media

  • A voice AI tool that recently announced a Text-to-Speech feature with astonishingly low latency of under 0.25 seconds. The demo video shows how natural conversations with it can be.
    • Offered mainly as an API.
    • Text-to-Speech does not yet support Japanese, but Speech-to-Text already does, and support for more languages is expected.
    • New users receive $200 in credit on sign-up.

  • A tool for creating and editing 3D models.
    • Offers more features than other AI tools, such as a 3D counterpart to upscaling that replaces the textures on a 3D model, and a 3D-to-Image function.
    • It can also automatically rig a 3D model and animate it from text (text-to-animation). The process takes about 5 minutes.
    • Signing up lets you try all features for free.

3. Three general-purpose humanoid robots to watch right now

① Tesla Optimus - Not yet at the world's highest level

The most talked-about recent announcement in general-purpose humanoid AI robots is probably Tesla's "Optimus Gen-2." Tesla, led by Elon Musk, announced in 2021 that it would start developing humanoid robots, a late entry compared to its competitors. You can see how Optimus Gen-1 and Gen-2 perform in the demo video below.

However, as a comparison with the two robots introduced below shows, Optimus Gen-2 is not actually at the top level among general-purpose humanoid robots.

Nevertheless, Tesla is highly regarded for reaching this level within just two years of starting development, and for handling most of the control with AI. The hardware technology and AI-based end-to-end control technology Tesla acquired through electric vehicle development are considered significant advantages.

※ By the way, a video Musk recently posted of a humanoid robot folding a shirt also drew a lot of attention (X). However, in that video the robot was copying the movements of an operator wearing gloves, as Musk himself acknowledged. According to Forbes, this kind of motion-tracing teleoperation is a classic technology that has existed since the 1960s.

② Phoenix - A robot capable of delicate tasks

Sanctuary AI, a Canadian company founded in 2018, specializes in humanoid robots that can perform delicate tasks. In this demo video, its humanoid robot "Phoenix" performs 60 tasks, such as "plating food" and "playing Jenga."

At the end of last month, the company also announced that Phoenix can now sort objects by color, a simple task, at the same speed as a human. The speed is indeed human-level, but the movements look somewhat rough and sloppy.

③ Atlas - A robot with high mobility

The humanoid robot "Atlas," developed by Boston Dynamics, a long-established company founded in 1992, is known for its high mobility. Many of you may have seen videos of it doing parkour and moving around nimbly, like the one below.

Unlike Tesla's Optimus, which relies mainly on AI-based control, Atlas is built on conventional control technologies. This video traces the evolution of Boston Dynamics' robots and shows that Atlas landed a backflip in 2017, indicating that it acquired strong physical capabilities quite early on.

It remains unclear how humanoid robots will evolve as AI is added to existing control technologies, and for now it is hard to say that breakthrough innovations are happening. We will continue to follow this space closely.

Thank you for reading!

We would appreciate it if you could answer a survey to help us improve our service. It takes just one click.

Which benefited you the most?


About the Operator

sayhi2.ai is operated by Mavericks, a development team specializing in generative AI. The site lists more than 5,000 AI tools and offers various features to help users efficiently find tools that suit their needs!

Starting this year, we have also begun listing more than 18,000 GPTs. Please check them out!