sayhi2.ai Newsletter

Finally surpassing GPT-4! An in-depth look at Claude 3's strengths and weaknesses, along with personal experiences

マーベリック
March 11, 2024

To our subscribers via sayhi2.ai,

Starting this week, in addition to Japanese, we have begun sending our newsletter in English. For those who have set their display language on sayhi2.ai to a non-Japanese language, we have automatically switched the language of the newsletter to English. If you wish to continue receiving the newsletter in Japanese, please subscribe to the Japanese version at https://newsletter.sayhi2.ai/subscribe and then unsubscribe from the English version using the link at the bottom of this email. We apologize for any inconvenience this may cause and appreciate your understanding.

Thank you for reading the sayhi2.ai Newsletter!

This week, we'll introduce impactful news and noteworthy tools. We'll focus in particular on "Claude 3," which has been making waves for finally surpassing GPT-4!

📚 Table of Contents

1. Top 3 recent big news
① At last, Claude 3 has surpassed GPT-4!
② OpenAI responds to Elon Musk's lawsuit by revealing emails
③ Image generation AI is overcoming its weakness in rendering text

2. Trending AI tools on social media
① Claude
② GoEnhance AI

3. Detailed explanation of Claude 3's strengths and weaknesses, along with firsthand experiences!

1. Top 3 recent big news

① At last, Claude 3 has surpassed GPT-4!

Numerous "GPT-4 level" LLMs have been announced, but when considering usability and functionality, most have fallen short of GPT-4.

Against this backdrop, Anthropic, a US company and OpenAI competitor, has released the "Claude 3" series. Claude 3 comes in three sizes: Haiku (small), Sonnet (medium), and Opus (large). Opus, the highest-performing of the three, outperformed the top GPT-4 and Gemini models on all 10 benchmarks in the published comparison.

Performance comparison between Claude 3 and other major LLMs. Excerpt from Anthropic's official announcement.

Claude 3 has received high praise on social media, with many saying, "Finally, something has surpassed GPT-4!" I personally share this sentiment and feel that Claude 3 clearly outperforms GPT-4 in the following two aspects:

1. Writing

Claude 3 is said to write better than GPT-4. Yoshikawa, a reporter at the Japanese tech outlet ITmedia, shared an article that Claude 3 actually produced, commenting, "Previous AIs lacked the ability to 'connect the dots,' but Claude 3 shows effort in reading between the lines."

2. Image recognition

Claude 3 has a far superior "eye" compared to GPT-4V. Users report that it reads PDF receipts perfectly, layout included. I have also confirmed that it extracts text accurately even when the text carries some decorative elements. For reading electronic documents, it arguably rivals the best existing OCR technology.

Claude 3 is extremely user-friendly. In fact, almost all of our team members who used GPT-4 daily have switched to Claude 3. In a later section, we'll provide more details on the user experience and shortcomings.

Better yet, Claude 3, including the top-performing Opus, is available for free (with some limitations)! We've summarized how to get started in this post, so please check it out.

② OpenAI responds to Elon Musk's lawsuit by revealing emails

On February 29th, Elon Musk made headlines by filing a lawsuit against OpenAI. As one of the founding investors of OpenAI, Musk claimed that the company had strayed from its original mission of developing AI for the "benefit of humanity." He demanded a refund of his investment and a halt to the licensing of technology to Microsoft.

In response, OpenAI released a statement on March 5th, asserting that the development of AI for the public good and for-profit business can coexist without contradicting the company's founding principles. Furthermore, they revealed email correspondence between Musk and OpenAI to support their argument.

One particularly shocking exchange stood out. In an email to Musk, OpenAI's Chief Scientist, Ilya Sutskever, stated, "The 'Open' in OpenAI means that once AI is built, everyone should benefit from it, not that we have to share the science." To this, Musk simply replied, "Yup."

An excerpt from the email exchange between Elon Musk and OpenAI, taken from OpenAI's official announcement.

It's worth noting that Ilya's claim contradicts OpenAI's mission statement from December 2015, which stated that they would "open-source the technology."

By the way, with the recent announcements of GPT-4-level LLMs such as Gemini 1.0 Ultra, Gemini 1.5 Pro, and Claude 3, attention has turned to OpenAI's next move. However, the team lead for the video generation AI "Sora" has revealed that its release is not imminent. With OpenAI continuing to dominate headlines on several fronts, its next step is eagerly anticipated.

③ Image generation AI is overcoming its weakness in rendering text

At the end of last month, "Stable Diffusion 3" was announced. As shown below, its ability to accurately generate images with text is highlighted as a strength.

Excerpt from the Stability AI official website.

"Text" and "human hands" are often cited as subjects that AI struggles to generate. How far have these challenges been overcome?

Text rendering seems to have improved considerably. For example, the recently updated Ideogram can generate images with text, as shown in the left image below, and Nijijourney, which specializes in generating anime-style images, can produce illustrations that include Japanese text.

(Left) "The Tip of the Iceberg" generated by Ideogram (taken from the gallery)
(Right) Image used by Nijijourney in their official version 6 announcement.

Of course, there are still instances of extra characters appearing or text being distorted. While the above examples intentionally include text, there are also cases where text unintentionally appears in the generated images. Ensuring that such text is contextually appropriate is an even greater challenge that will require further advancements.

As for human hands, the reality is that images are still often generated with an incorrect number of fingers. The recently popular video generation AI "Sora" has also been criticized for its inability to accurately depict hands in some cases.

Nevertheless, steady progress is being made, with examples of realistic, close-up images of hands being generated using Midjourney V6.

Once these two issues are resolved, it will become increasingly difficult to distinguish AI-generated images from real ones. It's entirely possible that we could reach that level within the next 1-2 years.

2. Trending AI tools on social media

Claude (https://sayhi2.ai/product/anthropic_com---product)

The newly announced LLM "Claude 3" is making waves for finally surpassing GPT-4
Image recognition and language capabilities are extremely high
Using the Workbench, you can try the top-performing Opus for free roughly 100 times
By paying $20 per month, the same as ChatGPT Plus, you can use it without limits

GoEnhance AI (https://sayhi2.ai/product/goenhance_ai)

A video style transfer tool. It offers more than 10 target styles, including anime, painting, and 3DCG
It boasts extremely high consistency, and it can convert TikTok-style dance videos with very high accuracy even when the movements are intense
Compared to the similar tool Domo AI, it reproduces the original video faithfully, for better or worse, so camera shake in the source can cause characters to break down
The waiting time is 2-5 minutes, and generation takes 1.5 minutes per second of video. You can create one 3-second and one 5-second video for free
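
As a rough back-of-the-envelope check on the timing figures above, the short Python sketch below estimates how long each free clip might take from submission to finish. It assumes the total is simply the waiting time plus the generation time, which is our reading of the numbers rather than something GoEnhance AI documents, and the function name `estimate_total_minutes` is made up for illustration.

```python
# Hypothetical helper: assumes total time = waiting time + generation time.
# That additive model is an assumption, not something stated by GoEnhance AI.
WAIT_MINUTES = (2.0, 5.0)        # waiting-time range quoted in the list above
MINUTES_PER_VIDEO_SECOND = 1.5   # generation rate quoted in the list above


def estimate_total_minutes(video_seconds: float) -> tuple[float, float]:
    """Return (shortest, longest) estimated minutes for one clip."""
    generation = video_seconds * MINUTES_PER_VIDEO_SECOND
    return (WAIT_MINUTES[0] + generation, WAIT_MINUTES[1] + generation)


if __name__ == "__main__":
    for seconds in (3, 5):  # the two free clip lengths mentioned above
        low, high = estimate_total_minutes(seconds)
        print(f"{seconds}-second clip: about {low:.1f} to {high:.1f} minutes")
```

Under these assumptions, the 3-second clip would take roughly 6.5 to 9.5 minutes and the 5-second clip roughly 9.5 to 12.5 minutes.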

3. Detailed explanation of Claude 3's strengths and weaknesses, along with firsthand experiences!

① Claude 3 from a writer's perspective

Yoshikawa, a reporter at ITmedia, a major Japanese tech media outlet, conducted a fascinating experiment. He had Claude 3 Opus write an introductory article about a new service and compared it with the article actually published on ITmedia and with versions written using GPT-4. Here are his impressions:

❝

Compared to GPT-3.5 and GPT-4, Claude 3 makes an effort to read between the lines based on the information provided, which is a good thing. In the writer's experience (without fine-tuning the prompts), AI-generated text lacks the ability to "connect the dots" and can only summarize by rephrasing. While that is still useful, as a writer, one wants to add some value. It's appreciated that Claude 3 can understand this intention even with rough prompts.

❝

The author also occasionally uses GPT-4 to create articles, but honestly, it doesn't lead to a tremendous increase in efficiency. The workload might be reduced from 10 to 9 or 8.5, which is still very important, but it doesn't result in a dramatic reduction in effort.

On the other hand, with Claude 3, even with relatively rough prompts, the output suggests that "the workload might be reduced from 10 to 7 or 6."

I also feel that, compared to GPT-4, Claude 3 is significantly less likely to generate text that is "not incorrect but off-target." I once had an LLM draft an article comparing Midjourney and Stable Diffusion to a high-end restaurant and home cooking, respectively (I ultimately edited it by hand), but otherwise I haven't had many opportunities to use LLMs for writing. I'm excited about the prospect of LLMs becoming more usable for this kind of work.

② Claude 3's weaknesses

Of course, Claude 3 also has its weaknesses. One of them is its tendency to "hallucinate." For example, when asked about "Kakudai V1," an AI upscaler our company developed in February this year, it responded as if it knew about it, but the response included several inaccuracies.

In contrast, GPT-4 correctly responded that it was not aware of the product. Quantitatively, hallucination leaderboard data put Claude 3's hallucination rate at more than twice that of GPT-4.

Source: HHEM Leaderboard, a Hugging Face Space by Vectara (huggingface.co/spaces/vectara/leaderboard)

Thank you for reading!

About the Operator

Mavericks, a development team specializing in generative AI, operates the sayhi2.ai website. The site features over 5,000 AI tools and incorporates various mechanisms to help users efficiently find tools that suit their needs!

Furthermore, starting this year, we have begun listing more than 18,000 GPTs. Please check them out!

sayhi2.ai - Latest AI Tool Database
sayhi2.ai (👋 Say Hi to AI) is a platform that lists 5,000+ of the latest AI tools and 18,000+ GPTs! You can easily find the AI tool you want by using search, chatbots, and sayhi2.ai's original "Popularity Score". We also deliver newsletters that help you catch up on the latest AI trends in 3 minutes!