Alibaba unveils new technology to make people in photos sing realistically. Plus, free image generation AI tool rivaling Midjourney!

Thank you for reading the sayhi2.ai Newsletter!

This week brought an unprecedented amount of important AI news and a wave of high-quality AI tools. We are therefore covering more topics than usual, with concise explanations of 8 AI tools and news items in total!

1. Top 3 recent big news

① Alibaba announces new technology that can make people in photos sing realistically

Alibaba has announced a technology called "EMO" that makes the person in an image talk or sing, given only a photo of the person and an audio clip. It even handles fast rap songs, at a quality that is hard to distinguish from real footage. Be sure to check out the video below!

The following two points about EMO are considered particularly innovative:

  • It can create videos of singing or talking from just a single image

  • It can realistically reproduce changes in facial expressions and body movements beyond simple lip-syncing

Over the past year, Alibaba has announced several groundbreaking AI technologies for e-commerce and entertainment purposes. Notable examples include "Animate Anyone," which allows characters or people to dance freely, "Outfit Anyone," an AI model for changing clothes, and "Replace Anything," which can replace any object in an image.

Unfortunately, Alibaba has been reluctant to release such cutting-edge technologies: the code has not been released for any of the four technologies above, including EMO. Open alternatives that achieve equivalent performance are eagerly awaited.

② Innovative technology that can generate "non-rectangular images" with transparent backgrounds is released

Today's image generation AI can produce very high-quality images, but it is still difficult to use in practice. One problem is that it can only generate rectangular images, so generated images cannot be layered on top of one another.

This week, a technology called "LayerDiffusion" that breaks through this limitation was released. The two researchers who announced it are also the authors of "ControlNet," which it is no exaggeration to call the biggest invention in the image generation AI field last year.

LayerDiffusion can generate images with transparent backgrounds that are matched to a given background. The following images will be used to explain specific use cases.

First, using the top-left image of an empty room as the background and inputting "a table," a transparent image of a table is generated and overlaid on the background, placing a table in the room. Using that result as the new background and inputting "cat on table," an image of a cat sitting on the table is generated, taking the position of the table into account. By adding multiple objects in this way, a rich room image like the one in the bottom right can be generated.

Even more impressive, the transparent image of the cat incorporates the shadows cast by the light coming through the window. It is clear that the image is being generated to match the background.
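The overlay step described above can be illustrated with standard alpha compositing (the "over" operation). This is only a minimal sketch of how a transparent foreground pixel is blended onto a background pixel, not LayerDiffusion's own code; the function name `composite_over` and the 0.0 to 1.0 color range are assumptions made for illustration.

```python
def composite_over(fg_rgba, bg_rgb):
    """Blend one foreground pixel (r, g, b, alpha) over one background pixel.

    All channels are floats in the range 0.0 to 1.0.
    alpha = 0.0 means fully transparent, alpha = 1.0 means fully opaque.
    (Hypothetical helper for illustration only.)
    """
    r, g, b, alpha = fg_rgba
    return tuple(
        alpha * fg + (1.0 - alpha) * bg
        for fg, bg in zip((r, g, b), bg_rgb)
    )


# A half-transparent red pixel over a blue background pixel.
print(composite_over((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0)))  # (0.5, 0.0, 0.5)

# A fully transparent pixel leaves the background unchanged.
print(composite_over((1.0, 1.0, 1.0, 0.0), (0.0, 0.0, 1.0)))  # (0.0, 0.0, 1.0)
```

Because the generated object carries its own alpha channel, semi-transparent regions such as soft shadows blend with whatever background lies underneath.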

LayerDiffusion is a brand-new technology whose paper was published only recently, but the code is also being released, so usage reports are expected to increase. Over the next year, accuracy should improve and integration with other image generation techniques should progress, and it will be exciting to see what new possibilities open up.

③ Microsoft announces next-generation LLM with 1.58-bit weights

The "BitNet b1.58" announced by Microsoft achieves far higher energy efficiency than existing LLMs and has the potential to change hardware development, including GPUs.

In simple terms, a deep learning model is a collection of matrices, and the desired output is obtained by optimizing the element values (weights) of each matrix. Conventional models represent these weights as 16-bit floating-point numbers, whereas in the proposed LLM every weight takes one of only three values: -1, 0, or 1. Encoding three possible values requires log2(3) ≈ 1.58 bits, hence the name "1.58-bit LLM" (or, more loosely, a 1-bit LLM).
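The 1.58 figure is the information needed to pick one of three values, which can be checked in a couple of lines. This is a minimal sketch; the helper name `bits_per_weight` is illustrative and does not come from the paper.

```python
import math


def bits_per_weight(num_values: int) -> float:
    """Bits needed to encode one weight that can take `num_values` distinct values."""
    return math.log2(num_values)


# Three possible weight values: -1, 0, 1.
print(f"{bits_per_weight(3):.2f}")  # 1.58

# For comparison, a true binary weight (two values) needs exactly 1 bit.
print(bits_per_weight(2))  # 1.0
```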

This extreme form of quantization fundamentally changes the computing paradigm. For example, as shown in the figure below, multiplying a vector by a weight matrix whose entries are all -1, 0, or 1 reduces to simply adding and subtracting the vector's elements. With such innovations, hardware with higher computational efficiency than GPUs, which currently dominate the field, is expected to be developed.

A conceptual diagram showing how BitNet b1.58 represents weights and changes matrix multiplication compared to existing LLMs. Excerpted from the original paper.
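To make the idea concrete, here is a minimal sketch of a matrix-vector product with weights restricted to -1, 0, and 1. It is an illustration of the principle only, not the paper's implementation; the function name `ternary_matvec` and the sample values are made up for this example.

```python
def ternary_matvec(weights, x):
    """Multiply a matrix with entries in {-1, 0, 1} by a vector.

    No multiplications are needed: each weight either adds the input
    element, subtracts it, or skips it.
    """
    result = []
    for row in weights:
        total = 0.0
        for w, value in zip(row, x):
            if w == 1:
                total += value
            elif w == -1:
                total -= value
            # w == 0 contributes nothing, so the element is skipped.
        result.append(total)
    return result


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.0]
print(ternary_matvec(W, x))  # [1.5, 0.5]
```

The result matches an ordinary matrix-vector product, but the inner loop contains only additions and subtractions.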

Furthermore, significant speedups and reductions in memory usage and energy consumption have been achieved. As an example, BitNet b1.58 70B has been reported to achieve roughly 9 times the throughput (tokens generated per second) of LLaMA 70B.

While 1-bit LLMs have been proposed in the past, what is innovative this time is that the model achieves higher performance than existing models of the same parameter size. However, the models used in the performance comparisons are at most 3B, which is small for an LLM, so the results shown are still limited (for reference, GPT-3 has 175B parameters). Further research results are awaited.

2. Trending AI tools on social media

  • A free image generation AI tool that just released version 1.0 on 2/29

  • Achieved performance surpassing Midjourney V6 and DALL-E3 in four categories in human evaluation

  • Excels at generating images with text, with an error rate less than half that of DALL-E3

  • Can be used immediately after signing up, and up to 100 images (25 generation runs) can be created per day for free

3. There's still more important news this week... Introducing 4 additional news items all at once!

This week, we feel there has been more impactful AI news than ever before. In this section, we therefore briefly explain four items that we were unable to cover in the first section!

Reported issues included refusing to generate images of white people and returning images of Black and Asian people for the prompt "German soldier in 1943". More serious issues were also reported, and Google's market value temporarily fell by 3.5 trillion yen (4.5%).

Elon Musk was one of the founding investors of OpenAI in 2015. He claims that the company has abandoned its original vision of a non-profit organization working for "the benefit of humanity," that this violates the contract, and he is seeking a refund of the funds he invested.

It has been reported that Apple has halted its 10-year EV development project and that many of the staff have been transferred to the generative AI division. Billions of dollars have been invested in the project to date. Apple, which is seen as lagging behind in generative AI development, appears to have made a pivotal decision in an attempt to catch up.

Figure AI is a Silicon Valley-based startup founded in 2022 that develops versatile humanoid robots. Following this funding round, the company's valuation has reached approximately 400 billion yen (2.6 billion dollars). You can watch a video of the robot developed by Figure AI making coffee here.

Thank you for reading!

About the Operator

The generative AI-specialized development team Mavericks operates the sayhi2.ai website. The site features over 5,000 AI tools and incorporates various mechanisms to help users efficiently find tools that suit their needs!

Furthermore, starting this year, we have begun listing more than 18,000 GPTs. Please check them out!