Scientists have made use of a technique known as "knowledge distillation" to condense the Stable Diffusion XL model into a much leaner, more efficient AI image generation model that can run on low-cost hardware.
A new artificial intelligence tool can generate images in under two seconds, and it doesn't require expensive hardware to do it.
A team of researchers in South Korea has compressed Stable Diffusion XL, an image generation model with 2.56 billion parameters (the variables an AI learns during training), using a technique called knowledge distillation.
The resulting model, called KOALA, has just 700 million parameters, making it small enough to run quickly without expensive, energy-intensive hardware.
Their approach, knowledge distillation, transfers what a large model has learned into a smaller one, aiming to retain performance while gaining efficiency. Because the smaller model requires less computation per request, it processes inputs and generates results faster while using fewer resources.
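The core idea can be sketched with a toy example. This is a minimal illustration of distillation in general, not the team's actual training setup: a "student" is nudged to match a "teacher's" temperature-softened output distribution by minimizing the KL divergence between them. The temperature value and the tiny dimensions here are arbitrary choices for the sketch.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T produces softer target distributions.
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    # KL divergence from the teacher distribution p to the student distribution q.
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
T = 2.0  # distillation temperature (illustrative choice)

# "Teacher": fixed logits standing in for a large pretrained model's output.
teacher_logits = rng.normal(size=10)
teacher_probs = softmax(teacher_logits, T)

# "Student": a much smaller model, reduced here to directly learned logits.
student_logits = np.zeros(10)

losses = []
for step in range(200):
    student_probs = softmax(student_logits, T)
    losses.append(kl(teacher_probs, student_probs))
    # Gradient of KL(teacher || student) w.r.t. the student logits is (q - p) / T.
    grad = (student_probs - teacher_probs) / T
    student_logits -= 1.0 * grad  # plain gradient descent step

# The student's soft predictions converge toward the teacher's.
assert losses[-1] < losses[0]
```

In a real distillation pipeline the student is a full neural network trained on many inputs, often with a weighted mix of this soft-target loss and the ordinary task loss; the mechanism of matching a larger model's outputs, however, is the same.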
KOALA is designed to run on low-cost graphics processing units (GPUs) and requires approximately 8GB of RAM to process requests, whereas larger models typically demand high-end industrial GPUs. These reduced hardware requirements cut costs and make the model usable on modest computing setups by a broad range of users.
The team described their findings in a paper posted December 7, 2023 to the preprint database arXiv, and released the models on the open-source AI repository Hugging Face, enabling others in the AI community to examine and build on the work.
The Electronics and Telecommunications Research Institute (ETRI), which developed the models, has released five in total: three versions of the "KOALA" image generator, which creates images from text prompts, and two versions of "Ko-LLaVA," which can answer text-based questions with images or video.
During testing, KOALA demonstrated its prowess by generating an image corresponding to the prompt "a picture of an astronaut reading a book under the moon on Mars" in a mere 1.6 seconds.
In contrast, OpenAI's DALL·E 2 took 12.3 seconds and DALL·E 3 took 13.7 seconds to generate images from the same prompt, according to a statement from the team.
The researchers now plan to integrate the technology into existing image generation services, educational platforms, content production pipelines, and other businesses. They envision applications ranging from visual aids in educational materials to content creation in media, entertainment, and advertising.