New Delhi: Meta has released a new open-source AI tool called AudioCraft. According to the company, the programme lets both seasoned artists and everyday users generate audio and music from simple text prompts.
AudioCraft is made up of three models: MusicGen, AudioGen, and EnCodec. MusicGen, trained on Meta's licensed music library, generates music from text prompts. AudioGen, trained on publicly available sound effects, produces environmental audio from text prompts. The EnCodec decoder has also been upgraded, enabling music generation with higher quality and fewer unwanted artefacts.
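For readers who want to try the released models, the sketch below shows roughly how MusicGen can be driven from Python using the open-source audiocraft package. The checkpoint name 'facebook/musicgen-small', the prompt text, and the eight-second duration are illustrative assumptions and may differ between releases.

```python
# Minimal sketch: text-to-music with a pre-trained MusicGen checkpoint.
# The checkpoint name below is an assumption; consult the audiocraft
# repository for the names shipped in the current release.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # generate roughly 8 seconds of audio

descriptions = ['upbeat acoustic folk with gentle guitar']
wav = model.generate(descriptions)  # one waveform per text prompt

for idx, one_wav in enumerate(wav):
    # Save each result as an audio file with loudness normalisation.
    audio_write(f'music_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```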
Thanks to Meta's pre-trained AudioGen models, users can create environmental noises and sound effects such as dogs barking, cars honking, or footsteps on a wooden floor. Meta is also releasing the code and all of the model weights for AudioCraft. Potential applications for the new tool include audio production, sound-effect creation, compression techniques, and music composition.
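Generating sound effects with a pre-trained AudioGen model follows the same pattern. The sketch below assumes a checkpoint published as 'facebook/audiogen-medium'; the exact model name and parameters are assumptions, not taken from the article.

```python
# Minimal sketch: text-to-sound-effect with a pre-trained AudioGen checkpoint.
# 'facebook/audiogen-medium' is an assumed checkpoint name.
from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)  # about five seconds per clip

prompts = ['dog barking', 'footsteps on a wooden floor']
wav = model.generate(prompts)

for idx, one_wav in enumerate(wav):
    audio_write(f'sfx_{idx}', one_wav.cpu(), model.sample_rate, strategy='loudness')
```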
By open-sourcing these models, Meta aims to let researchers and practitioners train their own models on their own datasets.
Meta claims that generative AI has made significant strides in images, video, and text, but audio has yet to see the same level of development. AudioCraft addresses this gap by providing a more accessible, user-friendly platform for generating high-quality audio.
In its official blog, Meta explains that creating realistic and high-fidelity audio is particularly challenging as it involves modeling complex signals and patterns at different scales. Music, being a composition of local and long-range patterns, presents a unique challenge in audio generation.
AudioCraft is capable of producing high-quality audio over long durations. The company claims it simplifies the design of generative models for audio, making it easier for users to experiment with the models it provides.