Generative audio

Content sourced from Wikipedia, licensed under CC BY-SA 3.0.

Generative audio is the creation of new sounds by models that learn patterns from a library of audio clips. Traditional voice assistants stitch together short prerecorded segments of speech. Generative audio systems instead train neural networks to capture the overall characteristics of a voice and then synthesize new speech with it. As a result, a person's voice can be made to say things that person never said, which creates a risk of impersonation when the voice of a well-known person is faked. Modern systems use deep learning methods such as generative adversarial networks (GANs), in which two models compete. A generator produces audio, and a discriminator tries to tell that audio apart from real recordings, which pushes the generator toward more realistic sound. Other approaches include WaveNet, which models raw audio waveforms directly, predicting one sample at a time from the samples before it. Projects such as 15.ai showed that a voice can be cloned with surprisingly little training data, about 15 seconds of audio.
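WaveNet's modelling of raw waveforms rests on stacks of dilated causal convolutions: each output sample depends only on the current and earlier input samples, and the dilation doubles from layer to layer so that a few layers can cover many past samples. The following is a minimal pure-Python sketch of that building block only, assuming a kernel size of 2 and fixed, hypothetical weights instead of learned ones. The function names `causal_dilated_conv`, `dilated_stack` and `receptive_field` are illustrative and do not come from any library. The sketch omits the gated activations, the residual and skip connections, and the output distribution over sample values that the full model uses.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """One causal 1-D convolution layer.

    out[t] = sum_i weights[i] * x[t - i * dilation], so each output uses only the
    current and earlier samples. Indices before the start of the signal are
    treated as zero (implicit left padding).
    """
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def dilated_stack(x: Sequence[float], dilations: Sequence[int], kernel_size: int = 2) -> List[float]:
    """Apply one causal layer per dilation, feeding each layer's output to the next.

    Hypothetical fixed weights (1 / kernel_size) stand in for learned parameters;
    there are no nonlinearities, gates or residual connections here.
    """
    weights = [1.0 / kernel_size] * kernel_size
    h = list(x)
    for d in dilations:
        h = causal_dilated_conv(h, weights, d)
    return h


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples that can influence a single output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]  # doubling dilations, as in WaveNet's layer stacks
    impulse = [1.0] + [0.0] * 19  # a single nonzero sample at t = 0
    out = dilated_stack(impulse, dilations)
    print(receptive_field(2, dilations))  # 16
    print(out)  # 0.0625 at t = 0..15, then 0.0
```

A single impulse at time 0 spreads over exactly the first 16 outputs and leaves later outputs at zero, matching `receptive_field(2, [1, 2, 4, 8])`. Doubling the dilation makes the receptive field grow exponentially with the number of layers, which helps a sample-by-sample model take into account enough past audio at typical sampling rates of thousands of samples per second.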

This page was last edited on 3 February 2026, at 12:43 (CET).