Far more efficient than the old MP3 format, EnCodec uses the neural acceleration coprocessor found in modern phones to recover details that would otherwise be lost in audio compression, cutting the required bandwidth by a factor of up to 10.
According to Meta, EnCodec can dramatically improve audio fidelity for voice calls over severely bandwidth-constrained connections, for example in areas with very poor mobile network coverage. A call that would need 64kbps with standard MP3 compression could instead use as little as about 6.4kbps, without a noticeable loss in audio fidelity.
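To put the tenfold figure in concrete terms, the short sketch below converts a bitrate into the data consumed by one hour of audio. It is plain arithmetic, not EnCodec code; the function name `megabytes_per_hour` is hypothetical, and it assumes decimal units (1 kilobit = 1000 bits, 1 megabyte = 10**6 bytes).

```python
# Data used by one hour of audio at a given bitrate.
# Assumes decimal units: 1 kilobit = 1000 bits, 1 megabyte = 10**6 bytes.

def megabytes_per_hour(kbps: float) -> float:
    """Return the data volume, in megabytes, of one hour of audio at `kbps`."""
    bits = kbps * 1000 * 3600        # bits transmitted in 3600 seconds
    return bits / 8 / 1_000_000      # bits -> bytes -> megabytes

mp3_kbps = 64.0
encodec_kbps = mp3_kbps / 10         # the "up to 10x" reduction quoted by Meta

print(f"MP3 at {mp3_kbps} kbps: {megabytes_per_hour(mp3_kbps):.1f} MB per hour")
print(f"EnCodec at {encodec_kbps} kbps: {megabytes_per_hour(encodec_kbps):.2f} MB per hour")
```

Because data volume scales linearly with bitrate, a tenfold lower bitrate means a tenfold smaller data bill for the same call length.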
EnCodec is said to work just as well for streaming music, delivering far higher fidelity than an MP3 at the same bitrate and approaching the quality of a lossless codec.
According to Meta’s description, EnCodec works as a three-stage system. First, an encoder transforms the uncompressed audio into a “latent space” representation with a reduced sampling rate. A “quantizer” then compresses that representation to the desired size while retaining the most important information, which is later used to reconstruct the original signal. (This compressed signal is what is sent over the internet connection or saved to a local file.) Finally, for playback, the decoder converts the compressed data back into audio, with the reconstruction running in real time on the phone’s NPU (neural processing unit) accelerator.
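The three stages can be illustrated with a deliberately simplified toy. This is not EnCodec: Meta’s paper describes learned neural networks for the encoder and decoder and a learned residual vector quantizer, whereas this sketch swaps in fixed block averaging as the “encoder”, uniform scalar quantizers as the residual “quantizer”, and sample repetition as the “decoder”. The function names `encode`, `quantize` and `decode` and the parameters `hop` and `step` are hypothetical.

```python
import numpy as np

def encode(signal: np.ndarray, hop: int) -> np.ndarray:
    """Toy encoder: average each block of `hop` samples, giving a latent
    sequence at 1/hop of the original sampling rate."""
    n = len(signal) // hop * hop              # drop any incomplete final block
    return signal[:n].reshape(-1, hop).mean(axis=1)

def quantize(latent: np.ndarray, n_stages: int, step: float = 0.5) -> list:
    """Toy residual quantizer: each stage rounds what earlier stages missed,
    using a step size half as large as the previous stage."""
    codes = []
    residual = latent.copy()
    for stage in range(n_stages):
        s = step / (2 ** stage)
        q = np.round(residual / s).astype(int)  # integer codes to transmit
        codes.append(q)
        residual = residual - q * s             # error left for the next stage
    return codes

def decode(codes: list, hop: int, step: float = 0.5) -> np.ndarray:
    """Toy decoder: sum the stage contributions, then upsample by repetition."""
    latent = sum(q * (step / (2 ** stage)) for stage, q in enumerate(codes))
    return np.repeat(latent, hop)

# Usage: a 5 Hz sine sampled 800 times, reduced to a 100-value latent.
t = np.linspace(0, 1, 800, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)
latent = encode(x, hop=8)
for stages in (1, 2, 4):
    y = decode(quantize(latent, stages), hop=8)
    print(stages, "stage(s), mean squared error:", np.mean((x - y) ** 2))
```

Each extra stage sends more integer codes but shrinks the remaining quantization error, so keeping fewer stages trades fidelity for a lower bitrate. The error never reaches zero here because the block averaging in `encode` discards detail that this toy decoder cannot recover; in EnCodec that recovery is the job of the learned decoder.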
“The key to lossy compression is identifying changes that will not be perceptible to humans, since perfect reconstruction is impossible at very low bit rates. To do this, we use discriminators to improve the perceptual quality of the generated samples.” The main challenge is making the reconstructed audio as close as possible to the original, so that listeners cannot pick up any differences (audio artefacts). In its highest-quality mode, EnCodec can compress 48 kHz stereo audio, dramatically reducing the bandwidth required over an internet connection.
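For a sense of scale, the sketch below computes the bitrate of uncompressed 48 kHz stereo audio and how many times smaller a compressed stream would be. The 16-bit sample depth and the 6 and 24 kbps target bitrates are illustrative assumptions, not figures given in this article, and `pcm_kbps` is a hypothetical helper.

```python
def pcm_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

raw = pcm_kbps(48_000, 16, 2)    # assumed 16-bit samples, two channels
for target_kbps in (6, 24):      # assumed example target bitrates
    print(f"{raw:.0f} kbps raw -> {target_kbps} kbps: {raw / target_kbps:.0f}x smaller")
```

Raw PCM bitrate is simply samples per second times bits per sample times channels, which is why high-resolution stereo is so costly to send uncompressed.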