AI language models can exceed PNG and FLAC in lossless compression, says study

FlickOfTheBean@beehaw.org · 1 year ago

AI language models can exceed PNG and FLAC in lossless compression, says study

Butterbee (She/Her)@beehaw.org · 1 year ago

A LOT. You can barely run 13b parameter models on a 24gb gfx card and outputs are like a page or so of text. Translate that over to audio and it would have to be broken down into discrete chunks that the model could use as “prompts” to output a section of audio that fit into the models available output. It might compress better, but it would be exceedingly painful and slow to extract even on AI focused cards. And it would use OODLES of watts to get just a little bit better than flac.

abhibeckert@beehaw.org · edit-2 1 year ago

13b parameters works out to about 9GB. You need a bit more than that since it needs more than just the model in memory, but at 24GB I’d expect at least half of it to go unused. And memory doesn’t use much power at all by the way. LPDDR4 uses something like 0.3 watts while actively reading/writing to it.

The actual computations use more, obviously, but GFX cards are not designed for this task and while they’re fast most of them are also horribly inefficient.

I run 13b parameter models on my ultra portable laptop (which has a small battery, no active cooling (fanless) and no discrete GPU). It has 16GB of RAM not GPU memory - RAM, and I’m running a full operating system, web browsers, etc a the same time. Models like llama2, stable diffusion, etc get perfectly usable performance without using much battery at all (at a guess, single digit watts while performing the calculations).

There is efficient hardware now and there will be even more efficient hardware in the future. My laptop definitely isn’t designed to run these models and on top of that the models aren’t designed to run on a laptop either. There’s plenty more optimisation work to be done in the years to come.

Butterbee (She/Her)@beehaw.org · 1 year ago

Ok, it’s been a while since I tried running a language model so I might have been thinking of the 30b models that were showing up at the time. The point remains though that this thing they were running would be well beyond hardware generally available and completely impractical for realtime use. Like… why would you do all that when flac and png are good enough. It is far cheaper and uses less power to accommodate the slightly less compressed files.