Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

Does size matter?

Memory requirements are the most obvious advantage of reducing the complexity of a model’s internal weights. The BitNet b1.58 model can run using just 0.4GB of memory, compared to anywhere from 2 to 5GB for other open-weight models of roughly the same parameter size.
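That 0.4GB figure follows almost directly from the weight format: ternary weights can take only three values {-1, 0, +1}, which works out to log2(3) ≈ 1.58 bits of information per weight (hence "b1.58"). As a rough back-of-the-envelope sketch (assuming a model of about 2 billion parameters, and ignoring activations and other overhead):

```python
import math

params = 2_000_000_000          # assumed ~2B parameters for illustration
bits_per_weight = math.log2(3)  # ternary weights {-1, 0, +1} -> ~1.58 bits

# Memory needed to store the weights alone, in gigabytes
ternary_gb = params * bits_per_weight / 8 / 1e9
fp16_gb = params * 16 / 8 / 1e9  # same model at 16-bit precision

print(f"ternary: {ternary_gb:.2f} GB")  # -> ternary: 0.40 GB
print(f"fp16:    {fp16_gb:.2f} GB")     # -> fp16:    4.00 GB
```

The ternary estimate lands right around the reported 0.4GB, while a conventional 16-bit version of the same weights would need roughly ten times as much memory, in line with the 2 to 5GB range cited for comparable open-weight models.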

But the simplified weighting system also leads to more efficient operation at inference time, with internal operations that rely much more on simple addition instructions and less on computationally costly multiplication instructions. Those efficiency improvements mean BitNet b1.58 uses anywhere from 85 to 96 percent less energy compared to similar full-precision models, the researchers estimate.
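To see why ternary weights trade multiplication for addition, consider a single dot product, the basic operation inside a matrix multiply. When every weight is -1, 0, or +1, each term reduces to adding the activation, subtracting it, or skipping it. This is a purely illustrative sketch, not the actual BitNet kernel (which packs weights and uses vectorized integer instructions):

```python
def ternary_dot(weights, activations):
    """Dot product with ternary weights {-1, 0, +1}: no multiplications needed."""
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x   # +1 weight: just add the activation
        elif w == -1:
            total -= x   # -1 weight: just subtract it
        # 0 weight: contributes nothing, skip entirely
    return total

print(ternary_dot([1, 0, -1, 1], [0.5, 2.0, 1.5, 3.0]))  # -> 2.0
```

A full-precision model would instead compute a floating-point multiply for every term, which is where much of the estimated energy saving comes from.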

A demo of BitNet b1.58 running at speed on an Apple M2 CPU.

By using a highly optimized kernel designed specifically for the BitNet architecture, the BitNet b1.58 model can also run multiple times faster than similar full-precision models using a standard transformer kernel. The system is efficient enough to reach "speeds comparable to human reading (5-7 tokens per second)" on a single CPU, the researchers write (you can download and run those optimized kernels yourself on a number of ARM and x86 CPUs, or try it using this web demo).

Crucially, the researchers say these improvements don’t come at the cost of performance on various benchmarks testing reasoning, math, and “knowledge” capabilities (although that claim has yet to be verified independently). Averaging the results on several common benchmarks, the researchers found that BitNet “achieves capabilities nearly on par with leading models in its size class while offering dramatically improved efficiency.”



Despite its smaller memory footprint, BitNet still performs similarly to “full precision” weighted models on many benchmarks.

Despite the apparent success of this "proof of concept" BitNet model, the researchers write that they don't fully understand why the model works as well as it does with such simplified weighting. "Delving deeper into the theoretical underpinnings of why 1-bit training at scale is effective remains an open area," they write. More research is also needed to bring these BitNet models up to the parameter counts and context-window "memory" of today's largest models.

Still, this new research shows a potential alternative approach for AI models that are facing spiraling hardware and energy costs from running on expensive and powerful GPUs. It’s possible that today’s “full precision” models are like muscle cars that are wasting a lot of energy and effort when the equivalent of a nice sub-compact could deliver similar results.

