The DeepSeek R1 model (technically the R1 lite preview - how big is the R1 heavy pro gonna be???) has only been around for a few days, and it is right now the hottest talk of the AI town. I’ve posted about it elsewhere and was able to play with it a bit myself (or rather with a distilled version of it), and so far I have been really impressed. It would take me way too long to compile a truly substantive blog post about everything that is going on here, but I still wanted to share a few of my impressions.
The model is big. Really big. The weights alone are 671 GB, which does not come close to fitting into VRAM, or even into the regular old CPU RAM of a professional-level workstation.
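To put that number in perspective, here is a quick back-of-the-envelope sketch. The 671B parameter count is DeepSeek’s published figure; the precision levels are common quantization choices, not anything specific to how R1 ships, and the estimate deliberately ignores activation memory and KV cache, so real requirements are higher:

```python
# Rough weight-storage footprint of a 671B-parameter model at common precisions.
# Ignores activations and KV cache, so actual memory needs are higher.

PARAMS = 671e9  # parameter count of DeepSeek R1

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: ~{weight_gb(bits):.0f} GB")
```

Even at an aggressive 4-bit quantization the weights alone are north of 300 GB, which is why the full model is out of reach for any single consumer machine today.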
The model has also been distilled, and the smaller versions can be run on almost any device. My favorite small-model chatbot implementation is the one using WebGPU, sitting pretty on the Hugging Face website:
Alex Cheema from Exo Labs has successfully run the 4-bit quantized R1 on a cluster of 7 M4 Mac minis. As of this writing he is working on getting the full version running on a slightly bigger cluster.
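A minimal sanity check of why seven minis can hold the 4-bit quantized weights but not the full-precision model. The 64 GB per machine figure assumes maxed-out M4 Mac minis, the ~336 GB and 671 GB model sizes come from the parameter count above, and the 80% usable-memory headroom is my own rough assumption, not an exo figure:

```python
# Back-of-the-envelope cluster feasibility check, assuming the weights can be
# sharded roughly evenly across machines (as exo-style pipelining aims to do).

def cluster_fits(model_gb: float, machines: int, ram_per_machine_gb: float,
                 headroom: float = 0.8) -> bool:
    """True if sharded weights fit, leaving (1 - headroom) of RAM for overhead."""
    return model_gb <= machines * ram_per_machine_gb * headroom

# 4-bit quantized R1 (~336 GB) on 7 Mac minis with 64 GB unified memory each:
print(cluster_fits(336, 7, 64))   # 7 * 64 * 0.8 = 358.4 GB usable -> True
# Full-precision (FP8, 671 GB) weights on the same cluster:
print(cluster_fits(671, 7, 64))   # -> False, hence the need for a bigger cluster
```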
Those who can’t wait for bigger Macs with more unified memory are opting to try out a cluster of the latest, more powerful M4 MacBook Pros.
Which brings me to the following point: as I’ve mentioned above, it’s very, very hard to get a single machine with even the amount of RAM necessary to run R1, let alone VRAM or unified RAM. Apple has been at the forefront of pushing unified RAM in their machines for years, but so far the most one can get is 192 GB. The rumors are that the next generation of M4 Ultra-equipped Mac Studios and Mac Pros will have up to 512 GB, but even that is not quite enough. Hopefully the release of R1 pushes hardware manufacturers (primarily Apple and Nvidia at this point) to bring out options with 1 TB of unified RAM. As my recent poll on X showed, a substantial number of people (a third of my respondents) would be willing to pay $20K+ for such a system:
According to some online rumors, R1 has really caught the big AI labs off guard. Allegedly they are in full panic mode. It seems that the organizational morass within which the big corporations have to operate is finally becoming a major liability.