The Underestimated Importance of Local Development and vRAM in ML/DL Work
Two more bitter lessons
I have spent a long time navigating the data science and machine learning world, and I keep returning to two points that have proven valuable in my experience. First, a robust local development environment is far more useful than many realize. Second, when dealing with large-scale machine learning or deep learning workloads, having ample GPU memory (vRAM) often matters more than raw processing speed. These insights come from practical lessons learned the hard way.
A local development environment gives you immediate access to your projects without delays from network latency or complicated logins. You can organize your files and folders exactly as you see fit, install any libraries you need, and run scripts without extra overhead. This direct control saves time and frustration, particularly when experimenting with new approaches or troubleshooting complex bugs. You can generally try more things, experiment faster, and get a much more direct “feel” for what you are building. While cloud solutions offer advantages in certain situations, nothing compares to a setup tailored to your personal workflow.
In large machine learning or deep learning tasks, dataset size becomes the critical factor. Large datasets quickly outgrow a GPU with limited memory. Splitting data into batches is standard practice in deep learning, but loading those batches from disk too frequently can become a serious bottleneck. If your training is slowed by constant disk access, the speed of your compute hardware matters much less. You can have the fastest GPU cores available, but without enough memory you will still face performance issues. I would rather have several slower GPUs that give me more total vRAM than one faster GPU with less.
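To make the trade-off concrete, here is a minimal sketch in PyTorch (the post names no framework, so that choice, along with every size below, is an assumption). Per-batch host-to-device copies stand in for disk reads; the underlying effect is the same: once the whole dataset fits in vRAM, the per-step data movement disappears entirely.

```python
import time

import torch
from torch import nn

# Illustrative sizes only; none of these numbers come from the post.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

n_samples, batch_size = 200_000, 1_024
x_host = torch.randn(n_samples, 512)  # data living outside the GPU
y_host = torch.randn(n_samples, 1)

def train_epoch(x, y, on_device):
    """Run one epoch; when on_device is False, copy each batch to the GPU."""
    start = time.perf_counter()
    for i in range(0, n_samples, batch_size):
        xb, yb = x[i:i + batch_size], y[i:i + batch_size]
        if not on_device:
            xb, yb = xb.to(device), yb.to(device)  # per-batch transfer: the bottleneck
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.perf_counter() - start

streamed = train_epoch(x_host, y_host, on_device=False)

# Only possible when the whole dataset fits in vRAM, which is the point.
x_gpu, y_gpu = x_host.to(device), y_host.to(device)
resident = train_epoch(x_gpu, y_gpu, on_device=True)

print(f"per-batch transfers: {streamed:.2f}s | resident in vRAM: {resident:.2f}s")
```

On real workloads the gap is typically wider still, since actual disk reads are slower than the in-memory copies simulated here, and the more vRAM you have, the more of the dataset (or the model) can stay resident.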
Despite sharing these points with various stakeholders, I have often been dismissed or met with skepticism. The push in the tech world over the past two decades has been relentlessly toward offloading all computational tasks to the “Cloud”. The amount of RAM, let alone vRAM, that local computers come with has hardly budged in over a decade. Just today I was looking into buying a motherboard that can handle more than 192 GB of RAM, and the only places that sell one seem to require a special inquiry (which, as of this writing, has gone unanswered). I am hopeful that the rising popularity of local LLMs becomes the necessary wake-up call, both for hardware manufacturers and for how the work of Data Science and Machine Learning gets done.
Strong preference for good local development too. I recently upgraded my notebook and feel the peace of being able to work locally and not rely on complex setups. I don't need that much GPU; LightGBM goes brrr on a couple of cores.
Great points. When I invested in my latest workstation, the usual comment was “Why spend money on that when you can use the cloud?”
I would rather have a strong machine than a cloud subscription.
If I can run it locally, I will run it.