
Incredibuild Team
In the world of machine learning, time is more than just money; it’s the speed of innovation. But as models grow more complex, data scientists and engineers often find themselves trapped between two bad options: waiting hours for local training to finish on aging hardware, or facing the massive overhead of “lifting and shifting” workloads into complex orchestrators like Kubernetes or Run.ai.
By using Incredibuild to offload TensorFlow image classification workloads from a legacy “Initiator” machine to a high-performance “Helper” node, we achieved a 4.33x speedup with zero code changes and no containerization, which allowed training on a shared resource with no setup effort.
Many data teams rely on stable, cost-effective instances for their daily work. In our test environment, our “Initiator” was an AWS g3.4xlarge equipped with an NVIDIA Tesla M60.
While reliable, the M60 struggles with modern deep learning workloads. In our POC, a condensed image classification training task took 19.5 minutes to complete locally. In a real-world scenario with multiple iterations, this can add up to a full day of lost productivity while the engineer waits for the model to converge.
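To make the cost of iteration concrete, here is a minimal arithmetic sketch. The 19.5-minute figure comes from the POC above; the 8-hour working day and the helper name `waiting_minutes` are assumptions made only for illustration.

```python
import math

LOCAL_RUN_MINUTES = 19.5   # local training time reported in the POC
WORKDAY_MINUTES = 8 * 60   # assumption: an 8-hour working day


def waiting_minutes(runs: int, minutes_per_run: float = LOCAL_RUN_MINUTES) -> float:
    """Total time spent waiting for `runs` back-to-back training runs."""
    return runs * minutes_per_run


# How many complete local runs fit into one assumed working day?
runs_per_day = math.floor(WORKDAY_MINUTES / LOCAL_RUN_MINUTES)

print(runs_per_day)         # 24
print(waiting_minutes(10))  # 195.0
```

Under these assumptions, roughly two dozen sequential iterations are enough to fill a working day with waiting.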
Instead of upgrading every engineer’s workstation or spending days configuring a Kubeflow pipeline, we used Incredibuild to “borrow” the power of a more potent machine: the AWS g5.2xlarge featuring the NVIDIA A10G.
We prepared our environment using the AWS Deep Learning Base GPU AMI (Ubuntu 20.04) and configured a TensorFlow 2 Object Detection API project.
The beauty of this approach lies in the simplicity of execution. To move from a slow local run to a high-speed remote run, the command only changes by a single prefix:
Local Run (Slow):
time ./start.sh
Distributed Run (Fast):
time ib_console -f ./start.sh
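If you prefer to capture both timings from a script rather than reading the output of `time`, a minimal sketch could look like the following. The two commands are the ones shown above; the `timed_run` helper and the `COMMANDS` mapping are hypothetical names introduced here, not part of Incredibuild.

```python
import subprocess
import time
from typing import Sequence

# The two invocations from this post; only the prefix differs.
COMMANDS = {
    "local": ["./start.sh"],
    "distributed": ["ib_console", "-f", "./start.sh"],
}


def timed_run(argv: Sequence[str]) -> float:
    """Run a command to completion and return its wall-clock time in seconds.

    Hypothetical helper for illustration; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    start = time.perf_counter()
    subprocess.run(list(argv), check=True)
    return time.perf_counter() - start
```

With the scripts from this post in place, `timed_run(COMMANDS["local"])` and `timed_run(COMMANDS["distributed"])` would return the two wall-clock times to compare.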
By offloading the workload to the A10G Helper node, the results were immediate and dramatic:
| Metric | Local (Tesla M60) | Remote (A10G) | Improvement |
|---|---|---|---|
| Training Time | 19m 27s | 4m 31s | 4.33x Faster |
| Complexity | N/A | Zero code changes | Native Execution |
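The speedup can be checked directly from the two training times in the table. The sketch below is only arithmetic on those published numbers; `to_seconds` is a hypothetical helper. Using the exact times gives a ratio of about 4.31x, while the headline 4.33x appears consistent with the rounded values of 19.5 and 4.5 minutes.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a string such as '19m 27s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m\s*(\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


local = to_seconds("19m 27s")   # Tesla M60, local run
remote = to_seconds("4m 31s")   # A10G, remote run

speedup = local / remote
saved = local - remote

print(f"speedup: {speedup:.2f}x")               # speedup: 4.31x
print(f"saved: {saved // 60}m {saved % 60}s")   # saved: 14m 56s
print(f"rounded: {19.5 / 4.5:.2f}x")            # rounded: 4.33x
```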
By using ib_console, the initiator machine acted as if it had the power of the A10G locally. We monitored the progress in real time using TensorBoard and saw the same convergence results in a fraction of the time.
This isn’t just about speed; it’s about the competitive advantage of flexible infrastructure.
The “GPU shortage” isn’t always about a lack of chips; it’s often about the inefficient use of the chips we have. By offloading workloads dynamically, we can turn a standard dev machine into a machine learning powerhouse for the entire team without any changes to the team’s workflow.