How Banana Performs Machine Learning Model Optimization with AWS & GCP

Banana offloads undifferentiated AWS & GCP problems to Zeet and focuses on their core technologies.

Banana is a startup that takes a customer's model or data set and converts it into a high-throughput API. Their in-house team of machine learning experts focuses on reducing the cost and latency of the deployed model through ML-specific infrastructure optimizations while maintaining and improving model quality. This process requires engineers without DevOps expertise to comfortably operate a complex multi-cloud architecture.

Today, everything that Banana deploys is deployed on Zeet. In an interview, Erik Dunteman, Banana's founder, explained the velocity their lean engineering team has achieved with Zeet.

Before Zeet | With Zeet
System-wide production outage every three weeks | Zero outage events
10 microservices, 80% with automatic deployment | 25 microservices, 100% with automatic deployment, without adding engineering headcount or additional tooling
Banana’s founder, Erik, spent 25% of his time on manual deployments and fixing infrastructure | Zeet handles deployments automatically; Erik is 100% focused on architecture and growth

While it might initially seem strange that one infrastructure dev tools company relies on another, this deep dive into Banana's DevOps journey reveals how a complementary architecture lets Erik and Banana offload their undifferentiated problems to Zeet and focus on the core technologies they offer their customers.

Banana's Architecture

After working through an MVP in Google Colab, Banana serves its clients fully provisioned end products: the client provides a model, Banana creates and hosts an API so that the client can consume their model with better, faster, and cheaper results. Using SDKs for Python, Node, and Go, Banana's clients integrate production models directly into their systems. This requires production-grade architecture on carefully provisioned infrastructure.

When a Banana user makes an API call, the request begins its journey at a proxy layer. An async service queue serves as middleware for business logic. Its most important function is validating the request. It checks the API keys, ensures that the key owns the model that it is trying to call, and pulls the ID of the downstream repository that runs the actual model. Then, it passes the call to the server that performs the machine learning.
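The validation step described above can be sketched in a few lines of Python. This is only an illustration of the three checks (known key, key owns the model, look up the downstream repository ID); the names `validate_request`, `ModelRecord`, `RequestRejected`, and the sample data are hypothetical and are not taken from Banana's codebase.

```python
from dataclasses import dataclass


class RequestRejected(Exception):
    """Raised when a call fails validation in the middleware."""


@dataclass(frozen=True)
class ModelRecord:
    owner_key: str  # API key that owns this model (hypothetical field)
    repo_id: str    # ID of the downstream repository that runs the model


def validate_request(api_key, model_id, known_keys, models):
    """Return the downstream repo ID for a valid call, otherwise raise."""
    if api_key not in known_keys:
        raise RequestRejected("unknown API key")
    record = models.get(model_id)
    if record is None or record.owner_key != api_key:
        # Missing and foreign models get the same error on purpose.
        raise RequestRejected("key does not own this model")
    return record.repo_id


# Sample data, made up for illustration.
KNOWN_KEYS = {"key-alice", "key-bob"}
MODELS = {"model-1": ModelRecord(owner_key="key-alice", repo_id="repo-42")}

print(validate_request("key-alice", "model-1", KNOWN_KEYS, MODELS))
```

In a real system the key and model lookups would hit a database, and the returned repository ID would be used to route the call to the machine learning server.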

Every part of the system is now hosted on Zeet, from the middleware to the machine learning servers to custom usage dashboards. The machine learning servers are GPU or TPU clusters on either AWS or GCP, which presents unique architectural challenges.

Architecture Detail: GPU-specific problems

While some of the architectural challenges that Banana faces are shared by any SaaS operator with a Node website, much of the complexity they face comes from using GPUs and other hardware accelerators. General Platform-as-a-Service (PaaS) offerings like Heroku don't offer hardware acceleration, while GPU-specific PaaS options are prohibitively expensive. Similarly, reasonable serverless options simply don't exist for meaningful GPU loads. So Banana needs provisioned servers with GPUs.

Architecture Detail: Multi-Cloud

Amazon Web Services (AWS) and Google Cloud Platform (GCP) offer different machine types and cost structures. To serve a wide range of customer use cases, Banana needs to be able to deploy its models across both. Specifically, for customers who prioritize speed and latency, Banana deploys models on GCP's Tensor Processing Units (TPUs), a cutting-edge hardware accelerator for certain types of machine learning processes. However, runtime on these machines is relatively expensive, so more cost-sensitive models are deployed on AWS, where Banana has contracts to provision GPUs below on-demand pricing across AWS's wide range of GPU options.

"Zeet has made it beautifully easy [to deploy between multiple platforms]. Just change the deployment target and it is up. We don't even need to think about multiple platforms at this point; it's just which platform has the machine type."

This specialized multi-cloud architecture complicates deployment and DevOps procedures but enables an essential component of Banana's service.
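As a rough illustration of the placement rule described in this section, the Python sketch below maps a customer's priority to a cloud and accelerator type. The function name, the `DeploymentTarget` type, and the two priority labels are assumptions made for the example; the source does not describe how Banana actually encodes this decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentTarget:
    cloud: str        # "gcp" or "aws"
    accelerator: str  # "tpu" or "gpu"


def choose_target(priority):
    """Map a customer's stated priority to a cloud and accelerator type."""
    if priority == "latency":
        # Speed-sensitive models go to TPUs on GCP.
        return DeploymentTarget(cloud="gcp", accelerator="tpu")
    if priority == "cost":
        # Cost-sensitive models go to GPUs on AWS.
        return DeploymentTarget(cloud="aws", accelerator="gpu")
    raise ValueError(f"unknown priority: {priority!r}")


print(choose_target("latency"))
print(choose_target("cost"))
```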

Banana's Problem

Banana is an early-stage startup of five people, four of whom are technical. The founder, Erik, manages the architecture and business logic himself, and wants his team of three machine learning engineers to have the best developer tools possible so that they can focus on the core technical problems the business is working to solve. But he realized that their in-house solutions, built directly on AWS, were slowing them down.

Manual Deployment and Provisioning

Before Zeet, deploying a new or updated model was always a fraught task. Here was Erik's sequence:

  • Provision ~10 GPUs on AWS
  • Launch each one off of an Amazon machine image
  • Docker container starts at runtime
  • "Hacky" script performs git pull on the model repo on restart

A rolling redeploy service tied the system together. Using the AWS SDK, the script restarted every GPU in the fleet one by one. These manual restarts addressed GPUs reaching a stale or unhealthy state and pulled in updates to the model. However, as the architecture grew more complex and its load increased, this approach soon caused more problems than it solved.
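The shape of such a rolling restart can be sketched in Python as follows. This is not Banana's script: the `restart` and `is_healthy` callbacks are hypothetical stand-ins for the AWS SDK calls the original used, and waiting for each instance to report healthy before moving on is an assumption about how a careful version would behave.

```python
import time


def rolling_restart(instance_ids, restart, is_healthy, poll_seconds=0.0, max_polls=10):
    """Restart instances one at a time, waiting for each to report healthy."""
    restarted = []
    for instance_id in instance_ids:
        restart(instance_id)
        for _ in range(max_polls):
            if is_healthy(instance_id):
                break
            time.sleep(poll_seconds)
        else:
            # Stop the rollout rather than take more of the fleet down.
            raise RuntimeError(f"{instance_id} did not become healthy")
        restarted.append(instance_id)
    return restarted


# Fake fleet for illustration: restarts are recorded, health is always good.
fleet = ["gpu-0", "gpu-1", "gpu-2"]
restart_log = []
print(rolling_restart(fleet, restart=restart_log.append, is_healthy=lambda _id: True))
```

Passing the two operations in as callbacks keeps the loop testable without cloud credentials; in production they would wrap whatever SDK calls restart an instance and report its status.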

"As I was maturing as a founder, I realized...the solutions I had naturally landed on, using the AWS API to automate things, got super clunky. We had a lot of forgotten instances. A lot of my job became figuring out the boto3 Python API."

Cascading Reboots

Erik freely admits that Banana's infrastructure was pretty messy before they migrated to Zeet. One major issue that the migration resolved was a battle between two in-house uptime solutions. In addition to the aforementioned rolling redeploy system, Banana relied on a custom health checker that would forcibly restart any instances that were unhealthy.

Unfortunately, the two systems collided. Because they didn't communicate with one another, they would sometimes spawn restart chains: the rolling redeploy would restart an instance, but before it had finished restarting, the health checker would identify it as unhealthy and restart it again. As a result, Erik said, "Whenever I would go to deploy any updates to the model or the model handling code, my next four hours would be dedicated to making sure the rollout went through."
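A toy simulation makes the restart chain easier to see. The Python sketch below models a single instance that needs `boot_ticks` to come back up while a health checker polls every `check_every` ticks. All names and numbers are invented for illustration, and the `checker_knows_restarts` flag only shows what shared state between the two systems would change; the source does not say Banana built such a fix.

```python
def simulate(boot_ticks, check_every, total_ticks, checker_knows_restarts):
    """Count restarts of one instance after a single redeploy restart at tick 0.

    Returns (number of restarts, whether the instance finished booting).
    """
    restarts = 1            # the rolling redeploy's own restart
    ready_at = boot_ticks   # tick at which the instance finishes booting
    for tick in range(1, total_ticks + 1):
        booting = tick < ready_at
        if tick % check_every == 0 and booting and not checker_knows_restarts:
            # The checker sees "unhealthy" and restarts, resetting the boot.
            restarts += 1
            ready_at = tick + boot_ticks
    return restarts, ready_at <= total_ticks


# Boot takes 5 ticks, but the checker polls every 2 ticks.
print(simulate(5, 2, 20, checker_knows_restarts=False))  # (11, False)
print(simulate(5, 2, 20, checker_knows_restarts=True))   # (1, True)
```

Whenever the boot time exceeds the polling interval, the uncoordinated checker keeps resetting the boot and the instance never becomes healthy within the window.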

If this approach to deployment had continued, Banana would have had to either take on the substantial expense of hiring DevOps engineers or accept that its ML engineers' productivity would be hobbled by deployment time. Clearly, a new approach to deployment was needed so that Erik and his team could focus on their business, not their servers.

Stacking Dev Tools for Success

On the surface level, it might not make sense for one infrastructure devtool (Banana) to rely on another infrastructure devtool (Zeet). But, as Erik explains:

"The entire point of dev tools is to focus on a repeated but undifferentiated part of the stack and automate that. ‘Undifferentiated’ means that a team building Uber for Dogs wants to focus on the core aspects of what makes them Uber for Dogs. They don't want to focus on the same infrastructure that Uber for Cats needs to build or a dating app needs to build. In the end there is a certain chunk of infrastructure that is undifferentiated and requires linear time spent to build it. As you get deeper into infrastructure and machine learning, you see things repeat themselves and you can chunk things into smaller and smaller components."

"Zeet is doing the undifferentiated work of the deployment workflow, taking our code that we push to Git and getting it live on whatever machines we need it to be live on, plus handling things like networking, health checks, restarts. That is something we would have to build in house otherwise and we don't want to because it is undifferentiated for us. Zeet takes that same product and sells it to other people which is cool because that proves that it wouldn't be special for us to build it in house."

"Banana does the same thing at the machine learning level. Thankfully, we don't need to think about general infrastructure, we can think about machine learning model optimization-specific nuances and target the areas that are undifferentiated just like Zeet has done with deployment. At the core of what we do, we take a black box model, we don't care what it is, and figure out how to run it faster, cheaper, and with higher quality."

"Basically there is not an overlap in the undifferentiated areas of the stack that we focus on. And we are better equipped to focus on the areas that we care about because we rely on a different dev tool provider for the things that are table stakes."

Banana's New Groove

With Zeet, Banana engineers merge an update to a model into the main branch of its repository on GitHub, and the model goes live automatically. Zeet handles more than just deployment; it provides Banana with a complete production dev tools setup. Today, everyone needs CI/CD. Everyone needs automatic deploys, everyone needs rollbacks, and everyone needs their services to come back when they crash. Now Banana's engineers work on rock-solid infrastructure and spend almost zero time on SRE and traditional DevOps.

What is particularly remarkable is what Erik and the rest of Banana's engineers are doing with the time that they have saved.

Machine Learning Engineers Using Zeet’s API

Not only is Zeet saving Erik time, but it is enabling his team of machine learning engineers to interface directly with their infrastructure in new ways. When Banana onboards a new engineer, one of their tasks is to get a minimal ML model deployed on Zeet and wired into the rest of the stack. This step takes half a day to learn, and then new engineers can deploy whatever they need to in their daily work without ever being blocked by DevOps.

Additionally, Banana extends Zeet's default capabilities through the Zeet API to handle their unique needs. Banana has an in-house autoscaler to balance cost and latency for its customers. This ML-specific infrastructure scales up the model server if latency spikes, then scales back down when the additional capacity is no longer needed. Building Banana's autoscaler on the Zeet API lets Erik skip worrying about things like whether the replicas he is provisioning are healthy; he knows that he can rely on the replica count that the API gives him. Working at a higher level of abstraction than AWS's boto3 Python API, his previous tool, makes the autoscaler a more reliable and maintainable piece of infrastructure.
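The core decision inside a latency-driven autoscaler of this kind can be sketched in Python as below. The thresholds, the one-replica step size, and the function name are assumptions chosen for the example, and the sketch deliberately makes no Zeet API calls: it only computes the replica count that a caller would then request through whatever API it uses.

```python
def desired_replicas(current, latency_ms, target_ms, min_replicas=1, max_replicas=10):
    """Scale up by one on a latency spike, down by one when well under target."""
    if latency_ms > target_ms:
        wanted = current + 1
    elif latency_ms < 0.5 * target_ms:
        # Plenty of headroom, so release one replica to save cost.
        wanted = current - 1
    else:
        wanted = current
    # Never go below the floor or above the ceiling.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(current=2, latency_ms=900, target_ms=500))
print(desired_replicas(current=2, latency_ms=100, target_ms=500))
```

Leaving a gap between the scale-up and scale-down thresholds is a common way to keep the replica count from oscillating when latency hovers near the target.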

Improved Deployment Velocity & Code Review

This improved developer experience has dramatically increased deployment velocity while making space for adding best practices like branch-based deploys and code reviews. In their first week after switching over to Zeet, Banana's engineering team deployed five times as often as they had the previous week.

But deployment speed only helps if you are confident in the code you are deploying. So Zeet also makes code reviews on branch-based PRs the lowest-friction way of deploying code. Banana's engineering team quickly built a habit of performing a code review before merging a branch, because merging a branch now means deploying the code. With their process now firmly established, the team of four can push as many as five PRs a day into production when needed, where each PR represents a material change to a machine learning model running on complex infrastructure.

Ultimately, pain reduction in undifferentiated areas of their stack enables Banana to focus their engineering efforts on core competencies like smoothing out GPU & TPU idiosyncrasies.

Supporting Each Other, Past and Future

At the end of the interview, Erik mentioned that one of his favorite parts of working with Zeet is how quickly customer service responds and how fast fixes go live. When dealing with AWS directly, the things that Banana cared deeply about getting optimized or fixed were very long-tail and just wouldn't get addressed. As a Zeet customer with unique needs, Banana sees its support requests for multi-cloud or multi-accelerator issues resolved within hours or days, which improves Zeet for all of its users.

"We have a zero-person SRE team now and for the foreseeable future. In a DevOps role, I don't touch SRE and don't do much traditional DevOps."

Banana and Zeet have been growing alongside each other for years through various iterations of each idea, both offering their users developer tools that make undifferentiated infrastructure problems at different levels of the stack disappear. Erik looks forward to scaling Banana without a dedicated DevOps team for the foreseeable future in partnership with Zeet.


Want to try out Zeet in your own engineering team? Sign up for a Free Trial today!

Johnny Dallas
