
8 Dec 2021 - 8 min read

Banana Deploys Your Cloud Machine Learning Models

Banana optimizes machine learning model deployments using Zeet, enhancing uptime and focus on core tech. The collaboration addresses multi-cloud challenges.

Johnny Dallas

CEO & Co-Founder, Zeet
Case Study


Banana offloads undifferentiated AWS & GCP problems to Zeet and focuses on their core technologies.

Banana is a startup that takes a customer's data science and AI models or datasets and converts them into a high-throughput, real-time API. Whether you're running natural language processing, deep learning, or computer vision workloads, on frameworks including TensorFlow, PyTorch, and more, Banana makes the whole process trivial, allowing you to focus on your algorithms, models, and data, not the infrastructure to run them.

Their in-house team of data scientists and machine learning experts focuses on reducing the cost and latency of the deployed model through ML-specific infrastructure optimizations while maintaining and improving model quality. This process requires engineers without DevOps expertise to comfortably operate a complex multi-cloud architecture.

Today, everything that Banana deploys is deployed on Zeet. Erik Dunteman, Banana's founder, explained the velocity their lean engineering team has achieved with Zeet in an interview.

Before Zeet | With Zeet
System-wide production outage every three weeks | Zero outage events
10 microservices, 80% with automatic deployment | 25 microservices, 100% with automatic deployment, without adding engineering headcount or additional tooling
Banana's founder, Erik, spends 25% of his time on manual deployments and fixing infrastructure | Zeet handles deployments automatically; Erik is 100% focused on architecture and growth

While it might initially seem strange that one infrastructure dev tools company relies on another, this deep dive into Banana's DevOps journey reveals how a complementary architecture lets Erik and Banana offload their undifferentiated problems to Zeet and focus on the core technologies they offer their customers.

Banana's Architecture

After working through an MVP in Google Colab, Banana now serves its clients fully provisioned end products: the client provides a machine learning model, and Banana creates and hosts an API so that the client can consume their model with better, faster, and cheaper results. Imagine you're working in a Jupyter Notebook and you can handle all your machine learning with one line of code: that's the power of Banana. Think of it as a sort of serverless framework, almost like AWS Lambda, but for your ML needs. Tools like Amazon SageMaker exist, but SageMaker still has a steep learning curve; with Banana, it's as easy as making an API call, and you get everything you need.

Banana handles everything under the hood; you just provide a model and/or data. Using SDKs for Python, Node, and Go, Banana's clients deploy ML models and integrate them directly into their production systems. This requires production-grade architecture on carefully provisioned infrastructure.

When a Banana user makes an API call to Banana’s endpoint, the request begins its journey at a proxy layer. An async service queue serves as middleware for business logic. Its most important function is validating the request. It checks the API keys used against the endpoint, ensures that the key owns the model that it is trying to call and has all the necessary permissions, and pulls the ID of the downstream repository that runs the actual model. Then, it passes the call to the server that performs the machine learning.
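The validation sequence described above can be sketched as a small Python function. The data layout, key names, and lookups here are hypothetical, purely to illustrate the three checks the middleware performs; Banana's actual implementation is not public.

```python
# Illustrative sketch of the middleware's validation step.
# API_KEYS and MODELS stand in for whatever datastore backs the real service.

API_KEYS = {
    "key-123": {"owner": "acme", "permissions": {"invoke"}},
}

MODELS = {
    "sentiment-v2": {"owner": "acme", "repo_id": "repo-42"},
}

def validate_request(api_key: str, model_id: str) -> str:
    """Validate an incoming call and return the downstream repo ID.

    Mirrors the three checks described in the text: the key exists,
    the key's owner owns the model being called, and the key carries
    the necessary permission.
    """
    key = API_KEYS.get(api_key)
    if key is None:
        raise PermissionError("unknown API key")
    model = MODELS.get(model_id)
    if model is None or model["owner"] != key["owner"]:
        raise PermissionError("key does not own this model")
    if "invoke" not in key["permissions"]:
        raise PermissionError("key lacks invoke permission")
    # The repo ID is what gets handed to the ML server that runs the model.
    return model["repo_id"]
```

Only after all three checks pass does the call reach the machine learning servers, which keeps bad requests from ever consuming GPU time.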

Every part of the system is now hosted on Zeet, from the middleware to the machine learning servers to custom usage dashboards. The machine learning servers are GPU or TPU clusters on either AWS or GCP, which presents unique architectural challenges.

Architecture Detail: GPU-specific problems

While some of the architectural challenges Banana faces are shared by any SaaS operator with a Node website, much of the complexity comes from using GPUs and other hardware accelerators, and doing so across multiple clouds. General Platform-as-a-Service (PaaS) offerings like Heroku don't offer hardware acceleration, GPU-specific PaaS options are prohibitively expensive, and neither gives you much recourse if you want to host workloads across more than one cloud. Similarly, reasonable serverless options simply don't exist for meaningful GPU loads. So Banana needs provisioned servers with GPUs.

Architecture Detail: Multi-Cloud

Amazon Web Services (AWS) and Google Cloud Platform (GCP) offer different machine types and cost structures. To serve a wide range of customer use cases, Banana needs to be able to deploy its models across both their GCP and AWS accounts, as well as any clouds they might onboard to in the future. Specifically, for customers who prioritize speed and latency, Banana deploys models on GCP's Tensor Processing Units (TPUs), a cutting-edge hardware accelerator for certain types of machine learning processes. However, runtime on these machines is relatively expensive, so more cost-sensitive models are deployed on AWS, where Banana has contracts to provision GPUs below on-demand pricing across AWS's wide range of GPU options.

Zeet has made it beautifully easy [to deploy between multiple platforms], just change the deployment target and it is up. We don't even need to think about multiple platforms at this point, it's just which platform has the machine type.

This specialized multi-cloud architecture complicates ML model deployment and DevOps procedures but enables an essential component of Banana's service.

Banana's Problem

Banana is an early-stage startup of five people, four of whom are technical. The founder, Erik, manages the architecture of their stack and the business logic himself, and wants his team of three machine learning engineers to have the best developer tools possible so that they can focus on the core technical problems the business is working to solve—allowing anyone to deploy machine learning models using a single API endpoint.

The problem doesn't have an easy solution, as Banana's machine learning engineers are navigating dozens of instance types (from EC2 instances to GPU machines), Docker images, Kubernetes clusters, and vector databases across multiple cloud providers. Flexibility is of course important, as a high level of configurability and customization is integral to the idiosyncrasies of each model. But Erik realized that their in-house solutions built directly on AWS services, while flexible, were slowing the team down more than they were helping.

Manual Deployment and Provisioning

Before Zeet, deploying a new or updated model was always a fraught task. Here was Erik's sequence:

  • Provision ~10 GPUs on AWS
  • Launch each one from an Amazon Machine Image
  • A Docker container starts at runtime
  • A "hacky" script performs a git pull on the model repo on restart

A rolling redeploy service tied the system together. Using the AWS SDK, the script restarted every GPU in the fleet one by one. These manual restarts addressed GPUs reaching a stale or unhealthy state and pulled in updates to the model. However, as the architecture grew more complex and its load increased, this approach soon caused more problems than it solved.
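A minimal reconstruction of what that rolling-redeploy loop might have looked like follows. The function name and pause logic are assumptions; only the pattern (reboot every instance in turn via the AWS SDK) comes from the interview, and the sketch deliberately preserves the flaw the text describes: nothing confirms an instance is healthy before moving on.

```python
import time

def rolling_restart(ec2, instance_ids, pause_seconds=0):
    """Reboot every GPU instance in the fleet, one by one, via the AWS SDK.

    `ec2` is expected to be a boto3 EC2 client (boto3.client("ec2"));
    any object exposing reboot_instances works, which makes the loop
    easy to exercise with a stub.
    """
    for instance_id in instance_ids:
        # Forcibly reboot the instance to clear stale state and pull updates.
        ec2.reboot_instances(InstanceIds=[instance_id])
        # A fixed pause only -- there is no health check confirming the
        # instance actually came back, which is what made this fragile.
        time.sleep(pause_seconds)
```

With only the AWS API to lean on, every concern beyond "issue the reboot" (health, ordering, retries) had to be bolted on by hand.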

As I was maturing as a founder, I realized...the solutions I had naturally landed on, using the AWS API to automate things, got super clunky. We had a lot of forgotten instances. A lot of my job became figuring out the boto3 Python API.

Cascading Reboots

Erik freely admits that Banana's infrastructure for deploying machine learning models was pretty messy before the migration to Zeet. One major issue that the migration resolved was a battle between two in-house uptime solutions. In addition to the aforementioned rolling redeploy system, Banana relied on a custom health checker that would forcibly restart any GPUs or Amazon EC2 instances that were unhealthy.

Unfortunately, the two systems collided in a tangled mess. Because the systems didn't communicate with one another, they would sometimes spawn restart chains: the rolling redeploy would restart an instance, but before it finished restarting, the health checker would identify it as unhealthy and restart it again. As a result, Erik said, "whenever I would go to deploy any updates to the model or the model handling code, my next four hours would be dedicated to making sure the rollout went through."
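The failure mode is easy to reproduce in a toy simulation. Nothing below is Banana's code; it simply models two uncoordinated controllers, where a reboot takes several ticks to complete and the health checker treats a rebooting instance as unhealthy.

```python
REBOOT_TICKS = 3  # ticks an instance spends unresponsive while rebooting

def simulate(ticks, health_checker=True):
    """Count restarts over `ticks` time steps for a single instance.

    The rolling redeploy issues the first reboot. If the uncoordinated
    health checker is on, it sees the mid-reboot instance as unhealthy
    and reboots it again, resetting the reboot timer each time -- the
    restart chain never ends.
    """
    state = {"rebooting_for": 0, "restarts": 0}

    def reboot():
        state["rebooting_for"] = REBOOT_TICKS
        state["restarts"] += 1

    reboot()  # the rolling redeploy kicks things off
    for _ in range(ticks):
        if health_checker and state["rebooting_for"] > 0:
            reboot()  # health checker: "unhealthy!" -> forced restart
        state["rebooting_for"] -= 1
    return state["restarts"]
```

With the health checker active, every tick triggers another restart; with it off (or made aware of in-progress reboots), the single deploy-time restart suffices. Coordination, not more restarts, was the missing piece.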

If this approach to deployment had continued, Banana would have either incurred the substantial expense of hiring DevOps engineers or faced ML engineers' productivity being hobbled by deployment time. Clearly, Banana needed to optimize their approach to deployment so that Erik and his team could focus on their business, not their servers.

Stacking Dev Tools for Success

On the surface level, it might not make sense for one infrastructure devtool (Banana) to rely on another infrastructure devtool (Zeet). But, as Erik explains:

"The entire point of dev tools is to focus on a repeated but undifferentiated part of the stack and automate that. ‘Undifferentiated’ means that a team building Uber for Dogs wants to focus on the core aspects of what makes them Uber for Dogs. They don't want to focus on the same infrastructure that Uber for Cats needs to build or a dating app needs to build. In the end there is a certain chunk of infrastructure that is undifferentiated and requires linear time spent to build it. As you get deeper into infrastructure and machine learning modeling, you see things repeat themselves and you can chunk things into smaller and smaller components."

"Zeet is doing the undifferentiated work of the deployment workflow, taking our code that we push to Git and getting it live on whatever machines we need it to be live on, plus handling things like networking, health checks, restarts. That is something we would have to build in house otherwise and we don't want to because it is undifferentiated for us. Zeet takes that same product and sells it to other people which is cool because that proves that it wouldn't be special for us to build it in house."

"Banana does the same thing at the machine learning level. Thankfully, we don't need to think about general infrastructure, we can think about machine learning model optimization-specific nuances and target the areas that are undifferentiated just like Zeet has done with deployment. At the core of what we do, we take a black box model, we don't care what it is, and figure out how to run it faster, cheaper, and with higher quality."

"Basically there is not an overlap in the undifferentiated areas of the stack that we focus on. And we are better equipped to focus on the areas that we care about because we rely on a different dev tool provider for the things that are table stakes."

Banana's New Groove

With Zeet, Banana engineers merge an update to a model into the main branch of its repository on GitHub, and the trained model goes live automatically. Zeet handles more than just deployment: it provides Banana with a complete production dev tools setup. Today, everyone needs CI/CD. Everyone needs off-the-shelf templates for the things they deploy over and over again. Everyone needs automatic deploys, rollbacks, and services that don't disappear when they crash. Now Banana's engineers work on rock-solid infrastructure and spend almost zero time on SRE and traditional DevOps.

What is particularly remarkable is what Erik and the rest of Banana's engineers are doing with the time that they have saved.

Machine Learning Engineers Using Zeet’s API

Not only is Zeet saving Erik time, but it is enabling his team of machine learning engineers to interface directly with their infrastructure in new ways. When Banana onboards a new engineer, one of their tasks is to get a minimal ML model deployed on Zeet and wired into the rest of the stack. This step takes half a day to learn, and then new engineers can deploy whatever they need to in their daily work without ever being blocked by DevOps.

Additionally, Banana extends Zeet's default capabilities through the Zeet API to handle their unique needs. Banana has an in-house autoscaler to balance cost and latency for its customers. This ML-specific infrastructure is elastic: through auto-scaling, it scales up the model server if latency spikes, then scales back when the additional capacity is no longer needed. Building Banana's autoscaler with the Zeet API not only saved money and mental overhead by avoiding a more complex service like AWS CloudWatch, it also solved the problem so completely that Erik spends approximately zero time worrying about whether the replicas he provisions are healthy; he just knows that he can rely on the replica count and metrics that the API gives him. Working at a higher level of abstraction than AWS's boto3 Python API, his previous tool, makes the autoscaler a more reliable and maintainable piece of infrastructure.
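The scale-up/scale-back policy at the heart of such an autoscaler fits in a few lines. The thresholds and limits below are invented for illustration, and the Zeet API calls that would read metrics and apply the returned replica count are omitted, since that interface isn't documented here.

```python
def desired_replicas(current, latency_ms, target_ms=200,
                     min_replicas=1, max_replicas=10):
    """Decide a replica count from observed latency.

    Captures the elastic behavior described in the text: add capacity
    when latency spikes well above target, shed it when latency shows
    the fleet is over-provisioned, otherwise hold steady.
    """
    if latency_ms > target_ms * 1.5:
        return min(current + 1, max_replicas)   # latency spike: scale up
    if latency_ms < target_ms * 0.5:
        return max(current - 1, min_replicas)   # idle capacity: scale back
    return current
```

A control loop would call this periodically with live metrics and reconcile the running replica count toward the returned value; the reliability win Erik describes comes from trusting the platform's replica counts and metrics rather than polling raw cloud APIs.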

Improved Deployment Velocity & Code Review

This improved developer experience has dramatically increased deployment velocity while making space for adding best practices like branch-based deploys and code reviews. In their first week after switching over to Zeet, Banana's engineering team deployed five times as often as they had the previous week.

But deployment speed isn't a panacea unless you are confident in the code you are deploying. So Zeet also makes code reviews on branch-based PRs the lowest-friction way of deploying code. Banana's engineering team quickly built a habit of performing a code review before merging a branch, as merging a branch now equals deploying the code. With their process now firmly established, the team of four can push as many as five PRs a day into production when needed, where each PR represents a material change to a machine learning model running on complex infrastructure.

Ultimately, pain reduction across all their workloads enables Banana to focus their engineering efforts on core competencies like smoothing out GPU & TPU idiosyncrasies.

Supporting Each Other, Past and Future

At the end of the interview, Erik mentioned that one of his favorite parts of working with Zeet is the responsiveness of customer service and the speed with which fixes go live. When dealing with AWS directly, the things that Banana cared deeply about getting optimized or fixed were very long-tail and just wouldn't get addressed. As a Zeet customer with unique needs, Banana's requests for support with multi-cloud or multiple hardware accelerators are met within hours or days, which improves Zeet for all of its users.

We have a zero-person SRE team now and for the foreseeable future. In a DevOps role, I don't touch SRE and don't do much traditional DevOps.

Banana and Zeet have been growing alongside each other for years through various iterations of each idea, both offering their users developer tools that make undifferentiated infrastructure problems at different levels of the stack disappear. Banana is the easiest way to go from data + model to output, and Zeet is the easiest way to go from prototype code to production application. Erik looks forward to scaling Banana without a dedicated DevOps team for the foreseeable future in partnership with Zeet.

Want to try out Zeet with your own engineering team? Sign up for a free trial today!
