Workflows in Google Cloud — the serverless way
Most companies handle business processes that you could think of as workflows. Your inventory management process, e-commerce transaction or IT service likely requires an orchestration of tasks spanning various IT systems. In this article I will shed some light on serverless workflow development in Google Cloud.
Let’s have a look at a sample workflow generating invoices:
Each of these steps can be implemented as a call to an API service based on Cloud Run or Cloud Functions, or to a public SaaS API such as SendGrid, for example to send an e-mail with a PDF attachment.
Real-life scenarios are typically more complex than the example above and require continuous tracking of all workflow executions, error handling, decision points and conditional jumps, iterating arrays of entries, data conversions and many other advanced features.
This is where workflow products come to the rescue.
Hold on, but why can’t I just do it all in e.g. Cloud Functions?
First, a Cloud Function can run for only a few minutes, while your workflow may need more time to complete, or may simply need to pause between steps, for example when polling for a job status. Workflows, especially those involving human interactions, can run for days.
Chaining multiple Cloud Functions together with e.g. Pub/Sub works, but you have no simple way of developing or operating such a workflow, and you quickly end up with dependencies that are difficult to maintain.
Workflow products also support exception handling and give visibility into execution status, including successes and failures. A workflow engine can also recover from errors, significantly improving the reliability of applications using it.
Workflow offerings often come with built-in connectors to popular APIs and cloud products, saving developers’ time and enabling them to simply plug into an external IT service.
Workflow products in Google Cloud
Cloud Composer
Cloud Composer is an excellent choice for data engineering pipelines, ETL orchestration, big data processing or machine learning workflows. Having said that, the very light, bursty or latency-sensitive workflows typical of many serverless applications are not the best fit for Cloud Composer and the underlying Apache Airflow.
To oversimplify, if you want to manage your data processing, ETL or machine learning pipelines and integrate with data products like BigQuery or Dataflow — Cloud Composer is the way to go.
However, if you want to process events or chain APIs in a serverless way, with bursty traffic patterns, high execution volumes or low latency, you likely need to look at Workflows first.
Workflows
Workflows is a fully managed workflow orchestration product operating as part of Google Cloud. It’s fully serverless and requires no infrastructure management.
It scales down to zero, generating no costs for customers at idle times. If your workflow pauses for a few hours in between the tasks, you don’t pay for this time as execution time doesn’t matter. Pricing is based on the number of workflow steps executed.
Workflows scales out automatically with no “cold start” effect and with a fast transition between the steps. This makes Workflows a good fit for latency-sensitive applications.
Example use cases for Workflows
Transaction management
Let’s imagine a workflow that processes customer orders and triggers an inventory refill from an external supplier when an item is out of stock. During order processing we may want to let our sales team know about large customer orders, and Slack would be our way to communicate.
Let’s consider the following diagram.
The workflow manages calls to Google Cloud’s Firestore and external APIs including SendGrid, Slack or a custom API of our inventory supplier. Data is passed between steps as needed and some steps are executed conditionally, depending on other APIs’ outputs.
This workflow is executed once per customer order and every execution is logged for potential inspection or troubleshooting. In case of an error, the workflow can handle retries or exceptions thrown by APIs, thus improving reliability of the application using this workflow.
Processing files uploaded to a storage bucket
Let’s assume we want to build a workflow that tags files uploaded by users based on file extension. As users can upload text files, videos or images, the workflow needs to interact with different APIs to process their contents.
The process starts with a Cloud Function triggered by a Cloud Storage trigger. The function starts a workflow using a Workflows client library and passes a file path to the workflow as a runtime argument.
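As a rough sketch of the function described above, the Python below starts one workflow execution per uploaded file. It assumes the `google-cloud-workflows` client library is installed. The project, region and workflow names (`my-project`, `us-central1`, `tag-files`) are placeholders of mine, and the `bucket` and `name` event fields are my assumption about the shape of the Cloud Storage event, so check them against your trigger.

```python
import json


def workflow_path(project: str, location: str, workflow: str) -> str:
    """Build the fully qualified resource name of a workflow."""
    return f"projects/{project}/locations/{location}/workflows/{workflow}"


def build_argument(bucket: str, name: str) -> str:
    """Serialize the runtime argument; the execution receives it as a JSON string."""
    return json.dumps({"bucket": bucket, "file": name})


def on_file_uploaded(event: dict, context=None) -> str:
    """Hypothetical Cloud Storage-triggered function that starts one execution."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud.workflows import executions_v1

    client = executions_v1.ExecutionsClient()
    execution = executions_v1.Execution(
        argument=build_argument(event["bucket"], event["name"])
    )
    # Placeholder project, region and workflow name: replace with your own.
    parent = workflow_path("my-project", "us-central1", "tag-files")
    response = client.create_execution(parent=parent, execution=execution)
    return response.name  # resource name of the new execution
```

The function returns as soon as the execution is created, so it does not need to stay alive while the workflow runs.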
The workflow then decides which API to use depending on the file extension and stores a tag in a Firestore database.
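The routing decision itself is small. Here is a minimal Python illustration of the idea, not Workflows syntax. The extension-to-API mapping and the API labels are hypothetical; the article does not name the APIs involved.

```python
import os

# Hypothetical mapping from file extension to the API that should process it.
API_BY_EXTENSION = {
    ".txt": "natural-language",
    ".jpg": "vision",
    ".png": "vision",
    ".mp4": "video-intelligence",
}


def choose_api(file_path: str) -> str:
    """Pick an API label from the file extension, ignoring letter case."""
    _, extension = os.path.splitext(file_path)
    return API_BY_EXTENSION.get(extension.lower(), "unsupported")
```

Returning an explicit `"unsupported"` label for unknown extensions lets the caller skip tagging instead of failing on an unexpected upload.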
Google Workflows — how can it help?
The use cases shown above can be implemented with Workflows, out of the box.
Workflows handles sequencing of ‘steps’, managing inter-step dependencies. If needed, a workflow can also be designed to pause the execution between steps without generating time-related charges.
Almost any HTTP-based API can be used in a workflow step. You can make calls to Internet-based APIs, including SaaS APIs or your private endpoints right from within a workflow, without having to use Cloud Functions or Cloud Run as a wrapper.
Calls to Google Cloud APIs can benefit from built-in IAM authentication and use the workflow’s service account to authenticate to other Google Cloud products.
Real life workflows often require that steps communicate with each other. Built-in variables can be used by workflow steps to pass the result of their work to other steps.
JSON is today’s standard data format for most APIs. Workflows automatically converts JSON responses to dictionaries, enabling easy access to the information provided by APIs.
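To show what this conversion buys you, here is the same idea in plain Python. The response body is a made-up example of what an inventory API might return. Once parsed, fields are read by key instead of by string manipulation.

```python
import json

# Hypothetical response body from an inventory API.
response_body = '{"item": "widget", "stock": 0, "supplier": {"name": "Acme"}}'

# Parsing turns the JSON text into nested dictionaries.
inventory = json.loads(response_body)

# Later steps can read individual fields directly.
needs_refill = inventory["stock"] == 0
supplier_name = inventory["supplier"]["name"]
```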
The product comes with an expression language supporting logical and arithmetic operators, dictionaries, arrays and other useful features. This enables implementation of simple data manipulations directly in a workflow, without having to use external compute products, and thus simplifying workflow management.
As part of a workflow execution request, a calling service can pass arguments, for example the identifier of a transaction that needs to be processed.
Variables and expressions together enable implementation of decision points. Workflows can use custom expressions to decide whether to execute a particular step or e.g. jump to another part of the workflow.
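To make the interplay of variables, expressions and jumps concrete, here is a toy step runner in Python. It is only a conceptual model, not how Workflows is implemented or configured. The step names, the `"end"` marker and the order threshold are all invented for illustration.

```python
from typing import Callable, Dict, Optional

# A step reads and updates the shared variables and may name the next step.
Step = Callable[[dict], Optional[str]]


def run_workflow(steps: Dict[str, Step], start: str, variables: dict) -> dict:
    """Run steps in declaration order unless a step requests a jump."""
    order = list(steps)
    current: Optional[str] = start
    while current is not None:
        jump_to = steps[current](variables)
        if jump_to == "end":
            break
        if jump_to is not None:
            current = jump_to
        else:
            index = order.index(current) + 1
            current = order[index] if index < len(order) else None
    return variables


def check_order(v: dict) -> Optional[str]:
    v["is_large"] = v["order_total"] >= v["large_order_threshold"]
    # Decision point: small orders jump straight to confirmation.
    return None if v["is_large"] else "confirm"


def notify_sales(v: dict) -> Optional[str]:
    v["sales_notified"] = True
    return None


def confirm(v: dict) -> Optional[str]:
    v["confirmed"] = True
    return "end"


steps = {"check_order": check_order, "notify_sales": notify_sales, "confirm": confirm}
small = run_workflow(steps, "check_order", {"order_total": 50, "large_order_threshold": 1000})
large = run_workflow(steps, "check_order", {"order_total": 5000, "large_order_threshold": 1000})
```

The small order skips `notify_sales` because `check_order` jumps to `confirm`, while the large order falls through every step in sequence.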
Sub-workflows help in reusing parts of a workflow logic, similar to routines in many programming languages.
Your production workflow needs to be able to recover from some of the API or network issues. What helps in such cases is a combination of exception handling and configurable retries that allow a workflow to properly react to a particular error.
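The retry half of that combination can be sketched in a few lines of Python. This is a generic exponential backoff helper written for illustration, not the Workflows retry policy itself. The parameter names and defaults are my own, and treating `ConnectionError` as the only transient error is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    initial_delay: float = 1.0,
    multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with exponential backoff, then re-raise."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ConnectionError:
            # Only this error type is retried; anything else propagates at once.
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay *= multiplier
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Errors that are not considered transient are raised immediately, which is where exception handling takes over.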
Sample workflows implementing features above, and other capabilities, are available here.
Ok, how do I get started with Workflows?
Visit our Workflows site or simply build your first workflow using the Cloud Console. A free tier allows you to give it a try with no charges. Stay tuned as new features are coming soon!
Happy Workflows development! :)
Filip Knapik
Note: Author is the product manager for Cloud Composer and Workflows at Google