Building More Reliable Workflows With Events
Dan Farrelly· 11/10/2022 · 5 min read
Every application starts off with running some basic code in the background. Common use cases include sending email notifications, handling subscription payment webhooks, or periodically fetching data from third party APIs.
At some point, most applications will be required to run tasks in the background that are critical to the core functionality or business value of the product. This is when applications need to get serious and consider more robust solutions as failed tasks can represent lost revenue.
Application-critical tasks are everywhere and might include:
- Ingesting data from user uploads or third party APIs, transforming or validating data and sending user notifications
- User actions that require propagation across several databases, systems or external integrations
- Transaction processing including payments sent and received
All of these tasks must be reliable and fault-tolerant. Failures in these situations can lead data inconsistencies or broken states for users.
What is needed
As tasks gets more complex, it often can be split into multiple independent steps or phases. Many folks use the term "workflows" or step functions to describe these more complex tasks where, often, logic is required to glue the steps together.
Developers or operators need confidence that their critical workflows are running consistently, reliably, and durably. Quite often, one part of a workflow may be more susceptible to failure due to a flaky external API or rate limiting. If this one part of the code fails, it should be independently retried, avoiding the need to restart a job from the beginning.
Lastly, observability is crucial to understanding errors and results from each stage of your task. Status and logs should be grouped by type of workflow and individual workflow runs.
Splitting the code into steps
There are many different ways to split tasks into discrete steps (e.g. actions, sub-tasks, etc.). A simple approach is that teams will chain together multiple queues and workers. This works, but it requires more overhead, it may be difficult to audit, and will likely require separating of related code into separate jobs.
Alternatively, another approach could be to use a workflow orchestration framework. Each has it's own paradigm for building this logic often with significant learning curves.
Writing critical workflows as code
Our goals were to enable developers to write complex jobs combining multiple steps right in code while minimizing custom DSL and concepts to learn.
We don't think writing step functions or workflows as a large JSON configuration is a great developer experience. Also, using a online GUI to design these isn't always ideal as the logic or DAG may be edited or managed outside of the actual code's version control.
We also believe it should be simple to run, from local development, to production where code can be deployed to any number of platforms.
Let's take a use case to demonstrate our approach. A product is a CRM tool that enables a user to upload a large CSV file with contacts information. The API triggers this to run with the event api/contact_list.uploaded
. This advanced CRM tool validates the upload then uses third party APIs to enrich the data with information about the contact's business then waits for the user to review and filter this list down before inserting all data into the CRM database. When the user approve or rejects and selects filers, the API will send another event: api/contact_list.reviewed
. Here's what the code could look like:
import { inngest } from "./client";inngest.createFunction({ name: "Contacts Import and enrichment" },{ event: "api/contact_list.uploaded" },async ({ event, step }) => {const { isValid, errors } = await step.run("Validate upload contents", async () => {// Download the csv file, validate columns and data in each rowconst { isValid, errors } = downloadAndValidateCSV(event.data.filename);return { isValid, errors };});if (!isValid) {return await step.run("Notify user of invalid contents", async () =>await sendContactsImportFailedEmail(event.user.id, errors));}// Enrichment may fail at times due to networking blipawait step.run("Enrich contracts information", async () => {// Call a third party API service to enriches each contact's info// then uploads the data to an object store when complete});const listReviewedEvent = await step.waitForEvent("api/contact_list.reviewed", {timeout: "7d",match: "data.upload_id", // data.upload_id is in both events and must match to proceed})if (listReviewedEvent.data.is_approved === false) {return await step.run("Delete uploaded contact lists", () => { /* ...*/ });}const { totalUsersAdded } = await step.run("Create contacts in CRM", async () => {const contacts = await downloadEnrichedContactList(event.data.filename);const filteredContacts = applyFilters(listReviewedEvent.data.filters);return await insertContactsIntoCRMDatabase(event.data.account_id, filteredContacts);});await step.run("Notify user of successful import", async () =>await sendContactsImportSuccessEmail(event.user.id, totalUsersAdded));})
- All code within each
step.run()
callback will be run independently and will be retried upon failure, resuming from where the function left off. - Using
step.waitForEvent()
enable the workflow to wait for additional user interactions or events via event coordination. - The results of each step, including failures and retries are logged can be inspected on the Inngest dashboard for complete observability.
More info on these and other tools
are in our docs.
Over to you
For critical code that runs in the background, developers need guarantees and observability in a simple to use package. Simple to use, easy to maintain. Investing in any solution should mean transparency, portability and potential to self-host. We keep our system including the execution engine that powers all of this functionality as well as the SDK completely open source: inngest/inngest, inngest/inngest-js.
Our beta release is out today and can be used with Inngest Cloud today - it's free to start using. Come join us on Discord or Github to share feedback or seek support on building out some critical workflows for yourself.