If you are a software developer, you already know what Node.js is, and you know how to create an API. You may even feel like Superman while implementing new features. But as you attract more users, response times get slower, and as workloads increase, your application may start to fail. If you have an existing application, this article explores the most likely causes and shows how message queues, specifically RabbitMQ with Node.js, effectively help applications connect and scale. If you are planning a new application, it includes tips to help you avoid these issues up front.
The Pros and Cons of Node.js
Node.js has several strengths:
The same language on the backend and frontend
A thriving open source community
Relatively fast (mainly because V8 compiles JavaScript to native machine code)
The downside of Node.js is that CPU-intensive work can slow it down or make it completely unresponsive, because such work blocks the single-threaded event loop. Although the web is full of tutorials on implementing APIs with Node.js, very few address how to make sure Node.js can scale and handle large loads. What approach should you take when you have already made a significant investment in Node.js and need to process compute-intensive tasks?
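To make the problem concrete, here is a minimal sketch of how a synchronous, CPU-bound function ties up the event loop. The naive recursive fib is just a stand-in for any expensive computation:

```javascript
// Synchronous, CPU-bound work: while fib() executes, nothing else on the
// event loop can run, so any concurrent requests or timers must wait.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2); // deliberately slow
}

const start = Date.now();
fib(32);                                // blocks the event loop
const blockedMs = Date.now() - start;
console.log(`event loop blocked for ~${blockedMs} ms`);
```

During that call, a Node.js server handling this work inline could not accept or answer any other request.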
The good news is that there are tons of libraries readily available. The bad news is that in many cases third-party code processes data synchronously, which can slow down your APIs or make them unusable. Let's explore how message queues come into play.
Message Queue Overview
Offloading work to a separate process is key to building scalable web applications and APIs: you need to move compute-intensive tasks out of the request/response lifecycle. When building applications, you can use one of two interaction patterns:
Request-Response (synchronous): An answer is returned with a response. For example, in e-commerce applications, a user is notified immediately if a submitted order has been processed or if there are any issues.
Fire-and-forget (asynchronous): A request is acknowledged, and an alternative means of getting the result is provided. For example, when users import a large amount of data that needs to be processed, they receive an acknowledgement that the data has been received and are instructed to check an import queue to view the status. Or, a message is sent upon import completion.
In both synchronous and asynchronous operations, you can offload processing and free up resources so that the application can handle other requests. This is done using a message queue. Message queues come in many shapes and sizes and can organize the flow of messages in different ways. A very important advantage of message queues is that they accommodate feature growth: what starts as a simple project can easily grow into a monster if not planned properly. We suggest considering RabbitMQ.
RabbitMQ is an open source message broker software that implements the Advanced Message Queuing Protocol (AMQP). A message broker is an architectural pattern for message validation, transformation and routing. It gives applications a common platform to send and receive messages and a safe place for messages to live until received. Some of the benefits we've seen from using RabbitMQ include:
Connectability and scalability through messaging (i.e., applications can connect to each other as components of a larger application)
Asynchronous messaging (decoupling applications by separating sending and receiving data)
Robust messaging for applications
Runs on all major operating systems
Supports a huge number of developer platforms
Open source and commercially supported
RabbitMQ Main Concepts
RabbitMQ is a powerful tool for building applications that need to scale. A common pattern when building distributed systems is a work queue that separates work producers from consumers (workers). The main concepts are:
Messages are sent by Producers
Messages are delivered to Consumers
Messages go through a Channel
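These concepts can be sketched with the third-party amqplib package (assumed to be installed), which implements AMQP for Node.js. The queue name and broker URL below are assumptions; amqplib is lazy-required so the helpers load without a broker present:

```javascript
// Minimal producer/consumer sketch using the "amqplib" package (assumed
// installed). The broker URL and queue name are illustrative assumptions.
const QUEUE = 'tasks';

// Messages are plain bytes on the wire; serialize payloads to a Buffer.
function encode(payload) {
  return Buffer.from(JSON.stringify(payload));
}

async function produce(payload) {
  const amqp = require('amqplib'); // lazy-loaded third-party dependency
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel(); // all work happens over a channel
  await ch.assertQueue(QUEUE, { durable: true });
  ch.sendToQueue(QUEUE, encode(payload), { persistent: true });
  await ch.close();
  await conn.close();
}

async function consume(handler) {
  const amqp = require('amqplib');
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });
  ch.consume(QUEUE, (msg) => {
    if (msg === null) return;
    handler(JSON.parse(msg.content.toString()));
    ch.ack(msg); // acknowledge so RabbitMQ can discard the message
  });
}
```

The `durable` and `persistent` flags ask RabbitMQ to survive a broker restart without losing the queue or its messages.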
There are many messaging patterns, which have been written about ad nauseam. Two noteworthy patterns in the context of RabbitMQ are Competing Consumers and Remote Procedure Call.
Using RabbitMQ with a competing consumers messaging pattern
To illustrate, consider a user who needs to import a large amount of data. Processing this data is a CPU-intensive task. A naive implementation could be:
User uploads data (request)
Server processes the data
Respond to the user with a result (reply)
If you try to process this data directly in the Node.js process, you will block the event loop. Any requests that come in during data processing will have to wait, which is not good: the system becomes unresponsive. A better approach would be:
User uploads data (request)
Server sends a message to a queue
A response is sent to the user with a message that data has been received successfully and will be processed (reply).
The message sent to the queue contains all required information (user information, file location, etc.) and will be picked up and processed by an available consumer (worker). The user is notified after the data has been processed. This fire-and-forget approach has several advantages:
The server is able to handle other requests immediately
Processing is delayed if there are no resources available at that moment
Processing is retried if it fails the first time
In the above example we applied the competing consumers pattern: when there is a message in a queue, any of the consumers could potentially receive it. Simply put, consumers compete with each other to be the message's receiver.
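A worker for the competing consumers pattern might look like the following sketch, again using the amqplib package (assumed installed; the queue name and broker URL are assumptions). Running several copies of this process makes RabbitMQ distribute messages among them, and the requeue logic gives the one-retry behavior mentioned above:

```javascript
// Competing-consumers worker sketch with "amqplib" (assumed installed).
// Start several copies of this process; RabbitMQ distributes messages
// among them, and prefetch(1) skips workers that are still busy.
const QUEUE = 'imports'; // hypothetical work queue name

// Requeue a failed message for one retry; drop it on the second failure.
function shouldRequeue(msg) {
  // fields.redelivered is true once the message has already been retried
  return !msg.fields.redelivered;
}

async function startWorker(processJob) {
  const amqp = require('amqplib'); // lazy-loaded third-party dependency
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE, { durable: true });
  ch.prefetch(1); // take one message at a time

  ch.consume(QUEUE, async (msg) => {
    if (msg === null) return;
    try {
      await processJob(JSON.parse(msg.content.toString()));
      ch.ack(msg); // done: remove the message from the queue
    } catch (err) {
      ch.nack(msg, false, shouldRequeue(msg)); // retry once, then drop
    }
  });
}
```

Because a message is only acknowledged after successful processing, a worker that crashes mid-job causes RabbitMQ to redeliver the message to another worker.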
Using RabbitMQ with a remote procedure call pattern
In instances where you have to respond synchronously (request/reply) within a reasonable amount of time but need to perform relatively expensive computations, you can use a messaging pattern commonly known as Remote Procedure Call (RPC). This entails a producer sending a message and a server replying with a response message, as demonstrated in the following figure.
When a producer connects to an RPC queue, it creates an anonymous, exclusive callback queue. It then sends a message with a replyTo property so that the worker knows where to send the response. The worker waits for messages on the RPC queue. When the worker is done processing a message, it sends the result back to the reply queue, where the producer is listening for the response.
In this scenario, you need to ensure that you have enough workers to process incoming messages, but you effectively move the load from the Node.js process to other servers. When possible, prefer the asynchronous approach and avoid RPC.
Processing workloads separately is a common pattern in distributed systems. It eliminates tight coupling between producers and consumers, which keeps applications simpler and easier to scale. The advantages of this approach include:
Decoupling - separate work producers and consumers
Reliability - easy to implement retry logic
Efficiency - distributing workload over time
Scalability - distributing work across nodes
Integration - external systems become easier to integrate
The same principles can be applied using other queues (ZeroMQ, Kafka, ActiveMQ, Amazon SQS, Azure Service Bus and more). In follow-up posts, we will explore implementation details.