Multi-Thread in Node.js: What are Worker Threads?
Node.js is a free, cross-platform JavaScript runtime environment that, while single-threaded in nature, executes asynchronous code using several threads in the background.
Because of its design, Node.js has received a lot of flak. It seems unusual that Node.js doesn't have direct access to thread instances when compared to programming languages like Java, C, or Python.
The worker_threads module allows us to run multiple threads inside a single Node.js process, even on a single core. In Node.js 10 we had to pass the --experimental-worker flag to use this module, but with Node.js 11.7.0 and later we can skip it.
Because Node.js is non-blocking, callbacks are first routed to the event loop and run one at a time on the main thread, while the runtime hands the underlying work to background threads. The Node.js runtime is in charge of all of this.
We will go over the following:
- Introduction
- Child Processes, Clustering, and Worker Threads
- Why Threads in JavaScript Will Never Exist?
- Is Node.js Single-Threaded?
- Why is Node Unable to Handle CPU-intensive Tasks?
- Worker Threads: What Are They and How Do They Work?
- Creating and Running Workers
- Communication Between Parent and Worker Threads
- Worker Thread Use Cases
Introduction
JavaScript was designed to be a single-threaded programming language that ran in a browser. Being single-threaded means that in the same process, only one set of instructions is executed at any given time.
This made it easy to implement the language and for developers to use it. Previously, JavaScript was only useful for adding interactivity to web pages, form validations, and other activities that didn't require multithreading.
The creator of Node.js, Ryan Dahl, saw this limitation as an opportunity. He sought to build a server-side platform that didn't use thread instances and relied on asynchronous I/O. Concurrency is a difficult problem to tackle. When many threads access the same memory, race conditions can occur that are difficult to reproduce and fix.
In its early days, JavaScript was limited to adding a small amount of interactivity to websites, so there was no need for multithreading. However, times have changed, user demands have grown, and JavaScript has emerged as "the most popular web programming language."
Multithreading is becoming commonplace, yet JavaScript on its own cannot offer it because it is a single-threaded language. Fortunately, Node.js provides a remedy for this problem.
Child Processes, Clustering, and Worker Threads
Node applications can be made multi-threaded by using Child Processes, Clustering, or the more recent, recommended approach of a module called Worker Threads.
Child processes, which have been available since version 0.10, were the first way to run work in parallel in your application. Strictly speaking, they are not threads: this was accomplished by creating a separate node process for each additional unit of parallel work required.
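As a rough illustration of the child-process approach, the sketch below runs a snippet of JavaScript in a second node process and reads back what it printed. The helper name runInChildProcess is made up for this example, and passing the code with the -e flag is only a convenience to keep everything in one file.

```javascript
const { execFile } = require("child_process");

// Run a snippet of JavaScript in a separate Node.js process and
// resolve with whatever that process wrote to stdout.
// (runInChildProcess is a hypothetical helper, not a Node.js API.)
function runInChildProcess(source) {
  return new Promise((resolve, reject) => {
    // process.execPath is the path of the node binary that is running right now.
    execFile(process.execPath, ["-e", source], (err, stdout) => {
      if (err) return reject(err);
      resolve(stdout.trim());
    });
  });
}

// The child has its own process id, memory, and event loop.
runInChildProcess("console.log(process.pid)").then((childPid) => {
  console.log(`parent pid: ${process.pid}, child pid: ${childPid}`);
});
```

Because every child is a full process, starting one costs more than starting a thread, which is part of why the later options exist.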
We can simplify the creation and management of Child Processes with Clustering, which has been a stable release since about version 4. When paired with PM2, it produces fantastic results.
Now, before we start multithreading, there are a few things you should be aware of:
i) For I/O Tasks, Multithreading is Already Available
The libuv thread pool is a layer of Node that is already multithreaded. Operations such as file and folder management, TCP/UDP transactions, compression, and encryption that aren't asynchronous by nature are sent to libuv and processed in the thread pool.
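To see the thread pool at work, here is a minimal sketch that starts several crypto.pbkdf2 calls at once. The asynchronous form of pbkdf2 is executed on libuv's thread pool, so the main thread stays free while the hashes are computed. The passwords, salt, and iteration count are arbitrary values picked for the example.

```javascript
const crypto = require("crypto");

// Hash one password asynchronously; the heavy work runs on a pool thread.
function hashOnThreadPool(password) {
  return new Promise((resolve, reject) => {
    crypto.pbkdf2(password, "example-salt", 100000, 32, "sha256", (err, key) => {
      if (err) return reject(err);
      resolve(key.toString("hex")); // 32 bytes become 64 hex characters
    });
  });
}

// Start all hashes at once and wait for every one of them.
function hashAll(passwords) {
  return Promise.all(passwords.map(hashOnThreadPool));
}

hashAll(["a", "b", "c", "d"]).then((hashes) => {
  console.log(`Computed ${hashes.length} hashes on the thread pool`);
});
```

No Worker Threads are involved here, which is the point: for this kind of task Node already does the multithreading for us.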
ii) Worker Threads/Child Processes are Only Useful for Synchronous JavaScript Logic
Using Child Processes or Worker Threads to implement multithreading will only work for synchronous JavaScript code that performs heavy-duty activities like looping and calculations. You will not observe a performance increase if you try to outsource I/O work to Worker Threads.
iii) Difficult to Dynamically Manage Numerous Threads
Adding an extra thread to your application is simple enough, and there are plenty of tutorials available. Creating threads equal to the number of logical cores on your system or virtual machine, and managing work distribution to these thread instances, on the other hand, is far more difficult, and coding this logic is well beyond most of our pay grades.
We are fortunate to live in an open-source world with incredible contributions from the Node community. That is to say, there is already a module that will allow us to dynamically create and manage threads based on the CPU availability of our system or virtual machine.
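To give a feel for what such a module has to take care of, here is a deliberately small sketch of a worker pool. It is not the API of any particular package. It assumes one worker per logical core (os.cpus().length), uses the worker_threads module covered later in this article, and passes the worker code as a string through the eval option so that the example fits in one file. The names runOnPool and poolWorkerSource are made up.

```javascript
const os = require("os");
const { Worker } = require("worker_threads");

// Worker code: for every number n it receives, reply with 1 + 2 + ... + n.
const poolWorkerSource = `
  const { parentPort } = require("worker_threads");
  parentPort.on("message", (n) => {
    let total = 0;
    for (let i = 1; i <= n; i++) total += i;
    parentPort.postMessage(total);
  });
`;

// Run every input on a pool of workers; resolve with results in input order.
function runOnPool(inputs, poolSize = os.cpus().length) {
  return new Promise((resolve, reject) => {
    const results = new Array(inputs.length);
    if (inputs.length === 0) return resolve(results);

    const size = Math.max(1, Math.min(poolSize, inputs.length));
    const workers = [];
    let next = 0;     // index of the next input to hand out
    let finished = 0; // number of results received so far
    const stopAll = () => workers.forEach((wk) => wk.terminate());

    for (let w = 0; w < size; w++) {
      const worker = new Worker(poolWorkerSource, { eval: true });
      workers.push(worker);
      let current = -1; // index of the input this worker is busy with

      // Hand this worker the next pending input, if any is left.
      const dispatch = () => {
        if (next >= inputs.length) return;
        current = next++;
        worker.postMessage(inputs[current]);
      };

      worker.on("message", (value) => {
        results[current] = value;
        finished++;
        if (finished === inputs.length) {
          stopAll();
          resolve(results);
        } else {
          dispatch();
        }
      });
      worker.on("error", (err) => {
        stopAll();
        reject(err);
      });
      dispatch();
    }
  });
}

runOnPool([10, 100, 1000]).then((sums) => console.log(sums)); // [ 55, 5050, 500500 ]
```

A production-grade pool also has to deal with task timeouts, crashed workers, and back-pressure, which is why reaching for an existing module is usually the better choice.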
Why Threads in JavaScript Will Never Exist?
Many people may now believe that a new module should be added to the Node.js core that allows us to construct and sync threads. Isn't that all there is to it?
It's a shame that a mature server-side platform like Node.js doesn't offer a decent manner of tackling this use case.
Adding threads, on the other hand, alters the language's character. Threads cannot simply be added as a new collection of classes or functions. The language needs to be changed. For threads to cooperate, languages that support multithreading incorporate keywords like "synchronized."
Even some numeric types in Java, for example, are not atomic; if you don't synchronize their access, two threads could change the value of a variable at the same time. After both threads have written to the variable, it may have a few bytes modified by one thread and a few bytes changed by the other, resulting in no meaningful value.
Is Node.js Single-Threaded?
Isn't it true that Node.js applications are single-threaded?
Well, sort of.
We can indeed perform things in parallel, but we don't create threads or synchronize them. The virtual machine and operating system do the I/O in parallel for us, and when it's time to transfer data back to our JavaScript code, the JavaScript section runs in a single thread.
In other words, everything but JavaScript code runs in parallel. Synchronous JavaScript code blocks are always executed one at a time:
let flag = false
function doSomething() {
  flag = true
}
If all we do is asynchronous I/O, this is ideal. Our code is made up of small synchronous blocks that execute quickly and send data to files and streams. As a result, JavaScript code is so quick that it doesn't interfere with the execution of other JavaScript code.
Waiting for I/O events takes a lot longer than waiting for JavaScript code to run. Consider the following example:
db.findOne('Select * from customers ... limit 1', function(err, result) {
  if (err) {
    return console.error(err)
  }
  console.log(result)
})
console.log('Running query')
setTimeout(function() {
  console.log('Getting customers...')
}, 1000)
The "Running query" notice will appear immediately after calling the database query, even if the query takes a minute. And, regardless of whether the query is still running or not, we'll receive the "Getting customers..." message a second after launching it.
A Node.js application just calls the method and does not prevent other code from running. When the query is completed, the application is notified via the callback, and we receive the response. Thanks to this non-blocking design, the waiting happens in the background while the callbacks themselves still run one at a time on the main thread.
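The snippet above depends on a db object that isn't shown, so here is a runnable sketch of the same idea. fakeFindOne is a made-up stand-in for a database client that simply answers after a delay, and the delays are shortened so the demo finishes quickly.

```javascript
// Hypothetical stand-in for a database client: it "answers" after delayMs.
function fakeFindOne(query, delayMs, callback) {
  setTimeout(() => callback(null, { id: 1, name: "Ada" }), delayMs);
}

// Record the order in which the three messages are produced.
function runDemo(queryDelayMs, timerDelayMs) {
  return new Promise((resolve) => {
    const order = [];
    let pending = 2; // the query callback and the timer callback
    const done = () => {
      if (--pending === 0) resolve(order);
    };

    fakeFindOne("Select * from customers ... limit 1", queryDelayMs, (err, result) => {
      order.push("query result");
      done();
    });
    order.push("Running query"); // runs right away: nothing above has blocked
    setTimeout(() => {
      order.push("Getting customers...");
      done();
    }, timerDelayMs);
  });
}

// The slow "query" (300 ms) does not hold up the faster timer (100 ms).
runDemo(300, 100).then((order) => console.log(order));
// [ 'Running query', 'Getting customers...', 'query result' ]
```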
Why is Node Unable to Handle CPU-intensive Tasks?
The event loop is the name for the single thread that runs in Node programs. When an event is triggered, Node uses it to handle the subsequent execution.
If the event's execution is expensive, as it is for CPU-bound and I/O-bound operations, Node can't afford to run it on the event loop without bogging down the only available thread.
Rather than waiting for the costly process to finish, the event loop registers the callback function associated with the event and moves on to the next event in the loop.
The expensive operation is delegated to the worker pool, a collection of auxiliary threads. The task is executed asynchronously by a thread in the worker pool, which then notifies the event loop.
The callback function registered for the operation on the event loop's thread is then executed.
Because callbacks are called on the event loop, any of them that involve CPU-intensive processes, such as complicated mathematical calculations used in machine learning or large data, will cause the event loop to be blocked for an extended time. During this time, the application will not do any other duties in the event loop, such as responding to client requests.
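A small experiment makes the blocking visible. In the sketch below, a busy loop stands in for a heavy calculation, and we measure how late a 10 ms timer fires when the loop hogs the thread. The function names and the durations are arbitrary choices for the demo.

```javascript
// Busy-wait for roughly `ms` milliseconds, standing in for a heavy calculation.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: the event loop cannot run anything else in the meantime
  }
}

// Schedule a timer for `timerMs`, then block the thread for `blockMs`.
// Resolves with how long the timer actually took to fire.
function measureTimerDelay(timerMs, blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), timerMs);
    blockFor(blockMs);
  });
}

measureTimerDelay(10, 200).then((elapsed) => {
  // The timer asked for 10 ms but cannot fire before the loop is done.
  console.log(`10 ms timer fired after ${elapsed} ms`);
});
```

In a server, the delayed timer would be a delayed response to some other client.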
A Node application's performance suffers greatly in such a situation. For a long time, Node was thought to be unsuitable for CPU-intensive processes.
The introduction of worker threads, however, gave a solution to this issue.
Worker Threads: What Are They and How Do They Work?
In Node v10, worker threads were added as an experimental feature. In version 12, they became stable. Because worker threads aren't a built-in component of JavaScript, they don't operate exactly like a standard multithreading system.
However, they permit expensive tasks to be delegated to different threads instead of blocking the application's event loop. So, how do worker threads function behind the scenes?
A worker thread's job is to execute code that has been specified by the parent or main thread. Each worker operates independently of the others. A worker and their parent, on the other hand, can communicate via a message channel.
Because JavaScript doesn't natively support multithreading, worker threads employ a particular mechanism to keep workers isolated from one another.
We're all aware that Node is built on top of Chrome's V8 engine. V8 allows you to spawn isolated V8 runtimes. A V8 Isolate is an isolated instance with its own JavaScript heap and micro-task queue.
Each worker thread runs in one of these isolates, so every worker has its own V8 instance and event loop. In other words, a Node application with workers running has many Node instances operating in the same process.
Even though JavaScript does not support multithreading by default, worker threads provide a workaround for running several threads in the same process.
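The isolation can be observed directly. In this sketch the parent sets a global variable, and a worker then checks whether it can see it. The worker code is passed as a string through the eval option purely to keep the example in one file, and the variable name demoCounter is arbitrary.

```javascript
const { Worker } = require("worker_threads");

// A global that lives on the main thread's heap.
globalThis.demoCounter = 0;

// Code that runs inside the worker's own isolate.
const isolatedSource = `
  const { parentPort } = require("worker_threads");
  const seenBefore = typeof globalThis.demoCounter; // "undefined": not shared
  globalThis.demoCounter = 100;                      // changes the worker's global only
  parentPort.postMessage({ seenBefore, workerCounter: globalThis.demoCounter });
`;

function checkIsolation() {
  return new Promise((resolve, reject) => {
    const worker = new Worker(isolatedSource, { eval: true });
    worker.once("message", (report) => {
      // Add the parent's view of the same global to the worker's report.
      resolve({ ...report, parentCounter: globalThis.demoCounter });
    });
    worker.once("error", reject);
  });
}

checkIsolation().then((report) => console.log(report));
// { seenBefore: 'undefined', workerCounter: 100, parentCounter: 0 }
```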
Creating and Running Workers
In this example, we'll run a task that calculates the nth term of the Fibonacci sequence. It's a CPU-intensive task that, if run without worker threads, would block the Node application's single thread, especially as n grows.
Our implementation will be split into two files. The first file, app.js, contains the main thread's code, which includes the creation of a new worker. The code for the worker we create is contained in the second file, worker.js. It's where you'll find the code for CPU-intensive Fibonacci calculations.
Let's look at how we can handle the new worker generation in the parent thread.
const { Worker } = require("worker_threads");
let num = 40;
// Create new worker
const worker = new Worker("./worker.js", {
  workerData: {
    num: num
  }
});
// Listen for a message from worker
worker.once("message", result => {
  console.log(`${num}th Fibonacci Number: ${result}`);
});
worker.on("error", (error) => {
  console.log(error);
});
worker.on("exit", (exitCode) => {
  console.log(exitCode);
});
console.log("Executed in the parent thread.");
To start a new worker thread, we use the Worker class. When establishing a new Worker instance, it accepts the following inputs.
new Worker(filename[, options])
The filename argument specifies the path of the file containing the code to be executed by the worker thread. As a result, we must pass the path to the worker.js file.
The Worker constructor additionally supports a variety of options, which are detailed in the official Worker documentation. However, we've decided to exclusively use the workerData option.
When the worker starts up, the data given by the workerData option will be available to it. We can easily pass the value of "n" to the Fibonacci sequence calculator using this option.
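One detail worth knowing is that the worker receives a copy of workerData, not the parent's object itself. The sketch below lets a worker change its copy and then compares both sides. The worker code is inlined with the eval option so the example is self-contained, and passData is a made-up helper name.

```javascript
const { Worker } = require("worker_threads");

// The worker changes its copy of workerData and reports what it now holds.
const mutateSource = `
  const { parentPort, workerData } = require("worker_threads");
  workerData.num += 1;
  parentPort.postMessage(workerData.num);
`;

function passData(original) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(mutateSource, { eval: true, workerData: original });
    worker.once("message", (seenByWorker) => {
      // The parent's object is untouched because the worker got a clone.
      resolve({ seenByWorker, stillInParent: original.num });
    });
    worker.once("error", reject);
  });
}

passData({ num: 40 }).then((report) => console.log(report));
// { seenByWorker: 41, stillInParent: 40 }
```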
The parent thread adds a few event listeners to the worker so that it can collect the results when the execution is complete.
We've opted to listen for three events in this implementation. They are as follows:
- Message – When the worker sends a message to the parent, this event occurs.
- Error – If an error happens while running the worker, this is triggered.
- Exit – This is activated when the worker stops. If the worker finishes on its own, the exit code is 0. If it stops after a process.exit() call, the exit code is the code passed to that call (0 by default). If worker.terminate() was used to terminate the execution, the exit code will be 1.
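The exit codes can be checked with a short experiment. The sketch below starts three tiny workers from inline strings (using the eval option so no extra files are needed) and reports how each one ended. exitCodeOf is a hypothetical helper written for this demo.

```javascript
const { Worker } = require("worker_threads");

// Start a worker from inline source and resolve with its exit code.
// When terminateOnStart is true, the parent stops the worker as soon as it is online.
function exitCodeOf(source, terminateOnStart = false) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(source, { eval: true });
    worker.once("error", reject);
    worker.once("exit", resolve);
    if (terminateOnStart) {
      worker.once("online", () => worker.terminate());
    }
  });
}

// Finishes on its own.
exitCodeOf("1 + 1").then((code) => console.log("finished normally:", code)); // 0

// Inside a worker, process.exit() stops the worker, not the whole process.
exitCodeOf("process.exit(3)").then((code) => console.log("process.exit(3):", code)); // 3

// Would run forever, so the parent terminates it.
exitCodeOf("setInterval(() => {}, 1000)", true).then((code) => console.log("terminated:", code)); // 1
```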
The worker sends a message via the message channel that connects the worker and the parent in the message event. In a subsequent part, we'll look at how communications work.
The parent thread can resume processing without waiting for the results once the worker has been created. When the above code is run, the string "Executed in the parent thread." is written to the console before the nth Fibonacci number is printed.
As a result, we get something like this, with the worker's exit code printed last.
Executed in the parent thread.
40th Fibonacci Number: 102334155
0
Now, inside the worker.js file, we'll write the code that the worker uses.
const { parentPort, workerData } = require("worker_threads");
parentPort.postMessage(getFib(workerData.num))
function getFib(num) {
  if (num === 0) {
    return 0;
  } else if (num === 1) {
    return 1;
  } else {
    return getFib(num - 1) + getFib(num - 2);
  }
}
To calculate the nth term of the Fibonacci sequence, we use the recursive getFib function.
What's more interesting is the way we get data from the parent via the workerData object. Through this object, the worker can read the value of "num" that was passed when the worker was created.
Then, using the parentPort object, we send a message to the parent thread with the Fibonacci calculation results.
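In application code it is often handy to hide the worker behind a Promise so that callers can simply await the result. The following is one possible sketch of that idea, not part of the two files above. It inlines the worker logic as a string through the eval option (instead of pointing at ./worker.js) so that it can run as a single file, and fibInWorker is a made-up name.

```javascript
const { Worker } = require("worker_threads");

// Same logic as worker.js, inlined as a string so this sketch is self-contained.
const fibWorkerSource = `
  const { parentPort, workerData } = require("worker_threads");
  function getFib(num) {
    if (num < 2) return num;
    return getFib(num - 1) + getFib(num - 2);
  }
  parentPort.postMessage(getFib(workerData.num));
`;

// Compute the nth Fibonacci number on a worker thread and return a Promise.
function fibInWorker(num) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(fibWorkerSource, { eval: true, workerData: { num } });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      // A non-zero code means the worker stopped without finishing cleanly.
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

fibInWorker(20).then((result) => {
  console.log(`20th Fibonacci Number: ${result}`); // 20th Fibonacci Number: 6765
});
```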
Communication Between Parent and Worker Threads
The parentPort object, as seen in the previous example, allows the worker to communicate with the parent thread.
This parentPort is an instance of the MessagePort class. When either the parent or the worker sends a message through a MessagePort instance, the message is written to the message channel and a "message" event is triggered to notify the receiver.
Using the same postMessage method, the parent can send a message to the worker.
worker.postMessage("Message from parent");
The message channel can be used to transmit several messages back and forth between the parent and the worker at any time.
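Putting both directions together, here is a small sketch of a conversation between a parent and a worker. The worker replies to every message it receives. Its code is passed as a string with the eval option to keep the example in one file, and the names echoSource and converse are invented for the demo.

```javascript
const { Worker } = require("worker_threads");

// Inline worker: replies to every message it receives from the parent.
const echoSource = `
  const { parentPort } = require("worker_threads");
  parentPort.on("message", (msg) => {
    parentPort.postMessage("Worker received: " + msg);
  });
`;

// Send each message to the worker and collect the replies in order.
function converse(messages) {
  return new Promise((resolve, reject) => {
    if (messages.length === 0) return resolve([]);
    const worker = new Worker(echoSource, { eval: true });
    const replies = [];
    worker.on("message", (reply) => {
      replies.push(reply);
      if (replies.length === messages.length) {
        worker.terminate(); // the worker would otherwise keep listening forever
        resolve(replies);
      }
    });
    worker.once("error", reject);
    messages.forEach((msg) => worker.postMessage(msg));
  });
}

converse(["Message from parent", "Second message"]).then((replies) => {
  console.log(replies);
  // [ 'Worker received: Message from parent', 'Worker received: Second message' ]
});
```

Note that a worker with an active "message" listener on parentPort stays alive, so the parent decides when the conversation is over.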
Worker Thread Use Cases
- When you need to do a CPU-intensive operation, Worker Threads are a fantastic solution. They keep the event loop responsive while heavy work runs and help a lot when you need to do multiple actions at the same time. As previously said, they also operate on single-core machines, although the largest gains come from spreading work across several cores.
- Worker Threads were used during a large-scale upload operation in which we had to check millions of users and store their data in a database. The operation was around 10 times faster when using a multithreaded technique than when using a single thread.
- Worker Threads were also used to manipulate images. We needed to create three thumbnails (each with a different size) from a single image, and once again, a multithreaded technique came in handy, saving us time.
Conclusion
If you need to do CPU-intensive operations in your Node.js application, worker threads is a fascinating and useful module. Workers behave like threads without shared memory by default and, as a result, without the usual risk of race conditions. You should feel safe using the worker_threads module in production-grade applications now that it's stable in Node.js v12 LTS.
Even though Node does not enable multithreading in the classic sense, Worker Threads is a reasonable workaround. So, if you thought Node applications couldn't have more than one thread, it's time to let go of that belief and give Worker Threads a shot.