Shuhei Kagawa

Histogram for Time-Series Metrics on Node.js

Dec 29, 2018 - Node.js

The "metrics" library

I have been using metrics library for application metrics of Node.js applications at work. It was already widely used in the company when I joined, and I kept using it without questioning much.

The metrics library was ported from Dropwizard metrics, which is a widely-used metrics library for Java and also called as Coda Hale metrics, Yammer metrics, or Metrics Core. It supports various metrics types like Counter, Gauge, Histogram, Meter (a combination of Counter and Histogram), etc., and nice reporting abstraction.

Just before my last working day of 2018, I saw a weird chart with a p99.9 response time metric with only around 50 data points per minute. Outliers were staying for ~15 minutes (much longer than expected) and suddenly disappearing. I thought I was misusing the library. That's why I started reading the source code of metrics library, especially Histogram.

EDS-based Histogram

The metrics library uses Exponentially Decaying Sample (EDS) for Histogram. The name is intimidating, but the implementation is not so complicated.

It sets a priority to each value based on its timing and some randomness, and values of top-1028 priorities survive (by default). As a result, the chance of a value's survival decays as time goes by.

It seems to have a problem that the influence of an old value stays longer than expected, which was fixed in the Java implementation after the metrics library was ported to JavaScript. Maybe I can port the fix to the JavaScript implementation?

But, wait. Why do I need the decay at all? Most of my use cases of the histogram are to plot percentiles of response times. The data points are per minute. All I want for each data point is percentiles of all the response times measured in the last minute. I don't need response times from previous minutes because they are already plotted on the chart. Also, I don't want values in the last half of the minute to have more influence than values in the first half. So, I don't need the decay effect at all.

In addition to that, EDS randomly ignores values. Yes, it samples. Random sampling is a problem because I'm interested in a small number of outliers.

HDR Histogram

I tweeted about these issues, and my former colleague @cbirchall (Thanks!) suggested to take a look at HdrHistogram. I don't understand how it works (yet), but it claims to keep accuracy without sacrificing memory footprint and performance.

Your Latency Metrics Could Be Misleading You — How HdrHistogram Can Help by Will Tomlin on the Hotels.com Technology Blog illustrates shortcomings of the EDS-based histogram and advantages of the HDR histogram pretty well.

OK, I'm sold.

Benchmark on Node.js

Then, how can I use HDR Histogram on Node.js? I found three implementations:

Also EDS-based histogram implementations:

I compared them, excluding node-hdr-histogram because I think it's an overkill to run JVM only for metrics (and won't perform well anyway). The benchmark code is on a gist, and here is the result on Node.js 10.14.2.

Adding 10K values to a histogram:

  • metrics: 173 ops/sec ±2.00% (80 runs sampled)
  • measured: 421 ops/sec ±1.19% (90 runs sampled)
  • hdr-histogram-js: 1,769 ops/sec ±1.84% (92 runs sampled)
  • native-hdr-histogram: 1,516 ops/sec ±0.82% (92 runs sampled)

Extracting 12 different percentiles from a histogram:

  • metrics: 1,721 ops/sec ±1.93% (92 runs sampled)
  • measured: 3,709 ops/sec ±0.78% (93 runs sampled)
  • measured (weighted percentiles): 2,383 ops/sec ±1.30% (90 runs sampled)
  • hdr-histogram-js: 3,509 ops/sec ±0.61% (93 runs sampled)
  • native-hdr-histogram: 2,760 ops/sec ±0.76% (93 runs sampled)

According to the result, hdr-histogram-js is accurate and fast enough. Check out the gist for more details!

Reset Strategy

While HDR Histogram can keep numbers more accurately than Exponentially Decaying Sample, it doesn't throw away old values by itself. We need a strategy to remove old values out of it. In a sense, EDS is a reset strategy. If we don't use it, we need another one.

Documentation of rolling-metrics library lists up strategies and their trade-offs.

  • Reset on snapshot
  • Reset periodically
  • Reset periodically by chunks (rolling time window)
  • Never reset

Reset on snapshot looks a bit hacky (we need to keep metrics collection only once in an interval) but should be easy to implement and practical. Rolling time window looks more rigorous, but a bit tedious to implement, especially about choosing the right parameters.

I made a quick survey of popular libraries and frameworks.

Rolling time window strategy seems to be most popular, but I couldn't find a consensus on default parameters (length of the time window, bucket size, etc.). For the next step, I'll probably start with reset on snapshot strategy and see if it works well.

Update on Jan 11, 2019: I wrote a package to use HDR histogram with rolling time window.

Conclusion

HDR Histogram is more accurate than EDS-based Histogram for tracking response times in a time series. hdr-histogram-js is accurate and performant. It seems to be the best option on Node.js. We need a way to remove old values from a histogram. Reset on snapshot is easy and practical, but rolling time window is more rigorous.

After the research on this topic, I got an impression that HDR Histogram is well-known in the Java/JVM community, but probably not so much in other communities. I made a benchmark on Node.js in this post, but it might be useful to review your metrics implementation on other programming languages or platforms as well.

Node.js under a Microscope: CPU FlameGraph and FlameScope

Sep 16, 2018 - Node.js, Linux

Last week, I had an opportunity to talk about profiling Node.js applications on production at an internal guild meeting at work. Here is a written version of it plus some additional information.

Background

I have been working on Node.js microservices, which fetch data from API servers and render HTML with React, at work. We monitor response times at load balancers, in application metrics and with distributed tracing with OpenTracing. One of the microservices had a weird gap between 99 percentile response times of itself and its dependencies. It was spending an extra 500 milliseconds—but I didn't know why.

My first suspect was the network. It is the place full of uncertainty. After learning and trying different commands and metrics, I took tcpdump and checked packets one by one with my eyes and a script. There were no significant delays that I had expected. So I had to stop blaming the network—or someone else.

CPU Profiling with Linux perf Command

Because the weird latency was happening in the application itself, I wanted to know what's going on in it. There are mainly two ways to achieve this: profiling and tracing. Profiling records some samples and tracing records everything. I wanted to do it on production, so profiling was naturally a good fit because of its smaller overhead.

For Node.js, there are mainly two different tools. One is V8 profiler, and the other is Linux perf. V8 profiler uses the profiler provided by V8. It covers all JavaScript executions and V8 native functions. It works on non-Linux operating systems. If you use non-Linux machines, it might be pretty handy. On the other hand, Linux perf can profile almost anything including Linux kernel, libuv, and all processes on your OS with minimal overhead. However, as the name suggests, it works only on Linux. According to Node CPU Profiling Roadmap, it seems that V8 profiler is the one officially supported by the V8 team, but Linux perf will keep working for a while. After all, I picked Linux perf because of low performance-overhead and small intervention to applications.

Linux perf record records stack traces into a binary file called perf.data by default. The binary file has only addresses and file names of functions. perf script converts the stack traces into a human-readable text file adding function names from program binaries and symbol map files.

# Install dependencies for `perf` command
sudo apt-get install linux-tools-common linux-tools-$(uname -r)
# Test `perf` command
sudo perf top

# Record stack traces 99 times per second for 30 seconds
sudo perf record -F 99 -p ${pid} -g -- sleep 30s
# Generate human readable stack traces
sudo perf script > stacks.${pid}.out

Now we have human-readable stack traces, but it's still hard to browse thousands of stack traces and get insights from them. How can we efficiently analyze them?

CPU Flame Graph

CPU Flame Graph by Brendan Gregg is a great way of visualizing stack traces. It aggregates stack traces into one chart. Frequently executed functions are shown wider and rarely executed functions are narrower in the chart.

CPU Flame Graph A CPU Flame Graph from a sample application

I found some insights about the application on production with CPU Flame Graph:

  • React server-side rendering is considered to be a very CPU-intensive task that blocks Node.js event loop. However, JSON.parse() was using 3x more CPU than React—it might be because we had already optimized React server-side rendering though.
  • Gzip decompression was using the almost same amount of CPU as React server-side rendering.

There are a few tools like FlameGraph and 0x to generate CPU Flame Graph from Linux perf stack traces. However, I eventually didn't need them because FlameScope, which I'll explain next, can generate CPU Flame Graph too.

FlameScope

FlameScope by Netflix is another great tool for visualizing stack traces in a time-series. It shows a heatmap out of stack traces. Each cell represents a short amount of time, 20 ms if 50 cells per second, and its color represents how many times the process was on-CPU. It visualizes patterns of your application's activity.

FlameScope Image from Netflix/flamescope

If you select a time range on the heatmap, FlameScope shows you a CPU Flame Graph of the range. It allows you to examine what happened when in details.

To use FlameScope, check out the repository and run the python server. Then put stack trace files from perf script into examples directory, and open http://localhost:5000.

I found a couple of exciting insights about the application on production using this tool.

Example 1: Heavy Tasks in the Master Process

The application used the cluster module to utilize multiple CPU cores. FlameScope showed that the master process was not busy for most of the time, but it occasionally kept using CPU for 1.5 seconds continuously! FlameScope showed that it was caused by metrics aggregation.

The master process was aggregating application metrics from worker processes, and it was responding to metrics collectors a few times in a minute. When the metrics collectors asked for data, the master process calculated percentiles of response times and prepared a JSON response. The percentile calculation was taking long time because the application had a lot of metrics buckets and the library that we used was using JSON.stringify() and JSON.parse() to deep-copy objects!

Example 2: Frequent Garbage Collections

FlameScope showed that the worker processes were not overloaded for most of the time, but they had a few hundred milliseconds of CPU-busy time in about 10 seconds. It was caused by mark-sweep and mark-compact garbage collections.

The application had an in-memory fallback cache for API calls that was used only when API calls and retries fail. Even when API had problems, the cache hit rate was very low because of the number of permutations. In other words, it was not used almost at all. It cached large API responses for a while and threw them away after the cache expired. It looked innocent at first glance—but it was a problem for V8's generational garbage collector.

The API responses were always promoted to the old generation space causing frequent slow GCs. GC of the old generation is much slower than GC of the young generation. After removing the fallback cache, the application's 99 percentile response time improved by hundreds of milliseconds!

Node.js Gotchas

perf script collects symbols for function addresses from program binaries. For Node.js, we need something special because functions are compiled just in time. As far as I know, there are two ways to record symbols:

  1. Run your Node.js process with --perf-basic-prof-only-functions option. It generates a log file at /tmp/perf-${pid}.map. The file keeps growing. The speed depends on your application, but it was a few megabytes per day for an application at work. Another problem is that functions in V8 keep moving and the addresses in /tmp/perf-${pid}.map get outdated. I wrote a script to fix the issue.
  2. Use mmarchini/node-linux-perf. It generates the same /tmp/perf-${pid}.map as --perf-basic-prof-only-functions does, but on demand. Because it always freshly generates the file, it doesn't contain outdated symbols. It seems to be the way to go, but I haven't tried this on production yet.

In addition to the above, there are a few more Node.js options that you can use to improve your stack traces—though I haven't tried them on production because the stack traces were already good enough for me:

  • --no-turbo-inlining turns off function inlining, which is an optimization done by V8. Because function inlining fuses multiple functions into one, it can make it harder to understand stack traces. Turning it off generates more named frames.
  • --interpreted-frames-native-stack fixes Builtin:InterpereterEntryTrampoline in stack traces. It is available from Node.js 10.4.0. Check out "Interpreted Frames" in Updates from the Diagnostics Summit for more details.

Docker Gotchas

It gets a bit tricky when you are using containers to run your application. There are two ways to use Linux perf with Docker:

  1. Run perf record and perf script in the same Docker container as your application is running
  2. Run perf record and perf script in the host OS

I eventually chose the option 2. I tried the option 1 first but gave up because I was using Alpine Linux as the base image and it was hard to make Linux perf available on it.

To run perf record in the host OS, we need to figure out pid of the application process in the host.

$ ps ax | grep -n 'node --perf'
21574 pts/0    Sl+    2:53 node --perf-basic-prof-only-functions src/index.js
30481 pts/3    S+     0:00 grep --color=auto node --perf
# or
$ pgrep -f 'node --perf'
21574

$ sudo perf record -F 99 -p 21574 -g -- sleep 30s

perf script collects symbols from binaries and symbol files to get human-readable function names. It needs to be able to read the binaries whose functions were recorded with perf script and /tmp/${pid}.map files that applications generate. However, perf script in the host OS cannot read them with the same file names as the container can. (It seems that this is not the case anymore with the latest Linux kernel because its perf command knows containers. But it was the case for me because I was not using the latest kernel.)

I learned how to overcome the issue from Making FlameGraphs with Containerized Java. I just copied necessary files from the container to the host.

# Horrible hack! Binaries to be used depend on your set up. `perf script` tells you what it wants if anything is missing.
sudo docker cp mycontainer:/usr/bin/node /usr/bin/node
sudo docker cp mycontainer:/lib/ld-musl-x86_64.so.1 /lib/ld-musl-x86_64.so.1
sudo docker cp mycontainer:/usr/lib/libstdc++.so.6.0.22 /usr/lib/libstdc++.so.6.0.22

To copy symbol map files, we need to find the pid in the container. We can do it by checking /proc/${host_pid}/status.

$ cat /proc/21574/status | grep NSpid
NSpid:  21574   6
$ sudo docker cp mycontainer:/tmp/perf-6.map /tmp/perf-21574.map

Now everything is ready! Then we can use perf script as usual.

sudo perf script > stacks.${pid}.out

I set up a sample project for profiling a Node.js application on Docker. It was nice to practice profiling a bit before doing it on production!

Conclusion

Linux perf provides great observability to Node.js applications on production. Tools like CPU Flame Graph and FlameScope helped me to identify performance bottlenecks.

There are some gotchas to profile Node.js applications on Docker with Linux perf. It took some time for me to figure out how to do it because Node.js and Linux evolve day by day and I couldn't find many up-to-date resources online. I hope this post is helpful!

2017 in Review

Dec 25, 2017

Berlin

I moved to Berlin from Tokyo at the end of September 2016. 2017 was my almost first year in Berlin.

I like the city so far. It is more relaxed than Tokyo and other big cities in Europe. Summer is especially nice. BBQ makes it even better. After my office moved to a building in front of Spree River, I enjoy my commute crossing Oberbaum Bridge and walking along the river.

Travels

I traveled more than ever. The destinations were Germany (Dresden, Heidelberg, Frankfurt, Köln), Italy (Venice, Florence, Bologna), France (Paris), UK (London), Portugal (Lisbon) and Japan (Tokyo). I had fun in each of them, but if I have to choose one, I will name Lisbon. The city is full of what I miss in Berlin. Fresh and inexpensive seafood, views from hills, cute ceramic tiles, and beautiful weather. The sky was clear on every single day while I was there, and the highest temperature was 18 degrees in December!

Beer

I am glad to have found Fuerst Wiacek. Their German Movies is my No.1 beer so far. Biererei is a gem in Berlin, where I can buy fresh craft beers from Europe with growlers.

British ale was a discovery to me. I liked pubs in London a lot. I also attended the first craft beer hackathon in the world and won 12 crates of craft beer...!

Shopping

I bought an ergonomic keyboard and a neck support pillow. Both of them lifted up my quality of life. My body is getting older.

Language Learning

I learned a bit of German Language. I finished A1 in May and started A2 after a pause of 5 months. While the learning process is prolonged, now German feels less cryptic to me.

Work

I was lucky to join an awesome team. We work together and hang out together. A research at Google shows that psychological safety is a key to team effectiveness. I feel it on my team.

On the technical side, my team joined a relatively large project and completed it on time. I worked mostly in architecture, performance optimization, type checking with Flow, SRE, etc. for apps with React and Node.js. I also helped my colleagues to start building an internal tool with Elm.

Side Projects

I enjoyed working with Elm. I wrote a mobile weather app, flew to Paris for Elm Europe 2017, built a mobile-friendly pixel editor and talked about it at Elm Berlin Meetup. I also helped an experiment of its compiler-side in Haskell, although it is still pending.

I didn't do much with JavaScript for side projects but wrote a tiny library for server-side rendering with tagged template literals while hanging out with friends at a cafe. It's used in the choo ecosystem now.

Aside from building things, I learned monad transformers, etc. from Haskell Book and machine learning with neural networks from Deep Learning Specialization on Coursera.

.then()

After all, I lived a year in a new country and enjoyed it. I have settled down, and now I feel prepared for new challenges next year. Let's see what is going to happen!