title: Few performance facts about bgpgrep
mobile_menu_title: bgpgrep performance facts
date: 2021-10-19
description: After a complete codebase rewrite, some tweaks, and a lot of new features, it is time to look back and enjoy some benchmarking between bgpgrep and bgpscanner. Let's see how much things have improved and why.
series: ubgpsuite - The Micro BGP Suite
categories, tags, news_keywords:
benchmarks
development
ubgpsuite
bgpscanner
bgpgrep
Networking
BGP
benchmarks
ubgpsuite
bgpscanner
bgpgrep
benchmarks

Few performance facts about bgpgrep

If you are a performance junkie like me, the first question that probably pops into your mind after a major code rewrite is something like:

Is it faster than before?

Let's now satisfy this curiosity (of mine) with some benchmarking.

Benchmark environment

  • Processor: Intel® Core™ i7-8565U at 1.80 GHz (4 physical cores, 8 logical cores with Hyper-Threading)
  • Cache layout:
    • L1 data cache: 128 KiB total (4 instances)
    • L1 instruction cache: 128 KiB total (4 instances)
    • L2 cache: 1 MiB total (4 instances)
    • L3 cache: 8 MiB (1 instance)
  • Memory: 16 GB DDR4 RAM, in two 8 GB banks
  • Hard disk: SAMSUNG MZALQ512HALU-000L1
  • Kernel: Linux 5.10.62-1-lts SMP x86_64 GNU/Linux

To avoid adulterating the results, we also:

  • disable cron and any other background file indexing service;
  • force the performance CPU profile, disabling powersave mode;
  • disable Linux address space layout randomization for the duration of our tests;
  • increase the kernel performance events sample rate;
  • drop filesystem caches and remove any temporary files;
  • run benchmarks in console mode, outside any desktop environment.
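
On a systemd-based Linux system, the preparation steps above might look roughly like the following. This is a sketch under assumptions, not the exact procedure used for these benchmarks: the `cronie.service` unit name, the sample rate value, and the `run` and `prepare_env` helper names are all illustrative. By default it only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch only: unit names and values are assumptions, adapt them to your system.

# Print the command in dry-run mode (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_env() {
    # Stop cron (assumed unit name; file indexers vary by distribution).
    run systemctl stop cronie.service
    # Force the performance CPU frequency governor.
    run cpupower frequency-set -g performance
    # Disable address space layout randomization.
    run sysctl -w kernel.randomize_va_space=0
    # Raise the perf events sample rate (the value is an arbitrary example).
    run sysctl -w kernel.perf_event_max_sample_rate=200000
    # Flush dirty pages, then drop page cache, dentries and inodes.
    run sync
    run sysctl -w vm.drop_caches=3
}
```

Calling `prepare_env` lists the commands; calling it as root with `DRY_RUN=0` applies them. Remember to restore the original settings afterwards, ASLR in particular.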

Both bgpscanner and bgpgrep have been compiled in release mode with full optimizations, as documented in their official build instructions, using clang version 12.0.1. For reference, we also do a benchmark run with bgpdump, version 1.6.2, as available from the Arch User Repository (AUR).

Results are calculated by averaging five runs of each command, immediately after one warmup round. MRT data is decompressed upfront, to avoid accounting for decompression overhead, and the output is sent directly to /dev/null, to avoid any disk write overhead.
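
The procedure just described (one warmup round, then five timed runs reduced to average, best and worst) can be sketched as a small shell helper. This is not the harness actually used for these benchmarks: the `bench` and `stats` names are hypothetical, and timing relies on `date +%s.%N`, which assumes GNU date.

```shell
#!/bin/sh
# stats: read one duration per line, print "average best worst".
stats() {
    awk 'NR == 1 { min = $1; max = $1 }
         { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
         END { if (NR > 0) printf "%.2f %.2f %.2f\n", sum / NR, min, max }'
}

# bench CMD [ARGS...]: one untimed warmup, then RUNS (default 5) timed runs.
bench() {
    "$@" >/dev/null 2>&1            # warmup round, not measured
    i=0
    while [ "$i" -lt "${RUNS:-5}" ]; do
        start=$(date +%s.%N)        # assumes GNU date (%N = nanoseconds)
        "$@" >/dev/null 2>&1        # output discarded, as in the article
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
        i=$((i + 1))
    done | stats
}
```

For example, `bench bgpgrep sydney/2020-12/uncompressed.mrt` would print the three timings in seconds on one line. Memory usage is not covered by this sketch.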

The show's on

We take the data for the first benchmark from RouteViews' Sydney Route Collector, and pull the very first RIB of December 2020, along with any subsequent updates from the same month. This gives us 47.1 GB of uncompressed MRT data to work with.

We then run our benchmarks with the following commands:

bgpgrep sydney/2020-12/uncompressed.mrt >/dev/null
bgpscanner sydney/2020-12/uncompressed.mrt >/dev/null
bgpdump -mv sydney/2020-12/uncompressed.mrt >/dev/null
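
| Tool | Average (sec) | Best (sec) | Worst (sec) | Memory (KiB) |
|------------|---------|---------|---------|------|
| bgpgrep    | 404.45  | 401.62  | 411.38  | 2076 |
| bgpscanner | 453.59  | 451.93  | 455.13  | 2448 |
| bgpdump    | 2053.73 | 2037.19 | 2082.22 | 2316 |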

bgpgrep takes about 11% less time than bgpscanner (a 1.12x speedup), which is good, and is roughly five times faster than bgpdump. Since this benchmark operates mostly on MRT update dumps, let's try the same on a different dataset, mostly made of RIBs. We pull nine RIBs from the RIPE NCC RIS [RRC00 Route Collector](https://data.ris.ripe.net/rrc00/2019.12/), and obtain 25.7 GB worth of uncompressed MRT data. This time the benchmark is limited to bgpgrep and bgpscanner.
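
The comparison between the tools follows directly from the average run times in the first table. A minimal sketch of the arithmetic, where `time_saved_pct` and `speedup` are hypothetical helper names introduced only for illustration:

```shell
#!/bin/sh
# time_saved_pct SLOW FAST: percentage of run time saved by the faster tool.
time_saved_pct() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", (slow - fast) / slow * 100 }'
}

# speedup SLOW FAST: how many times faster the faster tool is.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

time_saved_pct 453.59 404.45    # bgpscanner vs bgpgrep
speedup 453.59 404.45           # bgpscanner vs bgpgrep
speedup 2053.73 404.45          # bgpdump vs bgpgrep
```

This prints 10.8, 1.12 and 5.08. Note that "percentage of time saved" and "speedup" are different metrics: the first is relative to the slower tool's time, the second to the faster one's, so the speedup always looks slightly larger.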

Executed commands and results:

bgpgrep rrc00/2019-12/rib-uncompressed.mrt >/dev/null
bgpscanner rrc00/2019-12/rib-uncompressed.mrt >/dev/null

| Tool | Average (sec) | Best (sec) | Worst (sec) | Memory (KiB) |
|------------|--------|--------|--------|------|
| bgpgrep    | 295.84 | 292.20 | 298.14 | 2112 |
| bgpscanner | 333.35 | 321.73 | 339.56 | 3016 |

The same trend is confirmed: bgpgrep again takes about 11% less time (a 1.13x speedup), which suggests that the advantage does not depend on the kind of data.

Running our benchmarks under average system load, however, might lead to an interesting surprise:

bgpgrep isolario/2021-07/rib-uncompressed.mrt >/dev/null
bgpscanner isolario/2021-07/rib-uncompressed.mrt >/dev/null

| Tool | Average (sec) | Best (sec) | Worst (sec) | Memory (KiB) |
|------------|--------|--------|--------|------|
| bgpgrep    | 344.90 | 342.88 | 347.03 | 2260 |
| bgpscanner | 411.39 | 405.13 | 412.70 | 2436 |

These runs were performed under a regular GNOME desktop session, with other applications running. We used 60.8 GB worth of MRT data from the Isolario project's Dagobah Collector, from July 2021 (mostly RIBs). Strikingly, bgpgrep now takes about 16% less time than bgpscanner, a 1.19x speedup, so the performance gain approaches 20%.

The reason might be a smarter use of memory and a reduced chance of page faults. You might have noticed from our results that bgpgrep's memory requirements are moderate compared to bgpscanner's. What's less evident is that bgpgrep also keeps its data structures compact and doesn't move them around much. This lessens the page pressure on the system (and makes the CPU cache happier). The net effect isn't evident in the benchmarking environment, since bgpgrep and bgpscanner, each in turn, are the only resource-intensive tasks on the system, and the initial warmup round contributes to their ideal performance. When more tasks are concurrently fighting over memory, and processes might get migrated to different cores for various reasons, losing their warm caches, the value of bgpgrep's approach becomes more prominent.
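
One way to probe this hypothesis would be to compare the page fault counts of the two tools, both in the quiet benchmarking environment and under desktop load. The following Linux-only sketch reads the cumulative child fault counters (fields 11 and 13 of `/proc/PID/stat`) of a wrapper shell after the command finishes. It was not part of the original measurements, and `child_faults` is a hypothetical helper name.

```shell
#!/bin/sh
# child_faults CMD [ARGS...]: run CMD in a fresh shell, then report the
# minor and major page faults accumulated by that shell's waited-for
# children, as exposed by Linux in /proc/PID/stat (cminflt, cmajflt).
child_faults() {
    sh -c '
        "$@" >/dev/null 2>&1
        awk "{ print \"minor=\" \$11, \"major=\" \$13 }" "/proc/$$/stat"
    ' sh "$@"
}
```

For instance, `child_faults bgpgrep isolario/2021-07/rib-uncompressed.mrt` prints a line such as `minor=N major=M`. Tools like `perf stat -e page-faults,cpu-migrations` report similar counters with more detail.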

Conclusion

bgpgrep seems to be a nice improvement over bgpscanner, and I am quite satisfied with the performance gains, especially since they come with a more solid codebase.

In the next few weeks I intend to improve the filtering engine. In general, I'd like to pause for a bit and polish the codebase to make it more mature before moving on to implementing more features.

As always, happy hacking to you all!

Lorenzo Cogotti