Monday, February 28, 2011

net-snmp ip-forward table performance problems

Some performance problems are hard and complex, but others seem to be
due to just plain stupidity. This is the saga of the SNMP daemon and a
full BGP route table. Way back in 2006, Vyatta discovered that if an
SNMP walk was done on a server with a full BGP route table, it would
peg the CPU and never complete. A full BGP route table is 500K entries
or more, so it does a good job of exposing scalability nightmares. The
initial fix was to disable the caching of the route table in SNMP,
which made it return no entries. Hardly a good fix, but returning
nothing is better than crashing.

I began investigating with two simple tools: packet capture with
Wireshark and syscall tracing with strace. The first discovery was that
each request caused the TCP wrappers library to open and read
/etc/hosts.allow and /etc/hosts.deny. Bogus on two counts:

  1. Debian ships both files with no real entries, only comments.
    Each packet caused the files to be read even though they held no data.
    It would have been better to have the files not exist and have the open fail.
  2. For our distribution, there was no point in enabling
    TCP wrappers anyway.

The fix was simple: disable TCP wrappers.

The net-snmp daemon retrieves the IPv4 and IPv6 routing tables the old
school way, through /proc. This isn't a total disaster, but since the
route entries in /proc start with an interface name and net-snmp wants
an ifindex, it looks up the ifindex for each entry. That is 300K extra
ioctl calls. The short-term hack was to cache the last ifname -> ifindex
translation; later I replaced it with a netlink route dump, which gives
the ifindex directly (surprisingly, a netlink route dump is already used
in another MIB).
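
The short-term hack works because consecutive route entries usually name
the same interface, so remembering just the last translation removes
almost all of the lookups. Here is a minimal sketch of the idea in
Python (not the actual net-snmp C code). The class and function names
are made up for illustration, and the resolver argument stands in for
the per-entry ioctl:

```python
import socket


class LastIfindexCache:
    """Remember only the most recent ifname -> ifindex translation."""

    def __init__(self, resolver=None):
        # Hypothetical stand-in for the ioctl that net-snmp issues per entry.
        self._resolver = resolver or socket.if_nametoindex
        self._last_name = None
        self._last_index = None
        self.misses = 0  # number of real lookups performed

    def lookup(self, ifname):
        if ifname != self._last_name:
            self._last_index = self._resolver(ifname)
            self._last_name = ifname
            self.misses += 1
        return self._last_index


def route_ifindexes(lines, cache):
    """Yield (ifindex, destination) for /proc/net/route style lines."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] == "Iface":
            continue  # skip the header line and blank lines
        yield cache.lookup(fields[0]), fields[1]
```

With a table where nearly every route points out the same one or two
interfaces, the number of real lookups drops from one per route to
roughly one per change of interface name.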

The next observation was that it is stupid to use snmpwalk to walk
the whole system; bulk requests (snmpbulkwalk) should be used instead.
This helps, but still the walk would not complete.

The real discovery came when looking at the net-snmp container
code. Internally, net-snmp uses an objectish abstraction to store
data, and the main ones are a flat table and a linked list. The table
is stored in sorted order for fast lookup and sequential access. New
entries are placed at the end of the table and a dirty bit is set for
the next lookup. The problem is that each insert also does a lookup for
duplicates, which causes a sort. This makes inserts do a quicksort for
each entry, and there is the scalability problem.
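
The interaction between the dirty bit and the duplicate check is easy to
see in a toy model. This is a Python sketch of the behavior described
above, not the net-snmp code; the class name and the sort counter are
invented for illustration:

```python
import bisect


class SortOnInsertTable:
    """Toy model: append at the end, sort lazily before any lookup."""

    def __init__(self):
        self.items = []
        self.dirty = False
        self.sorts = 0  # how many full sorts have been performed

    def find(self, key):
        if self.dirty:
            # A lookup on a dirty table has to sort everything first.
            self.items.sort()
            self.sorts += 1
            self.dirty = False
        i = bisect.bisect_left(self.items, key)
        return i < len(self.items) and self.items[i] == key

    def insert(self, key):
        # The duplicate check is a lookup, so it sorts the table
        # whenever the previous insert left it dirty.
        if self.find(key):
            return False
        self.items.append(key)
        self.dirty = True
        return True
```

Loading n distinct keys into this model performs n - 1 full sorts, one
for every insert after the first.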

To make it more interesting, net-snmp creates the route table
twice. First it reads the table from /proc and puts the entries in one
table, then walks that table to create the cache table used for lookup.

Loading the cache with non-scalable insertion takes several minutes on
a really fast machine, and the cache timeout is 30 seconds. Since the
load takes longer than the timeout, each request finds an expired cache
and does a full reload, which is what causes the CPU load.

Now for the good news: fixing the insert wasn't that hard. The first
step was realizing that the temporary table doesn't have to be a table
container; instead it can be changed to a FIFO (linked list). The FIFO
container is O(1) on insert. The actual cache container requires a
different approach. The table container has an unused flag to allow
duplicates in the table. Turning on the ALLOW_DUPLICATES flag makes
inserts much faster because the table is not sorted until the first
request. These changes get the table load down to less than 5 seconds
on a fast machine.
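
Both changes fit in a few lines of a toy model. This is a Python sketch
under the same caveat as before: it is not the net-snmp code, the names
are hypothetical, and the deque stands in for the FIFO container.

```python
import bisect
from collections import deque


class DeferredSortTable:
    """Toy table that skips the duplicate check on insert."""

    def __init__(self):
        self.items = []
        self.dirty = False
        self.sorts = 0  # how many full sorts have been performed

    def insert(self, key):
        # No lookup here, so nothing forces a sort during the load.
        self.items.append(key)
        self.dirty = True

    def find(self, key):
        if self.dirty:
            self.items.sort()  # one sort, paid on the first request
            self.sorts += 1
            self.dirty = False
        i = bisect.bisect_left(self.items, key)
        return i < len(self.items) and self.items[i] == key


def load_routes(raw_entries):
    """Stage entries in a FIFO, then fill the cache table."""
    staging = deque()
    for entry in raw_entries:
        staging.append(entry)  # O(1) per entry
    table = DeferredSortTable()
    while staging:
        table.insert(staging.popleft())
    return table
```

The whole load now costs one append per entry, and the single sort
happens only when the first lookup arrives.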

Lastly, a couple of other improvements help as well. When the
binary_table is expanded, the code would calloc a new area, copy the
old data and then free the original. This is much worse than just
using realloc, which can usually expand in place when the table is
getting large. The sort function can be optimized to avoid calling the
comparison function, and to use a faster insertion sort for small
subsections. These get the load down to less than a second.
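
The insertion sort trick is a standard one: quicksort partitions the
array until a piece is small, then insertion sort finishes that piece.
Here is a sketch in Python that covers only this cutoff idea, not the
comparison function change. The function names and the cutoff of 16 are
assumptions for illustration, not values taken from net-snmp.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place (inclusive bounds)."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def hybrid_sort(a, lo=0, hi=None, cutoff=16):
    """Quicksort that hands small subsections to insertion sort."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > cutoff:
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:  # partition around the middle element
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth logarithmic.
        if j - lo < hi - i:
            hybrid_sort(a, lo, j, cutoff)
            lo = i
        else:
            hybrid_sort(a, i, hi, cutoff)
            hi = j
    insertion_sort(a, lo, hi)
```

The right cutoff depends on the machine and the data, so it is worth
measuring rather than guessing.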

Extra credit to the first developer who implements a new net-snmp
container using something better for big tables like AVL or B-tree.

Thursday, March 18, 2010

GTSM

I like seeing LWN writers pick up small patches and explain what they are and why they are important. As a developer, I often find that the impact of a change is not obvious, and without further explanation significant changes go unnoticed. The recent story about Generalized TTL Security Measures on lwn.net is one such example.
But when a story comes out, the writer should do research on the background. First, it is nice to give some credit to the author :-) and Vyatta, as well as some history. I did this patch based on an enhancement request for the current Vyatta version. The starting point was an (unaccepted) patch to Quagga, and an existing implementation for FreeBSD systems. It was one of those patches where the kernel change took less time than writing the test programs.

Also, the initial patch wasn't perfect (nothing ever is), since it broke time-wait sockets and missed the case of ICMP messages. Both should be fixed by the time 2.6.34-rc2 comes out. The necessary support has not been integrated into upstream Quagga (yet).

I appreciate the review and feedback from Eric, Andi, David, and Pekka for making this work.

Wednesday, November 11, 2009

Powerpoint® Karaoke contest

Anyone in the Portland area interested in a fun and creative event is invited to the 1st Timbertalkers Powerpoint® Karaoke contest on Tuesday 11/24 at noon.

Meeting location is: 9403-B SW Nimbus Ave., Beaverton, Oregon

If you have never done PPTK, here are the rules:

  • Topic is drawn from a set of 30 topics. Probably 10 to 15 slides
  • Speaker will have 2 to 3 minutes
  • Prizes awarded


In the spirit of open source, it will really be an OpenOffice Impress contest, and the slides will be drawn from Creative Commons licensed decks.

Tuesday, October 27, 2009

Ubuntu 9.10 hates kernel developers?

Ubuntu has never been the easiest distribution for kernel development, but it looks like with 9.10 it has made things too painful. I need to build and install kernels all the time, and usually just update the grub menu manually. But now with grub 2 in Ubuntu 9.10 they have wrapped the grub menu in grub-mkconfig. Why?

It would be great if the system was set up so that just doing 'make install' in the kernel source put in the kernel and updated the grub.cfg, but no, that would make too much sense.

P.S.: they managed to break the sky2 driver somehow; the connection won't come up and negotiates the wrong speed. It turned out not to be a kernel problem: it was a wiring issue (speed), combined with some Network Manager changes.

Tuesday, October 20, 2009

Japan Linux Symposium

I am giving three talks: 1) routing performance, 2) staging drivers, 3) Vyatta CLI.
So if you are attending JLS please stop by and give me support.

Thursday, September 17, 2009

Netconf / LinuxCon / Linux Plumber's Conference

It will be a busy week. The network developers are getting together at Netconf over the weekend,
then LinuxCon followed by Linux Plumber's Conference. Hope the weather holds out; Portland has a tendency to rain whenever there is a big event.

Monday, July 20, 2009

Congratulations Microsoft

Nice. Microsoft has released the Hyper-V drivers as GPLv2. I know this was a hard step for Microsoft to take, since it means acknowledging the GPL and respecting the Linux community. The release of the drivers is good news for users, developers, and in the end Microsoft as well. Like most GPL-related actions, a lot of work was done behind the scenes to get the offending company into compliance.

This saga started when one of the users on the Vyatta forum inquired about supporting the Hyper-V network driver in the Vyatta kernel. A little googling found the necessary drivers, but on closer examination there was a problem. The driver had open-source components under the GPL that were statically linked to several binary parts. The GPL does not permit mixing of closed and open source parts, so this was an obvious violation of the license. Rather than creating noise, my goal was to resolve the problem, so I turned to Greg Kroah-Hartman. Since Novell has a (too) close association with Microsoft, my expectation was that Greg could prod the right people to get the issue resolved.

It took longer than expected, but finally Microsoft decided to do the right thing and release the drivers.