Wednesday, January 23, 2008

FIB Trie saga

For the next release of Vyatta, I wanted to enable the Trie algorithm for routing in our kernel. Since FIB Trie is compatible with the previous hash, I expected no change. Well the day after enabling it caused an immediate failure in the regression test. The regression test plays back a full BGP input stream into the router and polls for the result. The problem was that the Trie to a long time, ... a really long time, to dump the routes. For the full 163395 routes, the dump was taking 20 seconds vs 1/2 sec for FIB_HASH.

As expected the problem turned out to be an N^2 lookup. The code was basically:
walk tree to find a route, and put it in buffer; if buffer is full, then give up, then go
back and walk to the last location. Since the trie has nice fast lookup function the change to just record the last route dumped (rather than offset), then use the lookup to find the location. This dropped the dump down to under 3/10 sec.

Finding this took several days, mostly because of looking at the profile; see "nextleaf" is the hotspot, so let's look at that. The real breakthrough came when I realized there were other operations that were walking the tree, like collecting stats, but they were fast. The next diversion was figuring out all the other suboptimal behaviour (ip route flush calls fflush for each route), which although slow weren't the real issue.