Tuesday, February 17, 2009

Parallelizing netfilter

The Linux networking receive performance has been mostly single threaded until the advent of MSI-X and multiqueue receive hardware. Now with many cards, it is possible to be processing packets on multiple CPU's and cores at once. All this is great, and improves performance for the simple case.

But most users don't just use simple networking. They use useful features like netfilter to do firewalling, NAT, connection tracking and all other forms of wierd and wonderful things. The netfilter code has been tuned over the years, but there are still several hot locks in the receive path. Most of these are reader-writer locks which are actually the worst kind, much worse than a simple spin lock. The problem with locks on modern CPU's is that even for the uncontested case, a lock operation means a full-stop cache miss.

With the help of Eric Duzmet, Rick Jones, Martin Josefsson and others, it looks like there is a solution to most of these. I am excited to see how it all pans out but it could mean a big performance increase for any kind of netfilter packet intensive processing. Stay tuned.