Sunday, September 28, 2008

Asynchronous DNS lookups on Linux with getaddrinfo_a()

A little while back I posted about a bug in the glibc getaddrinfo() implementation which resulted in many CNAME lookups having to be repeated. At that time I teased a future post on the topic of the little known getaddrinfo_a() function - here it is.

type "man getaddrinfo_a". I expect you will get nothing. That was the case for me. Linux is full of non-portable, under-documented, but very powerful interfaces and tools. The upside of these tools is great - I recently referred to netem and ifb which can be used for easy network shaping, and interfaces like tee(), splice() and epoll() are also hugely powerful, but woefully underutilized. I always get a thrill when I stumble across one of these.

Part of the reason for their low profile is portability. And there are times when that matters - though I think it is cited as a bedrock principle more than is really necessary. I think the larger reason is that some of these techniques lack the documentation, web pages, and references in programmer pop-culture necessary to be ubiquitously useful.

Maybe this post will help getaddrinfo_a find its mojo.

This little jewel is a standard part of libc, and has been for many years - you can be assured that it will be present in the runtime of any distribution of the last several years.

getaddrinfo_a() is an asynchronous interface to the DNS resolution routine - getaddrinfo(). Instead of sitting there blocked while getaddrinfo() does its thing, control is returned to your code immediately and your code is interrupted at a later time with the result when it is complete.

Most folks will realize that this is a common need when dealing with DNS resolution. It is a high latency operation and when processing log files, etc, you often have a need to do a lot of them at a time. The asynchronous interface lets you do them in parallel - other than the waiting-for-the-network time, there is very little CPU or even bandwidth overhead involved in a DNS lookup. As such, it is a perfect thing to do in parallel. You really do get linear scaling.

The best documentation seems to be in the design document from Ulrich Drepper. This closely reflects the reality of what was implemented. Adam Langley also has an excellent blog post with an illustration on how to use it. Actually, the header files are more or less enough info too, if you know that getaddrinfo_a() even exists in the first place.

The good news about the API is that you can submit addresses in big batches with one call.

The bad news about the API is that it offers callback either via POSIX signal handling, or by spawning a new thread and running a caller supplied function on it. My attitude is generally to avoid making signal handling a core part of any application, so that's right out. Having libraries spawn threads is also a little disconcerting, but the fact that that mechanism is used here for the callback is really minor compared to how many threads getaddrinfo_a() spawns internally.

I had assumed that the invocation thread would send a few dns packets out onto the wire and then spawn a single thread listening for and multiplexing the responses.. or maybe the listening thread would send out the requests as well and then multiplex the responses. But reading the code shows it actually creates a pretty sizable thread pool wherein each thread calls and blocks on getaddrinfo().

This is more or less the technique most folks roll together by hand, and it works ok - so it is certainly nice to have predone and ubiquitously available in libc rather than rolling it by hand. And it is ridiculous to code it yourself when you are already linking to a library that does it that way. But it seems to have some room for improvement internally in the future.. if that happens, its nice to know that at least the API for it is settled and upgrades should be seamless.

One last giant caveat - in libc 2.7 on 64 bit builds, getaddrinfo_a() appears to overflow the stack and crash immediately on just about any input. This is because the thread spawned internally is created with a 16KB stack which is not enough to initialize the name resolver when using 64 bit data types. Oy! The fix is easy, but be aware that some users may bump into this until fixed libcs are deployed.