libsoup Ignores DNS TTLs

I've been using libsoup to run a small SOAP engine for one of the back-office programs I maintain. We've recently upgraded to a new load-balanced architecture, and we are using DNS-based load balancing to fan these SOAP requests out across our servers.

It only took a few days in production to realize that libsoup was doing something nasty. Before making any HTTP request, you need to create a SoupSession object. This object manages things like connection pooling and keepalive. It contains a GHashTable called hosts, which it uses as a cache of connections to a given hostname.

/* Requires host_lock to be locked */
static SoupSessionHost *
get_host_for_uri (SoupSession *session, SoupURI *uri)
{
    SoupSessionPrivate *priv = SOUP_SESSION_GET_PRIVATE (session);
    SoupSessionHost *host;

    host = g_hash_table_lookup (priv->hosts, uri);
    if (host)
        return host;

    host = soup_session_host_new (session, uri);
    g_hash_table_insert (priv->hosts, host->uri, host);

    return host;
}

Unfortunately, entries in this hash table are never removed or expired unless the SoupSession object itself goes away. This sucks for a few reasons:

  1. DNS TTL values are ignored. Instead, the result of the DNS query is cached forever. Obviously this means that if the record is ever changed, libsoup clients need to be restarted to know about it.
  2. DNS load balancing is broken by libsoup, which will repeatedly connect to the same IP address regardless of whether multiple IPs are included in the response to an A query.
  3. As it stands, you really wouldn't want to use libsoup to write a robot or some other long-lived program that makes lots of connections to lots of different hosts. Aside from the correctness issues listed above, the hosts hash table will grow without bound. Thankfully, all of our connections are to the same small set of URLs and hostnames.
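To make the complaint concrete, here is a minimal sketch of what a TTL-aware, bounded host cache could look like. This is not libsoup code and not a proposed patch: HostCache, host_cache_lookup, host_cache_insert and the fixed slot count are all hypothetical names and choices made up for illustration, and the caller is assumed to get a TTL from somewhere (which, as noted below, GResolver may not provide).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define HOST_CACHE_SLOTS  4    /* hypothetical fixed capacity */
#define HOST_NAME_MAX_LEN 64
#define HOST_ADDR_MAX_LEN 46   /* room for an IPv6 address in text form */

typedef struct {
    char   name[HOST_NAME_MAX_LEN];
    char   addr[HOST_ADDR_MAX_LEN];
    time_t expires_at;         /* insertion time plus TTL */
    int    in_use;
} HostEntry;

typedef struct {
    HostEntry slots[HOST_CACHE_SLOTS];
} HostCache;

/* Return the cached address for name, or NULL if it is missing or expired.
 * Expired entries met during the scan are released. */
const char *
host_cache_lookup (HostCache *cache, const char *name, time_t now)
{
    assert (cache != NULL && name != NULL);

    for (size_t i = 0; i < HOST_CACHE_SLOTS; i++) {
        HostEntry *e = &cache->slots[i];

        if (!e->in_use)
            continue;
        if (now >= e->expires_at) {
            e->in_use = 0;     /* TTL ran out: force a fresh DNS lookup */
            continue;
        }
        if (strcmp (e->name, name) == 0)
            return e->addr;
    }
    return NULL;
}

/* Cache name -> addr for ttl_seconds. Overwrites an entry for the same name,
 * else takes a free slot, else evicts the entry closest to expiry.
 * Returns 0 on success, -1 if a string does not fit. */
int
host_cache_insert (HostCache *cache, const char *name, const char *addr,
                   time_t now, unsigned ttl_seconds)
{
    HostEntry *victim = NULL;

    assert (cache != NULL && name != NULL && addr != NULL);
    if (strlen (name) >= HOST_NAME_MAX_LEN || strlen (addr) >= HOST_ADDR_MAX_LEN)
        return -1;

    for (size_t i = 0; i < HOST_CACHE_SLOTS; i++) {
        HostEntry *e = &cache->slots[i];

        if (e->in_use && strcmp (e->name, name) == 0) {
            victim = e;        /* refresh the existing entry */
            break;
        }
        if (victim == NULL
            || (victim->in_use
                && (!e->in_use || e->expires_at < victim->expires_at)))
            victim = e;
    }

    strcpy (victim->name, name);   /* lengths were checked above */
    strcpy (victim->addr, addr);
    victim->expires_at = now + (time_t) ttl_seconds;
    victim->in_use = 1;
    return 0;
}
```

The clock is passed in as a parameter so the behavior is easy to reason about; real code would pass time(NULL). A NULL from host_cache_lookup is the signal to resolve the name again, which is the step the hosts table above never takes.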

I'm not sure how easy it would be to patch libsoup to behave correctly. As far as I can tell, the GResolver API that libsoup relies on doesn't even report TTLs.

Given the nature of this bug, I can only see a couple of workarounds:

  1. Set the Host HTTP header yourself, do the DNS query yourself using GResolver, and supply the server's IP address to the SoupURI instead of a hostname. This breaks SSL certificate validation, since the certificate no longer matches the host in the URI.
  2. Create a fresh SoupSession for each request. This breaks keepalive/connection pooling and has obvious overhead issues.

Given how I'm using libsoup, I chose the latter option. YMMV.
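For anyone leaning toward the first workaround instead, the resolve-it-yourself step might look roughly like the sketch below. To keep it free of GLib dependencies it swaps in POSIX getaddrinfo for GResolver, and it only handles IPv4. resolve_first_ipv4 and build_ip_request are hypothetical helper names, not part of libsoup or GLib.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for getaddrinfo under strict C modes */
#endif

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Resolve hostname and write its first IPv4 address, as text, into ip.
 * Returns 0 on success, -1 on failure. */
int
resolve_first_ipv4 (const char *hostname, char *ip, size_t ip_len)
{
    struct addrinfo hints, *results = NULL;
    int rc = -1;

    assert (hostname != NULL && ip != NULL);

    memset (&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo (hostname, NULL, &hints, &results) != 0)
        return -1;

    if (results != NULL) {
        struct sockaddr_in *sin = (struct sockaddr_in *) results->ai_addr;

        if (inet_ntop (AF_INET, &sin->sin_addr, ip, (socklen_t) ip_len) != NULL)
            rc = 0;
        freeaddrinfo (results);
    }
    return rc;
}

/* Build a URL that points at the resolved IP, plus the value to send in the
 * Host header so the server still sees the original name.
 * Returns 0 on success, -1 on resolution failure or truncation. */
int
build_ip_request (const char *scheme, const char *hostname, const char *path,
                  char *url, size_t url_len,
                  char *host_header, size_t host_len)
{
    char ip[INET_ADDRSTRLEN];
    int n;

    if (resolve_first_ipv4 (hostname, ip, sizeof ip) != 0)
        return -1;

    n = snprintf (url, url_len, "%s://%s%s", scheme, ip, path);
    if (n < 0 || (size_t) n >= url_len)
        return -1;

    n = snprintf (host_header, host_len, "%s", hostname);
    if (n < 0 || (size_t) n >= host_len)
        return -1;

    return 0;
}
```

The resulting url would be what gets handed to libsoup in place of the hostname-based one, with host_header set on the request by hand. Calling build_ip_request before every request means each request goes back to the system resolver, so whatever caching and rotation it does applies again. The SSL caveat from the list above still stands.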
