
Core dumping from production processes

One of the challenges in writing reliable server software is checking for errors without having a big impact on performance, while still recovering gracefully when an error does occur.

You can use the GCC builtin __builtin_expect to inform the compiler that a certain test is expected to be false most of the time. In doing so, GCC (hopefully) lays out machine code such that the CPU's branch predictor doesn't trip over your assertion and cause a pipeline stall every time it is hit. The Linux kernel does something like this:

#define unlikely(x) (__builtin_expect(!!(x), 0))

if (unlikely(err > 0))
    /* handle error */

Note that the !! is actually two ! operators, which forces the value of the (x) test to be a boolean and avoids a possible warning in certain usages of the unlikely() macro.

So we can define an assertion macro:

#define my_assert(x) \
    do { \
        if (unlikely(!(x))) \
            abort(); \
    } while(0)

This way, if the assertion fails, we call abort(), which raises SIGABRT and terminates the process, dumping core. But what if this is a production process? An assertion failure would bring down the system! If the assertion failure can be triggered by specially crafted user input, the bug is downgraded to a denial of service attack, which is still a security problem. So the temptation would be to structure your code such that an assertion failure in production mode simply aborts the current operation. This way, you don't charge ahead into undefined behavior, and you don't dump core and lose the process. You can emit a log message noting that the error was encountered.
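
As a rough sketch of that approach, the assertion macro could be compiled one way for development and another for production. The log_error() function and the error label below are assumptions, standing in for whatever logging facility and per-request cleanup path your server already has:

#ifdef NDEBUG
/* production build: log the failure and bail out of the current operation */
#define my_assert(x) \
    do { \
        if (unlikely(!(x))) { \
            log_error("assertion '%s' failed at %s:%d", #x, __FILE__, __LINE__); \
            goto error; \
        } \
    } while(0)
#else
/* development build: dump core on the spot */
#define my_assert(x) \
    do { \
        if (unlikely(!(x))) \
            abort(); \
    } while(0)
#endif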

That said, core dumps can be very helpful in finding a problem. But it's not that easy: if a user is running the software in full production, they might not be keen on the idea of running a version of the software that will core dump on the assertion failure. If you can't reproduce the problem locally, tracking down the source of the problem can be very difficult.

You want a core dump, but the user doesn't want their process to go down. I have an idea for what you might do in this situation. First, we arrange a counter in process-global memory, protected by a mutex:

pthread_mutex_t abort_lk = PTHREAD_MUTEX_INITIALIZER;
int nr_aborts; /* number of core dumps taken so far, guarded by abort_lk */

Now we implement both logic paths (abort the request, and abort a copy of the process to obtain a core dump):

pid_t pid;
int do_abort = 0;

/* assertion: make sure ptr is not null */
if (unlikely(ptr == 0)) {
    pthread_mutex_lock(&abort_lk);
    if (nr_aborts < 5) {
        nr_aborts++;
        do_abort = 1;
    }
    pthread_mutex_unlock(&abort_lk);

    if (do_abort) {
        pid = fork();
        if (pid != 0)
            goto error; /* parent, or fork() failed: just abort the request */

        /* child: lower our priority, move somewhere with room for cores, dump */
        nice(19);
        chdir("/somewhere/for/core/dumps");
        abort();
    }

    goto error;
}

Upon assertion failure, we fork a child process, which immediately sends itself SIGABRT. We put a ceiling on the number of times we'll core dump (the nr_aborts counter) so that a mischievous attacker (or some other bug) can't fill the disk with cores. We also give the sacrificial process the lowest scheduling priority to minimize its interruption to the rest of the system.

The parent simply aborts the current operation and emits a log message, and a core dump is generated containing roughly the state the process was in when the assertion failed. The Linux kernel (and probably other kernels) even helps us out here: when you call fork(), the child's virtual memory is initially shared with the parent's and marked copy-on-write. As the parent continues to execute it will dirty pages, forcing the OS to duplicate them (interrupting the parent's thread of execution each time a shared page is first written), but much of the memory remains shared while the child writes the core to disk.
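
To tie this back to the earlier assertion macro, the whole fork-and-abort path can be folded into a small helper. This is only a sketch under the same assumptions as before: the helper's name, MAX_CORE_DUMPS, and the dump directory are placeholders you would adapt to your own system.

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_CORE_DUMPS 5

static pthread_mutex_t abort_lk = PTHREAD_MUTEX_INITIALIZER;
static int nr_aborts; /* core dumps taken so far, guarded by abort_lk */

/* Dump core from a sacrificial child process, at most MAX_CORE_DUMPS times.
 * The caller keeps running and is expected to abort the current operation. */
static void dump_core_nonfatally(void)
{
    int do_abort = 0;

    pthread_mutex_lock(&abort_lk);
    if (nr_aborts < MAX_CORE_DUMPS) {
        nr_aborts++;
        do_abort = 1;
    }
    pthread_mutex_unlock(&abort_lk);

    if (do_abort && fork() == 0) {
        /* child: lowest priority, somewhere with room for cores, then dump.
         * Return values are deliberately ignored; we are about to abort. */
        nice(19);
        chdir("/somewhere/for/core/dumps");
        abort();
    }
    /* parent (or fork() failure): simply return to the caller */
}

The production flavor of my_assert() could then call dump_core_nonfatally() right before jumping to its error label.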

Potential gotchas I can think of come in two flavors: multi-threading and the OOM killer. When a pthreads process calls fork(), only the calling thread exists in the child (although the child still inherits a copy of the whole VM, the file descriptors, and so on). This isn't necessarily that big of a problem: the state you are most likely interested in lives in the thread where the assertion failed, so losing visibility into the other threads of the process matters less.

The OOM killer issue, however, is more concerning. This method of fork() and abort() makes use of copy-on-write, but the process dirtying the shared pages the most is going to be the parent (which hopes to continue executing just as before). If the operating system uses VM overcommit, it might run out of pages when it needs to create a copy. When this situation happens, if the operating system cannot release any of its own memory, and cannot otherwise swap, some (seemingly arbitrary) process must die immediately. This logic is not specified as part of any contract to user-space; different versions of the operating system are liable to do different things. One would hope the parent process doesn't take the SIGKILL.
