donatas abraitis

Measure TCP metrics LD_PRELOAD-ish way

Why LD_PRELOAD?

Because most of the services do not expose TCP_INFO about the socket natively. So there are two ways, one is harder, one is trivial. I will talk about the latter one.

LD_PRELOAD allows you to intercept syscalls, grab necessary data and return back to normal processing. In turn, LD_PRELOAD can be easily injected using environment variable LD_PRELOAD= (globally or per-service), linked statically or whatever you want.

Here is the basic example to interept accept() syscall, get TCP metrics and return to the real syscall:

/*
# gcc -shared -fPIC accept.c -o accept.so -ldl
# LD_PRELOAD=/usr/share/accept.so /usr/local/bin/service
*/

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <stdint.h>
#include <dlfcn.h>
#include <netinet/tcp.h>

int (*orig_accept)(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen)
{
        struct tcp_info tcp;
        int len = sizeof(info);
        char *logfile = "/var/log/pdns/rtt.log";
        FILE *fp;
        int fd;

        setbuf(stdout, NULL);

        orig_accept = dlsym(RTLD_NEXT, "accept");
        fd = orig_accept(sockfd, addr, addrlen);

        if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, (void *)&tcp, (socklen_t *)&len) == -1)
                printf("error: getsockopt()\n");

        fp = fopen(logfile, "a+");
        if (fp == NULL)
                printf("error: cannot open %a\n", logfile);

        fprintf(fp, "%u\n", tcp.tcpi_rtt / 1000);
        fclose(fp);

        return fd;
}

To verify if it’s working or not just type:

# grep accept.so /proc/26663/maps
7fb375525000-7fb375526000 r-xp 00000000 08:01 50341026                   /usr/share/accept.so
7fb375526000-7fb375725000 ---p 00001000 08:01 50341026                   /usr/share/accept.so
7fb375725000-7fb375726000 r--p 00000000 08:01 50341026                   /usr/share/accept.so
7fb375726000-7fb375727000 rw-p 00001000 08:01 50341026                   /usr/share/accept.so

Playground

It’s always useful to measure performance before/after changes to see if it was worth your time. I used this technique to evaluate the effect before the switch to anycast-based DNS solution.

But wait, DNS works in UDP mostly ;-) Yes, it works unless data size exceeds 512 bytes of response. Hence I employed this hackish method to measure DNS using TCP protocol because with UDP it’s not possible due to its nature:

  • UDP doesn’t have 3WHS, which means it’s not possible to measure RTT;
  • It would be possible to measure this if clients would use SO_TIMESTAMP for sending data and compare at receiving side.
Results
# cat /var/log/pdns/rtt.log | python mmhistogram
Values min:0.00 avg:56.07 med=16.00 max:64000.00 dev:514.34 count:2912957
Values:
 value |-------------------------------------------------- count
     0 |                                                   1087
     1 |                                                   97
     2 |                                                   297
     4 |                                                   955
     8 |************************************************** 1503390
    16 |                    ****************************** 908123
    32 |                                               *** 110458
    64 |                                         ********* 287174
   128 |                                                 * 52383
   256 |                                                   28575
   512 |                                                   6807
  1024 |                                                   5729
  2048 |                                                   3534
  4096 |                                                   2191
  8192 |                                                   1430
 16384 |                                                   713
 32768 |                                                   14