donatas abraitis

The enemy of firewalls: TCP Fast Open

When I talked in SRECon16Europe last year with one engineer from Booking.com, he mentioned they are planning to launch TCP Fast Open in server farm under L3 load balancing (ECMP).

I asked if you are deploying TFO in production? The first answer was: “We don’t because we don’t know how to manage TFO key between application nodes”.

Then the question came in my mind: “Why not sync them using automation tools or do this periodically e.g. inserting timestamp inside the cookie?”.

Today I recall this case and wanted to try it to see how it behaves in reality. I won’t dig into abstractions how it works and how to enable it for clients and servers. This post is very useful if you need basic information how TFO works. I will describe other useful things they aren’t covered in the previously mentioned post.

Verify if TFO is working

The first tool to verify if it’s working is tcpdump:

With TFO:

07:03:45.442747 IP 1.1.1.1.41360 > 2.2.2.2.81: Flags [S], seq 3248579141:3248579160, win 29200, options [mss 1460,sackOK,TS val 0 ecr 0,nop,wscale 7,unknown-34 0x2cc5086de402d86a,nop,nop], length 19
07:03:45.442876 IP 2.2.2.2.81 > 1.1.1.1.41360: Flags [S.], seq 1989043609, ack 3248579161, win 28960, options [mss 1460,sackOK,TS val 37965533 ecr 0,nop,wscale 7], length 0
07:03:45.443034 IP 1.1.1.1.41360 > 2.2.2.2.81: Flags [.], ack 1, win 229, options [nop,nop,TS val 1277090187 ecr 37965533], length 0

Without TFO (regular 3WHS):

19:36:19.471384 IP 1.1.1.1.62248 > 2.2.2.2.81: Flags [S], seq 1237548774, win 65535, options [mss 1460,nop,wscale 5,nop,nop,TS val 932576322 ecr 0,sackOK,eol], length 0
19:36:19.471483 IP 2.2.2.2.81 > 1.1.1.1.62248: Flags [S.], seq 3862688090, ack 1237548775, win 28960, options [mss 1460,sackOK,TS val 83119562 ecr 932576322,nop,wscale 7], length 0
19:36:19.478781 IP 1.1.1.1.62248 > 2.2.2.2.81: Flags [.], ack 1, win 4117, options [nop,nop,TS val 932576329 ecr 83119562], length 0

You should notice unknown-34 0x2cc5086de402d86a part in the first output and length 19, which means, that SYN packets arrive with data (19 bytes) and the cookie for TFO is 0x2cc5086de402d86a. The second example shows regular 3-way handshake for TCP without TFO option. In my example, SYN-ACK doesn’t have any data, because I’m not using any application which response to requests.

TFO Queue Length

In this case, I installed HAProxy which has support for TFO but looks like there isn’t any simple way to monitor TFO queues. With stap, you are able to catch as usually almost everything.

probe kernel.statement("tcp_fastopen_queue_check@net/ipv4/tcp_fastopen.c:234")
{
        if ($fastopenq->qlen)
                printf("%d/%d\n", $fastopenq->qlen, $fastopenq->max_qlen);
}

Produced output:

4/2000
5/2000

A little few bits about how TCP options are handled by TCP/IP stack in Kernel

Let’s talk about this option, which is defined in RFC7413:


                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |      Kind     |    Length     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   ~                            Cookie                             ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Kind            1 byte: value = 34
   Length          1 byte: range 6 to 18 (bytes); limited by
                           remaining space in the options field.
                           The number MUST be even.
   Cookie          0, or 4 to 16 bytes (Length - 2)

I started to dig in net/ipv4/tcp_input.c to see how those TCP options are handled at all, because didn’t understand what Kind, Length mean.

The function I was interested is:

void tcp_parse_options(const struct sk_buff *skb,
                       struct tcp_options_received *opt_rx, int estab,
                       struct tcp_fastopen_cookie *foc)

Looking into the code isn’t so much informative without opening another ilustrated tab.

All TCP options are put instantly after TCP header, thus this ptr = (const unsigned char *)(th + 1); moves pointer where options start.

OK, we have the pointer to options, now we need to parse those options one-by-one. int opcode = *ptr++; leads to aforementioned Kind.

Later we have a loop iterating over all options by opsize = *ptr++; which is offset defined as Length.

And now we have Cookie. Iterate over the options until you find TCPOPT_FASTOPEN (this is what we saw in tcpdump unknown-34):

                        case TCPOPT_FASTOPEN:
                                tcp_parse_fastopen_option(
                                        opsize - TCPOLEN_FASTOPEN_BASE,
                                        ptr, th->syn, foc, false);
                                break;

This calls tcp_parse_fastopen_option appropriately and copies cookie value to tcp_fastopen_cookie struct for further processing, which is under net/ipv4/tcp_fastopen.c. Then everything is almost clear like defined in RFC:

struct sock *tcp_try_fastopen(struct sock *sk, struct sk_buff *skb,
                              struct request_sock *req,
                              struct tcp_fastopen_cookie *foc,
                              struct dst_entry *dst)

Ok, but why is TFO the enemy for firewalls?

Firewalls in the middle of connections could cause TFO stuck in transport - blackhole. This could happen for instance in such scenarios:

  • Firewalls drop SYN packets with payload;
  • Firewalls drop SYN/ACK packets with payload in 3WHS.

As a result, the connection will stale for a long time and client will hang up.

This is solvable by introducing tcp_fastopen_blackhole_timeout_sec sysctl which will grow the time exponentially to disable active TFO. It will decrease this time when the TFO is back to normal again.

Sum up

  • ip tcp_metrics show command is useful in some cases;
  • nstat is useful to see some TFO activity;
  • TFO is not so popular yet, but it’s acquiring velocity (e.g.: TLS 1.3 (False Start))