Message-ID: <a413b206-df50-4445-a4de-494339ea1ce6@linux.dev>
Date: Thu, 11 Jan 2024 22:20:06 -0800
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: Kuniyuki Iwashima <kuni1840@...il.com>, bpf@...r.kernel.org,
 netdev@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
 Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Andrii Nakryiko <andrii@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Yonghong Song <yonghong.song@...ux.dev>
Subject: Re: [PATCH v7 bpf-next 0/6] bpf: tcp: Support arbitrary SYN Cookie at
 TC.

On 12/20/23 5:28 PM, Kuniyuki Iwashima wrote:
> Under a SYN flood, the TCP stack generates a SYN Cookie to remain stateless
> for the connection request until a valid ACK arrives in response to the
> SYN+ACK.
> 
> The cookie contains two kinds of host-specific bits, a timestamp and
> secrets, so it can be validated only by the generator.  This means SYN
> Cookie consumes network resources between the client and the server;
> intermediate nodes must remember which node to route the ACK for the
> cookie to.
> 
> SYN Proxy reduces such unwanted resource allocation by handling 3WHS at
> the edge network.  After SYN Proxy completes 3WHS, it forwards SYN to the
> backend server and completes another 3WHS.  However, since the server's
> ISN differs from the cookie, the proxy must manage the ISN mappings and
> fix up SEQ/ACK numbers in every packet for each connection.  If a proxy
> node goes down, all the connections through it are terminated.  Keeping
> state at the proxy is painful from that perspective.
> 
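To make the cost of that per-connection state concrete, here is a minimal sketch of the SEQ/ACK fix-up a conventional (stateful) SYN Proxy performs. The struct and function names are hypothetical and not taken from any real proxy; the only assumption is that the proxy remembers the offset between the backend's ISN and the cookie it sent to the client.

```c
#include <stdint.h>

/* Hypothetical per-connection state kept by a stateful SYN proxy. */
struct isn_map {
    uint32_t delta; /* server_isn - cookie_isn, modulo 2^32 */
};

static struct isn_map isn_map_init(uint32_t cookie_isn, uint32_t server_isn)
{
    /* Unsigned subtraction wraps, which matches TCP sequence arithmetic. */
    struct isn_map m = { .delta = server_isn - cookie_isn };
    return m;
}

/* Server -> client: the client expects sequence numbers based on the cookie. */
static uint32_t fixup_seq_to_client(const struct isn_map *m, uint32_t seq)
{
    return seq - m->delta;
}

/* Client -> server: the client ACKs cookie-based numbers, so shift them back. */
static uint32_t fixup_ack_to_server(const struct isn_map *m, uint32_t ack)
{
    return ack + m->delta;
}
```

Every packet of every connection needs one of these two rewrites (SACK blocks would need the same treatment), which is why losing the proxy node loses the connection.
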
> At AWS, we use a dirty hack to build truly stateless SYN Proxy at scale.
> Our SYN Proxy consists of the front proxy layer and the backend kernel
> module.  (See slides of LPC2023 [0], p37 - p48)
> 
> The cookie that SYN Proxy generates differs from the kernel's cookie in
> that it contains a secret (called rolling salt) (i) shared by all the proxy
> nodes so that any node can validate ACK and (ii) updated periodically so
> that old cookies cannot be validated and we need not encode a timestamp for
> the cookie.  Also, the ISN encodes WScale, SACK, and ECN, rather than TS
> val.  This avoids sacrificing connection quality when customers turn off
> the TCP timestamps option due to an old CVE.
> 
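As an illustration of a cookie of this shape, the sketch below packs WScale, SACK, and ECN into the low bits of the ISN and fills the remaining bits with a hash keyed by a shared salt. The bit layout, all names, and the mix32() helper are assumptions made for this example; they are not the format used by the proxy described above, and mix32() is a toy mixer standing in for a real keyed hash.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPT_BITS    6
#define OPT_MASK    ((1u << OPT_BITS) - 1)
#define WSCALE_NONE 0xfu /* valid window scales are 0..14 */

struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct cookie_opts {
    uint8_t snd_wscale; /* client's WScale, or WSCALE_NONE */
    bool sack_ok;
    bool ecn_ok;
};

/* Toy mixer, NOT cryptographic; a real deployment would use a keyed hash. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9e3779b1u;
    h ^= h >> 15;
    return h;
}

/* Hash the 4-tuple, the client's ISN and the option bits under the salt. */
static uint32_t flow_hash(const struct flow *f, uint32_t salt,
                          uint32_t client_isn, uint32_t opts)
{
    uint32_t h = salt;

    h = mix32(h, f->saddr);
    h = mix32(h, f->daddr);
    h = mix32(h, ((uint32_t)f->sport << 16) | f->dport);
    h = mix32(h, client_isn);
    return mix32(h, opts);
}

static uint32_t cookie_make(const struct flow *f, uint32_t salt,
                            uint32_t client_isn, const struct cookie_opts *o)
{
    uint32_t opts = (o->snd_wscale & 0xfu) |
                    ((uint32_t)o->sack_ok << 4) |
                    ((uint32_t)o->ecn_ok << 5);

    return (flow_hash(f, salt, client_isn, opts) & ~OPT_MASK) | opts;
}

/* Any node holding the same salt can validate and decode the cookie. */
static bool cookie_check(const struct flow *f, uint32_t salt,
                         uint32_t client_isn, uint32_t cookie,
                         struct cookie_opts *o)
{
    uint32_t opts = cookie & OPT_MASK;

    if ((cookie ^ flow_hash(f, salt, client_isn, opts)) & ~OPT_MASK)
        return false;

    o->snd_wscale = opts & 0xfu;
    o->sack_ok = (opts >> 4) & 1;
    o->ecn_ok = (opts >> 5) & 1;
    return true;
}
```

Because the option bits are fed into the hash, they are covered by the same check as the 4-tuple. Rotating the salt invalidates old cookies without spending any bits on a timestamp.
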
> After 3WHS, the proxy restores SYN, encapsulates ACK into SYN, and forwards
> the TCP-in-TCP packet to the backend server.  Our kernel module works at
> Netfilter input/output hooks and first feeds SYN to the TCP stack to
> initiate 3WHS.  When the module is triggered for SYN+ACK, it looks up the
> corresponding request socket and overwrites tcp_rsk(req)->snt_isn with the
> proxy's cookie.  Then, the module can complete 3WHS with the original ACK
> as is.
> 
> This way, our SYN Proxy neither manages the ISN mappings nor waits for
> SYN+ACK from the backend, and thus can remain stateless.  It's working very
> well for high-bandwidth services (multiple Tbps), but we are looking
> for a way to drop the dirty hack and further optimise the sequences.
> 
> If we could validate an arbitrary SYN Cookie on the backend server with
> BPF, the proxy would not need to restore SYN or pass it.  After validating
> ACK, the proxy node just needs to forward it, and then the server can do
> the lightweight validation (e.g. check if ACK came from proxy nodes, etc)
> and create a connection from the ACK.
> 
> This series allows us to create a full sk from an arbitrary SYN Cookie,
> which is done in 3 steps.
> 
>    1) At tc, BPF prog calls a new kfunc to create a reqsk and configure
>       it based on the argument populated from SYN Cookie.  The reqsk has
>       its listener as req->rsk_listener and is passed to the TCP stack as
>       skb->sk.
> 
>    2) During TCP socket lookup for the skb, skb_steal_sock() returns a
>       listener in the reuseport group that inet_reqsk(skb->sk)->rsk_listener
>       belongs to.
> 
>    3) In cookie_v[46]_check(), the reqsk (skb->sk) is fully initialised and
>       a full sk is created.
> 
> The kfunc usage is as follows:
> 
>      struct bpf_tcp_req_attrs attrs = {
>          .mss = mss,
>          .wscale_ok = wscale_ok,
>          .rcv_wscale = rcv_wscale, /* Server's WScale < 15 */
>          .snd_wscale = snd_wscale, /* Client's WScale < 15 */
>          .tstamp_ok = tstamp_ok,
>          .rcv_tsval = tsval,
>          .rcv_tsecr = tsecr, /* Server's Initial TSval */
>          .usec_ts_ok = usec_ts_ok,
>          .sack_ok = sack_ok,
>          .ecn_ok = ecn_ok,
>      };
> 
>      skc = bpf_skc_lookup_tcp(...);
>      sk = (struct sock *)bpf_skc_to_tcp_sock(skc);
>      bpf_sk_assign_tcp_reqsk(skb, sk, &attrs, sizeof(attrs));
>      bpf_sk_release(skc);
> 
> [0]: https://lpc.events/event/17/contributions/1645/attachments/1350/2701/SYN_Proxy_at_Scale_with_BPF.pdf
> 
> 
> Changes:
>    v7:
>      * Patch 5 & 6
>        * Drop MPTCP support

I think Yonghong's (thanks!) cpuv4 patch 
(https://lore.kernel.org/bpf/20240110051348.2737007-1-yonghong.song@linux.dev/) 
has addressed the issue that the selftest in patch 6 has encountered.

There are some minor comments in v7. Please respin v8 when the cpuv4 patch has 
concluded so that it can kick off the CI also.