netdev - Re: [patch net v3] tcp: prohibit TCP_REPAIR

Message-ID: <CADVnQy=uE68AWKuSddKEt3T2X=HUYzs0SQPX31+HgafuysJzkA@mail.gmail.com>
Date:   Wed, 2 Nov 2022 10:46:36 -0400
From:   Neal Cardwell <ncardwell@...gle.com>
To:     Lu Wei <luwei32@...wei.com>
Cc:     edumazet@...gle.com, davem@...emloft.net, yoshfuji@...ux-ipv6.org,
        dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com,
        xemul@...allels.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [patch net v3] tcp: prohibit TCP_REPAIR_OPTIONS if data was
 already sent

On Wed, Nov 2, 2022 at 8:23 AM Lu Wei <luwei32@...wei.com> wrote:
>
> If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
> of TCPOPT_SACK_PERM is called to enable sack after data is sent
> and before data is acked, ...

This "before data is acked" phrase does not quite seem to match the
sequence below, AFAICT?

How about something like:

 If setsockopt TCP_REPAIR_OPTIONS with opt_code TCPOPT_SACK_PERM
 is called to enable SACK after data is sent and the data sender receives a
 dupack, ...


> ... it will trigger a warning in function
> tcp_verify_left_out() as follows:
>
> ============================================
> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
> tcp_timeout_mark_lost+0x154/0x160
> tcp_enter_loss+0x2b/0x290
> tcp_retransmit_timer+0x50b/0x640
> tcp_write_timer_handler+0x1c8/0x340
> tcp_write_timer+0xe5/0x140
> call_timer_fn+0x3a/0x1b0
> __run_timers.part.0+0x1bf/0x2d0
> run_timer_softirq+0x43/0xb0
> __do_softirq+0xfd/0x373
> __irq_exit_rcu+0xf6/0x140
>
> The warning is caused in the following steps:
> 1. a socket named socketA is created
> 2. socketA enters repair mode without establishing a connection
> 3. socketA calls connect() and its state is changed to TCP_ESTABLISHED
>    directly
> 4. socketA leaves repair mode
> 5. socketA calls sendmsg() to send data; packets_out and sacked_out
>    (dup acks received) increase
> 6. socketA enters repair mode again
> 7. socketA calls setsockopt with TCPOPT_SACK_PERM to enable sack
> 8. retransmit timer expires, it calls tcp_timeout_mark_lost(), lost_out
>    increases
> 9. the sacked_out + lost_out > packets_out check triggers since lost_out
>    and sacked_out increase repeatedly
>
> In function tcp_timeout_mark_lost(), tp->sacked_out will be cleared if
> Step 7 does not happen, and the warning will not be triggered. As
> suggested by Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if
> data was already sent. So this patch checks tp->segs_out:
> TCP_REPAIR_OPTIONS can be set only if tp->segs_out is 0.
>
> The socket-tcp tests in CRIU have been run as follows:
> $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
>        --ignore-taint
>
> socket-tcp* represents all socket-tcp tests in test/zdtm/static/.
>
> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
> Signed-off-by: Lu Wei <luwei32@...wei.com>
> ---
>  net/ipv4/tcp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index ef14efa1fb70..1f5cc32cf0cc 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>         case TCP_REPAIR_OPTIONS:
>                 if (!tp->repair)
>                         err = -EINVAL;
> -               else if (sk->sk_state == TCP_ESTABLISHED)
> +               else if (sk->sk_state == TCP_ESTABLISHED && !tp->segs_out)

The tp->segs_out field is only 32 bits wide. By my math, at 200
Gbit/sec with 1500 byte MTU it can wrap roughly every 260 secs. So a
caller could get unlucky or carefully sequence its call to
TCP_REPAIR_OPTIONS (based on packets sent so far) to mess up the
accounting and trigger the kernel warning.
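
To make the wrap concern concrete, here is a minimal userspace sketch. It is not kernel code: struct fake_tp, never_sent_by_segs_out() and account_segments() are hypothetical names that only model a 32-bit counter and the "!tp->segs_out" test from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the single field of interest; not the
 * kernel's struct tcp_sock. */
struct fake_tp {
	uint32_t segs_out;	/* 32 bits wide, like tp->segs_out */
};

/* Models the "!tp->segs_out" test from the patch. */
bool never_sent_by_segs_out(const struct fake_tp *tp)
{
	return tp->segs_out == 0;
}

/* Account for n transmitted segments. Unsigned 32-bit arithmetic
 * wraps modulo 2^32, so the counter can return to 0. */
void account_segments(struct fake_tp *tp, uint32_t n)
{
	tp->segs_out += n;
}
```

After exactly 2^32 segments in total (for example 1 followed by UINT32_MAX), the counter reads 0 again and the check reports "never sent" even though data has been sent.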

How about using some other method to determine if this is safe?
Perhaps using tp->bytes_sent, which is a 64-bit field, which by my
math would take 23 years to wrap at 200 Gbit/sec?
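
For anyone who wants to check the arithmetic, here is a small sketch. It assumes a 200 Gbit/sec link fully used by 1500-byte packets and ignores framing overhead; the function and macro names are made up for illustration.

```c
#include <stdint.h>

#define LINK_BITS_PER_SEC	200e9	/* assumed line rate */
#define PACKET_BYTES		1500.0	/* assumed packet size (MTU) */

/* Seconds until a 32-bit per-segment counter wraps. */
double segs_out_wrap_seconds(void)
{
	double pkts_per_sec = LINK_BITS_PER_SEC / 8.0 / PACKET_BYTES;

	return ((double)UINT32_MAX + 1.0) / pkts_per_sec;
}

/* Years until a 64-bit byte counter wraps. */
double bytes_sent_wrap_years(void)
{
	double bytes_per_sec = LINK_BITS_PER_SEC / 8.0;
	/* UINT64_MAX rounds to 2^64 when converted to double. */
	double secs = (double)UINT64_MAX / bytes_per_sec;

	return secs / (365.25 * 86400.0);
}
```

This gives about 258 seconds for the 32-bit segment counter and a bit over 23 years for the 64-bit byte counter.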

If we're more paranoid about wrapping we could also check
tp->packets_out, and refuse to allow TCP_REPAIR_OPTIONS if either
tp->bytes_sent or tp->packets_out is non-zero. (Or if we're even more
paranoid I suppose we could have a special new bit to track whether
we've ever sent something, but that probably seems like overkill?)
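
A rough userspace model of that stricter condition might look like the following. This is only a sketch of the idea, not a proposed patch: struct repair_model and repair_options_allowed() are hypothetical, and the fields just mirror the names of the tcp_sock fields discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state consulted by the check. */
struct repair_model {
	bool repair;		/* socket is in repair mode */
	bool established;	/* sk_state == TCP_ESTABLISHED */
	uint64_t bytes_sent;	/* 64-bit, like tp->bytes_sent */
	uint32_t packets_out;	/* packets currently in flight */
};

/* Allow TCP_REPAIR_OPTIONS only in repair mode, on an established
 * socket, and only if nothing has been sent and nothing is in flight. */
bool repair_options_allowed(const struct repair_model *m)
{
	if (!m->repair || !m->established)
		return false;
	return m->bytes_sent == 0 && m->packets_out == 0;
}
```

Because bytes_sent is 64 bits wide, a caller cannot realistically wrap it back to zero, and the packets_out test additionally covers the case where data is still outstanding.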

neal