netdev - Re: [patch net v3] tcp: prohibit TCP_REPAIR

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 3 Nov 2022 10:11:22 +0800
From:   "luwei (O)" <luwei32@...wei.com>
To:     Neal Cardwell <ncardwell@...gle.com>
CC:     <edumazet@...gle.com>, <davem@...emloft.net>,
        <yoshfuji@...ux-ipv6.org>, <dsahern@...nel.org>, <kuba@...nel.org>,
        <pabeni@...hat.com>, <xemul@...allels.com>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [patch net v3] tcp: prohibit TCP_REPAIR_OPTIONS if data was
 already sent


在 2022/11/2 10:46 PM, Neal Cardwell 写道:
> On Wed, Nov 2, 2022 at 8:23 AM Lu Wei <luwei32@...wei.com> wrote:
>> If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
>> of TCPOPT_SACK_PERM is called to enable sack after data is sent
>> and before data is acked, ...
> This "before data is acked" phrase does not quite seem to match the
> sequence below, AFAICT?
>
> How about something like:
>
>   If setsockopt TCP_REPAIR_OPTIONS with opt_code TCPOPT_SACK_PERM
>   is called to enable SACK after data is sent and the data sender receives a
>   dupack, ...
      yes, thanks for suggestion
>
>
>> ... it will trigger a warning in function
>> tcp_verify_left_out() as follows:
>>
>> ============================================
>> WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
>> tcp_timeout_mark_lost+0x154/0x160
>> tcp_enter_loss+0x2b/0x290
>> tcp_retransmit_timer+0x50b/0x640
>> tcp_write_timer_handler+0x1c8/0x340
>> tcp_write_timer+0xe5/0x140
>> call_timer_fn+0x3a/0x1b0
>> __run_timers.part.0+0x1bf/0x2d0
>> run_timer_softirq+0x43/0xb0
>> __do_softirq+0xfd/0x373
>> __irq_exit_rcu+0xf6/0x140
>>
>> The warning is caused in the following steps:
>> 1. a socket named socketA is created
>> 2. socketA enters repair mode without build a connection
>> 3. socketA calls connect() and its state is changed to TCP_ESTABLISHED
>>     directly
>> 4. socketA leaves repair mode
>> 5. socketA calls sendmsg() to send data, packets_out and sack_outs(dup
>>     ack receives) increase
>> 6. socketA enters repair mode again
>> 7. socketA calls setsockopt with TCPOPT_SACK_PERM to enable sack
>> 8. retransmit timer expires, it calls tcp_timeout_mark_lost(), lost_out
>>     increases
>> 9. sack_outs + lost_out > packets_out triggers since lost_out and
>>     sack_outs increase repeatly
>>
>> In function tcp_timeout_mark_lost(), tp->sacked_out will be cleared if
>> Step7 not happen and the warning will not be triggered. As suggested by
>> Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was
>> already sent. So this patch checks tp->segs_out, only TCP_REPAIR_OPTIONS
>> can be set only if tp->segs_out is 0.
>>
>> socket-tcp tests in CRIU has been tested as follows:
>> $ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
>>         --ignore-taint
>>
>> socket-tcp* represent all socket-tcp tests in test/zdtm/static/.
>>
>> Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
>> Signed-off-by: Lu Wei <luwei32@...wei.com>
>> ---
>>   net/ipv4/tcp.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index ef14efa1fb70..1f5cc32cf0cc 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -3647,7 +3647,7 @@ int do_tcp_setsockopt(struct sock *sk, int level, int optname,
>>          case TCP_REPAIR_OPTIONS:
>>                  if (!tp->repair)
>>                          err = -EINVAL;
>> -               else if (sk->sk_state == TCP_ESTABLISHED)
>> +               else if (sk->sk_state == TCP_ESTABLISHED && !tp->segs_out)
> The tp->segs_out field is only 32 bits wide. By my math, at 200
> Gbit/sec with 1500 byte MTU it can wrap roughly every 260 secs. So a
> caller could get unlucky or carefully sequence its call to
> TCP_REPAIR_OPTIONS (based on packets sent so far) to mess up the
> accounting and trigger the kernel warning.
>
> How about using some other method to determine if this is safe?
> Perhaps using tp->bytes_sent, which is a 64-bit field, which by my
> math would take 23 years to wrap at 200 Gbit/sec?
>
> If we're more paranoid about wrapping we could also check
> tp->packets_out, and refuse to allow TCP_REPAIR_OPTIONS if either
> tp->bytes_sent or tp->packets_out are non-zero. (Or if we're even more
> paranoid I suppose we could have a special new bit to track whether
> we've ever sent something, but that probably seems like overkill?)
>
> neal
> .

I didn't notice that u32 will be easily wrapped in huge network throughput,
thank you neal.

But tcp->packets_out shoud not be used because tp->packets_out can decrease
when expected ack is received, so it can decrease to 0 and this is the common
condition.

-- 
Best Regards,
Lu Wei