Date:   Tue, 15 Dec 2020 21:29:10 +0800
From:   Cambda Zhu <cambda@...ux.alibaba.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Jakub Kicinski <kuba@...nel.org>, netdev <netdev@...r.kernel.org>,
        Dust Li <dust.li@...ux.alibaba.com>,
        Tony Lu <tonylu@...ux.alibaba.com>
Subject: Re: [PATCH net-next] net: Limit logical shift left of TCP probe0
 timeout


> On Dec 15, 2020, at 19:06, Eric Dumazet <edumazet@...gle.com> wrote:
> 
> On Tue, Dec 15, 2020 at 3:08 AM Jakub Kicinski <kuba@...nel.org> wrote:
>> 
>> On Sun, 13 Dec 2020 21:59:45 +0800 Cambda Zhu wrote:
>>>> On Dec 13, 2020, at 06:32, Jakub Kicinski <kuba@...nel.org> wrote:
>>>> On Tue,  8 Dec 2020 17:19:10 +0800 Cambda Zhu wrote:
>>>>> For each TCP zero window probe, icsk_backoff is increased by one, and
>>>>> its max value is tcp_retries2. If tcp_retries2 is greater than 63, the
>>>>> probe0 timeout shift count may exceed the bit width of the shifted value.
>>>>> On x86_64/ARMv8/MIPS the shift count is masked to the range 0 to 63, and
>>>>> on ARMv7 the result is zero. If the shift count is masked, only a few
>>>>> probes are sent with a timeout shorter than TCP_RTO_MAX. But if the
>>>>> timeout is zero, it takes tcp_retries2 probes to end this false timeout.
>>>>> Besides, a bitwise shift by a count greater than or equal to the width
>>>>> of the type is undefined behavior.
>>>> 
>>>> If icsk_backoff can reach 64, can it not also reach 256 and wrap?
>>> 
>>> If tcp_retries2 is set greater than 255, it can wrap. But for TCP probe0 that
>>> doesn't seem to be a serious problem: the timeout simply restarts from icsk_rto
>>> and backs off again. And since icsk_backoff is only 8 bits, it may always stay
>>> below tcp_retries2, and may also always stay below tcp_orphan_retries. And
>>> icsk_probes_out is 8 bits too, so if max_probes is greater than 255, in some
>>> cases the connection won’t abort even if it’s an orphan sock.
>>> 
>>> We could change the types of icsk_backoff/icsk_probes_out to fix these problems.
>>> But I think retries greater than 255 make little sense in practice, and the RFC
>>> only requires the timeout (R2) to be at least 100 seconds. Would it be better to
>>> limit the min/max ranges of their sysctls?
>> 
>> All right, I think the patch is good as is, applied for 5.11, thank you!
> 
> It looks like we can remove the (u64) casts then.
> 
> Also if we _really_ care about icsk_backoff approaching 63, we also
> need to change inet_csk_rto_backoff() ?

Yes, we do. But the socket can still close after tcp_orphan_retries probes even if alive is
always true. And there are some things I'm not yet clear about:
1) inet_csk_rto_backoff() may not be used only by TCP, and RFC 6298 requires the maximum
   value of RTO to be at least 60 seconds. So what's the proper shift limit?
2) If max_probes is greater than 255, icsk_probes_out can never exceed max_probes, so the
   connection may never close (see the sketch below). This looks more serious.

> 
> Was your patch based on a real world use, or some fuzzer UBSAN report ?
> 

I didn't find this issue on TCP itself. I'm developing a private protocol, and in it I implemented
something like probe0 with a max RTO of less than 120 seconds. I hit similar issues during testing
and found that Linux TCP has the same issues. So it's not a real-world use case for TCP, and it
may be OK to ignore the issues.

> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 7338b3865a2a3d278dc27c0167bba1b966bbda9f..a2a145e3b062c0230935c293fc1900df095937d4 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -242,9 +242,10 @@ static inline unsigned long
> inet_csk_rto_backoff(const struct inet_connection_sock *icsk,
>                     unsigned long max_when)
> {
> -        u64 when = (u64)icsk->icsk_rto << icsk->icsk_backoff;
> +       u8 backoff = min_t(u8, 32U, icsk->icsk_backoff);
> +       u64 when = (u64)icsk->icsk_rto << backoff;
> 
> -        return (unsigned long)min_t(u64, when, max_when);
> +       return (unsigned long)min_t(u64, when, max_when);
> }
> 
> struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern);
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 78d13c88720fda50e3f1880ac741cea1985ef3e9..fc6e4d40fd94a717d24ebd8aef7f7930a4551fe9 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1328,9 +1328,8 @@ static inline unsigned long tcp_probe0_when(const struct sock *sk,
> {
>        u8 backoff = min_t(u8, ilog2(TCP_RTO_MAX / TCP_RTO_MIN) + 1,
>                           inet_csk(sk)->icsk_backoff);
> -       u64 when = (u64)tcp_probe0_base(sk) << backoff;
> 
> -       return (unsigned long)min_t(u64, when, max_when);
> +       return min(tcp_probe0_base(sk) << backoff, max_when);
> }
> 
> static inline void tcp_check_probe_timer(struct sock *sk)
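
For reference, a minimal userspace sketch (not kernel code; the constants are only
illustrative) of what the capped shift in the diff above achieves: once the shift count is
clamped, the computed timeout saturates at max_when for any backoff value, instead of
depending on an out-of-range shift.

/*
 * Userspace model of the clamped backoff computation: the shift count
 * is capped before shifting, so large backoff values simply saturate
 * at max_when rather than shifting past the width of the type.
 */
#include <stdio.h>
#include <stdint.h>

static uint64_t rto_backoff(uint64_t rto, unsigned int backoff, uint64_t max_when)
{
	unsigned int capped = backoff > 32 ? 32 : backoff;	/* like min_t(u8, 32U, ...) */
	uint64_t when = rto << capped;

	return when < max_when ? when : max_when;
}

int main(void)
{
	uint64_t rto = 200;		/* illustrative base RTO, in jiffies */
	uint64_t max_when = 120 * 1000;	/* illustrative TCP_RTO_MAX with HZ=1000 */

	for (unsigned int backoff = 0; backoff <= 70; backoff += 10)
		printf("backoff=%2u -> when=%llu\n", backoff,
		       (unsigned long long)rto_backoff(rto, backoff, max_when));
	return 0;
}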
