netdev - Re: [PATCH net] tcp: use signed arithmetic in tcp_rtx_probe0_timed

Message-ID: <CADVnQy=3o8MF3eZ-drh1EPbNLfiW183AkUAZwbg4N3S=1DQN_A@mail.gmail.com>
Date: Fri, 7 Jun 2024 11:11:57 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, eric.dumazet@...il.com, 
	Menglong Dong <imagedong@...cent.com>
Subject: Re: [PATCH net] tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()

On Fri, Jun 7, 2024 at 8:56 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> Due to timer wheel implementation, a timer will usually fire
> after its schedule.
>
> For instance, for HZ=1000, a timeout between 512ms and 4s
> has a granularity of 64ms.
> For this range of values, the extra delay could be up to 63ms.
>
> For TCP, this means that tp->rcv_tstamp may be after
> inet_csk(sk)->icsk_timeout whenever the timer interrupt
> finally triggers, if one packet came during the extra delay.
>
> We need to make sure tcp_rtx_probe0_timed_out() handles this case.
>
> Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: Menglong Dong <imagedong@...cent.com>
> ---
>  net/ipv4/tcp_timer.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index 83fe7f62f7f10ab111512a3ef15a97a04c79cb4a..5bfd76a31af6da6473d306d95c296180141f54e0 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -485,8 +485,12 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk,
>  {
>         const struct tcp_sock *tp = tcp_sk(sk);
>         const int timeout = TCP_RTO_MAX * 2;
> -       u32 rcv_delta;
> +       s32 rcv_delta;
>
> +       /* Note: timer interrupt might have been delayed by at least one jiffy,
> +        * and tp->rcv_tstamp might very well have been written recently.
> +        * rcv_delta can thus be negative.
> +        */
>         rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
>         if (rcv_delta <= timeout)
>                 return false;

Nice catch!

Is this a sufficient fix? The icsk_timeout field is unsigned long and
rcv_tstamp is u32, so on 64-bit architectures we are subtracting a u32
jiffies timestamp from a u64 one. AFAICT that is not safe in general:
once the u32 jiffies value has wrapped and the u64 value has not (after
a few weeks of uptime), the difference is no longer something we can
use in this simple way.

I wonder if we also need something like this for a complete fix:

- rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
+ rcv_delta = (u32)inet_csk(sk)->icsk_timeout - tp->rcv_tstamp;
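
To make the width question concrete, here is a minimal user-space sketch, not
kernel code. The helper names (delta_wide, delta_narrowed, delta_cast) and the
sample values are hypothetical; uint64_t stands in for unsigned long on a
64-bit build and uint32_t for u32. It assumes the narrowing conversion to a
signed 32-bit type is modular, which is implementation-defined in ISO C before
C23 but is what GCC and Clang do.

```c
#include <stdint.h>

/* Difference kept at 64-bit width: rcv_tstamp is zero-extended, so once the
 * low 32 bits of jiffies have wrapped, the upper bits of the timeout remain
 * in the result and the delta looks enormous.
 */
static int64_t delta_wide(uint64_t timeout_jiffies, uint32_t rcv_tstamp)
{
	return (int64_t)(timeout_jiffies - rcv_tstamp);
}

/* 64-bit subtraction whose result is narrowed to a signed 32-bit value,
 * mirroring an assignment to an s32 variable without casting the operand.
 */
static int32_t delta_narrowed(uint64_t timeout_jiffies, uint32_t rcv_tstamp)
{
	return (int32_t)(timeout_jiffies - rcv_tstamp);
}

/* Timeout truncated to 32 bits before subtracting, mirroring the (u32) cast. */
static int32_t delta_cast(uint64_t timeout_jiffies, uint32_t rcv_tstamp)
{
	return (int32_t)((uint32_t)timeout_jiffies - rcv_tstamp);
}
```

With a timeout of 0x100000000 + 5000 and a wrapped rcv_tstamp of 4000,
delta_wide() yields 0x100000000 + 1000, while delta_narrowed() and
delta_cast() both yield 1000. With rcv_tstamp = 5050 (a packet arriving during
the timer slack) the two 32-bit variants both yield -50. Under the stated
assumption, the explicit cast and the implicit truncation on assignment to s32
agree, so the cast mainly documents the intent; the wide form is the one that
would misbehave.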

thanks,
neal