Message-ID: <6724e69057445ab66d70f0b28c115e2d8fb5543b@linux.dev>
Date: Thu, 03 Jul 2025 12:03:33 +0000
From: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
To: "Eric Dumazet" <edumazet@...gle.com>
Cc: netdev@...r.kernel.org, mrpre@....com, "Neal Cardwell"
<ncardwell@...gle.com>, "Kuniyuki Iwashima" <kuniyu@...gle.com>, "David
S. Miller" <davem@...emloft.net>, "David Ahern" <dsahern@...nel.org>,
"Jakub Kicinski" <kuba@...nel.org>, "Paolo Abeni" <pabeni@...hat.com>,
"Simon Horman" <horms@...nel.org>, "David Howells" <dhowells@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v1] tcp: Correct signedness in skb remaining
space calculation
On 2025/7/2 23:34, "Eric Dumazet" <edumazet@...gle.com> wrote:
>
> On Wed, Jul 2, 2025 at 8:28 AM Jiayuan Chen <jiayuan.chen@...ux.dev> wrote:
> >
> > July 2, 2025 at 22:02, "Eric Dumazet" <edumazet@...gle.com> wrote:
> >
> > On Wed, Jul 2, 2025 at 6:59 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Wed, Jul 2, 2025 at 6:42 AM Jiayuan Chen <jiayuan.chen@...ux.dev> wrote:
> > >
> > > July 2, 2025 at 19:00, "Jiayuan Chen" <jiayuan.chen@...ux.dev> wrote:
> > > >
> > > > The calculation for the remaining space, 'copy = size_goal - skb->len',
> > > > was prone to an integer promotion bug that prevented copy from ever being
> > > > negative.
> > > >
> > > > The variable types involved are:
> > > > copy: ssize_t (long)
> > > > size_goal: int
> > > > skb->len: unsigned int
> > > >
> > > > Due to C's usual arithmetic conversions, the signed size_goal is converted
> > > > to an unsigned int to match skb->len before the subtraction. The result is
> > > > an unsigned int.
> > > >
> > > > When this unsigned int result is then assigned to the 64-bit copy variable,
> > > > it is zero-extended, preserving its non-negative value. Consequently,
> > > > copy is always >= 0.
> > > >
> > >
> > > To better explain this problem, consider the following example:
> > >
> > > '''
> > > #include <sys/types.h>
> > > #include <stdio.h>
> > >
> > > int size_goal = 536;
> > > unsigned int skblen = 1131;
> > >
> > > int main(void) {
> > > 	ssize_t copy = 0;
> > >
> > > 	/* int - unsigned int is computed as unsigned int, then zero-extended */
> > > 	copy = size_goal - skblen;
> > > 	printf("wrong: %zd\n", copy);
> > >
> > > 	/* casting skblen to ssize_t keeps the subtraction signed */
> > > 	copy = size_goal - (ssize_t)skblen;
> > > 	printf("correct: %zd\n", copy);
> > > 	return 0;
> > > }
> > > '''
> > >
> > > Output:
> > > '''
> > > wrong: 4294966701
> > > correct: -595
> > > '''
> > >
> > > Can you explain how one skb could have more bytes (skb->len) than size_goal?
> > > If we are under this condition, don't we already have a prior bug?
> > > Please describe how you caught this issue.
> > >
> >
> > Also, not sure why the copy variable had to be changed from "int" to "ssize_t".
> > A nicer patch (without a cast) would be to make it an "int" again.
> >
> > I encountered this issue because I had tcp_repair enabled, which uses
> > tcp_init_tso_segs to reset the MSS.
> > However, it seems that tcp_bound_to_half_wnd also dynamically adjusts
> > the value to be smaller than the current size_goal.
> >
>
> Okay, and what was the end result?
>
> An skb has a limited amount of bytes that can be put into it
> (MAX_SKB_FRAGS * 32K), and I can't see what are the effects of having
>

Hi Eric,

I'm working with a reproducer generated by syzkaller [1]; its core
logic is roughly as follows:
'''
setsockopt(fd, TCP_REPAIR, 1);    /* TCP_REPAIR_ON */
connect(fd);
setsockopt(fd, TCP_REPAIR, -1);   /* TCP_REPAIR_OFF_NO_WP */
send(fd, small);
sendmmsg(fd, buffer_2G);
'''

First, because TCP_REPAIR is enabled, the send() operation leaves the skb
at the tail of the write_queue. Subsequently, sendmmsg() is called to send
2GB of data.

Due to TCP_REPAIR, size_goal is reduced, which can cause the copy
variable to become negative. However, because of the integer promotion bug
mentioned in my previous email, this negative value is misinterpreted as
a large positive number. Ultimately, copy becomes a huge value, approaching
the int32 limit. This, in turn, causes sk->sk_forward_alloc to overflow,
which is the exact issue reported by syzkaller.

On a related note, even without TCP_REPAIR, tcp_bound_to_half_wnd() can
reduce size_goal on its own. Therefore, my understanding is that under
extreme conditions we might still encounter an overflow of
sk->sk_forward_alloc.

So, I think we have good reason to change copy to an int.