Message-ID: <20250324180056.41739-1-kuniyu@amazon.com>
Date: Mon, 24 Mar 2025 11:00:50 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <edumazet@...gle.com>
CC: <davem@...emloft.net>, <dsahern@...nel.org>, <horms@...nel.org>,
	<kuba@...nel.org>, <kuni1840@...il.com>, <kuniyu@...zon.com>,
	<netdev@...r.kernel.org>, <pabeni@...hat.com>,
	<willemdebruijn.kernel@...il.com>
Subject: Re: [PATCH v1 net 1/3] udp: Fix multiple wraparounds of sk->sk_rmem_alloc.

From: Eric Dumazet <edumazet@...gle.com>
Date: Mon, 24 Mar 2025 11:01:15 +0100
> On Mon, Mar 24, 2025 at 12:11 AM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> >
> > __udp_enqueue_schedule_skb() has the following condition:
> >
> >   if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> >           goto drop;
> >
> > sk->sk_rcvbuf is initialised by net.core.rmem_default and later can
> > be configured by SO_RCVBUF, which is limited by net.core.rmem_max,
> > or SO_RCVBUFFORCE.
> >
> > If we set INT_MAX to sk->sk_rcvbuf, the condition is always false
> > as sk->sk_rmem_alloc is also signed int.
> >
> > Then, the size of the incoming skb is added to sk->sk_rmem_alloc
> > unconditionally.
> >
> > This results in integer overflow (possibly multiple times) of
> > sk->sk_rmem_alloc and allows a single socket to queue skbs up to
> > the net.core.udp_mem[1] limit.
> >
> > For example, if we set a large value to udp_mem[1] and INT_MAX to
> > sk->sk_rcvbuf and flood packets to the socket, we can see multiple
> > overflows:
> >
> >   # cat /proc/net/sockstat | grep UDP:
> >   UDP: inuse 3 mem 7956736  <-- (7956736 << 12) bytes > INT_MAX * 15
> >                                              ^- PAGE_SHIFT
> >   # ss -uam
> >   State  Recv-Q      ...
> >   UNCONN -1757018048 ...    <-- flipping the sign repeatedly
> >          skmem:(r2537949248,rb2147483646,t0,tb212992,f1984,w0,o0,bl0,d0)
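
A minimal userspace model of the failing check (illustration only, not
the kernel code path; the sizes are made up; compile with -fwrapv so
signed overflow wraps the way it does in kernel builds, which use
-fno-strict-overflow):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
            int rmem = 0;           /* models sk->sk_rmem_alloc (signed int) */
            int rcvbuf = INT_MAX;   /* sk->sk_rcvbuf raised to INT_MAX       */
            int size = 65536;       /* charge for one incoming skb           */

            for (int i = 0; i < 100000; i++) {
                    if (rmem > rcvbuf) {    /* never true: rmem <= INT_MAX   */
                            puts("drop");
                            return 0;
                    }
                    rmem += size;           /* silently wraps past INT_MAX   */
            }
            printf("rmem = %d\n", rmem);    /* ends up negative, yet nothing
                                             * was ever dropped              */
            return 0;
    }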
> >
> > Previously, we had a boundary check for INT_MAX, which was removed by
> > commit 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc").
> >
> > A complete fix would be to revert it and cap the right operand by
> > INT_MAX:
> >
> >   rmem = atomic_add_return(size, &sk->sk_rmem_alloc);
> >   if (rmem > min(size + (unsigned int)sk->sk_rcvbuf, INT_MAX))
> >           goto uncharge_drop;
> >
> > but we do not want to add the expensive atomic_add_return() back just
> > for the corner case.
> >
> > So, let's perform the first check as unsigned int to detect the
> > integer overflow.
> >
> > Note that we still allow a single wraparound, which can be observed
> > from userspace, but this is acceptable: it is unlikely that recv()
> > goes uncalled for a long period, and the negative value will soon
> > flip back to positive after a few recv() calls.
> >
> >   # cat /proc/net/sockstat | grep UDP:
> >   UDP: inuse 3 mem 524288  <-- (INT_MAX + 1) >> 12
> >
> >   # ss -uam
> >   State  Recv-Q      ...
> >   UNCONN -2147482816 ...   <-- INT_MAX + 831 bytes
> >          skmem:(r2147484480,rb2147483646,t0,tb212992,f3264,w0,o0,bl0,d14468947)
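
The unsigned comparison catches this because, once sk_rmem_alloc wraps
negative, reinterpreting it as unsigned int yields a value above
INT_MAX, which can never be below sk_rcvbuf.  A tiny illustration
(userspace, -fwrapv, made-up numbers):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
            int rmem = INT_MAX;     /* queue charged right up to the limit */
            int rcvbuf = INT_MAX;   /* SO_RCVBUFFORCE pushed rcvbuf to max */

            rmem += 831;            /* one more skb wraps rmem negative    */

            /* old check: the signed compare stays false after the wrap   */
            printf("signed   drop: %d\n", rmem > rcvbuf);                /* 0 */

            /* new check: the wrapped value reads back as > INT_MAX       */
            printf("unsigned drop: %d\n", (unsigned int)rmem > rcvbuf);  /* 1 */
            return 0;
    }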
> >
> > Fixes: 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc")
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> > ---
> >  net/ipv4/udp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index a9bb9ce5438e..a1e60aab29b5 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1735,7 +1735,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
> >          */
> >         rmem = atomic_read(&sk->sk_rmem_alloc);
> >         rcvbuf = READ_ONCE(sk->sk_rcvbuf);
> > -       if (rmem > rcvbuf)
> > +       if ((unsigned int)rmem > rcvbuf)
> 
> SGTM, but maybe make rmem and rcvbuf  'unsigned int ' to avoid casts ?

That's cleaner.  I'll add a small comment above the comparison so the
boundary check is not lost if someone redefines these two as int in
the future.
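
Something along these lines, for illustration only (the actual v2 may
word the comment and the code differently):

        unsigned int rmem, rcvbuf;
        ...
        /* Immediately drop when the receive queue is full.
         * Always allow at least one packet.
         *
         * rmem and rcvbuf must stay unsigned so that a wrapped
         * (negative) sk_rmem_alloc is still caught here.
         */
        rmem = atomic_read(&sk->sk_rmem_alloc);
        rcvbuf = READ_ONCE(sk->sk_rcvbuf);
        if (rmem > rcvbuf)
                goto drop;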


> 
> BTW piling 2GB worth of skbs in a single UDP receive queue means a
> latency spike when __skb_queue_purge(&sk->sk_receive_queue) is called,
> say from inet_sock_destruct(), which is a problem on its own.

Yes, we need to improve our application a lot :)

Thanks!

> 
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index db606f7e4163809d8220be1c1a4adb5662fc914e..575baac391e8af911fc1eff3f2d8e64bb9aa4c70 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1725,9 +1725,9 @@ static int udp_rmem_schedule(struct sock *sk, int size)
>  int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
>  {
>         struct sk_buff_head *list = &sk->sk_receive_queue;
> -       int rmem, err = -ENOMEM;
> +       unsigned int rmem, rcvbuf;
> +       int size, err = -ENOMEM;
>         spinlock_t *busy = NULL;
> -       int size, rcvbuf;
> 
>         /* Immediately drop when the receive queue is full.
>          * Always allow at least one packet.
> 
