Message-ID: <20250324181116.45359-1-kuniyu@amazon.com>
Date: Mon, 24 Mar 2025 11:10:45 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <willemdebruijn.kernel@...il.com>
CC: <davem@...emloft.net>, <dsahern@...nel.org>, <edumazet@...gle.com>,
<horms@...nel.org>, <kuba@...nel.org>, <kuni1840@...il.com>,
<kuniyu@...zon.com>, <netdev@...r.kernel.org>, <pabeni@...hat.com>
Subject: Re: [PATCH v1 net 1/3] udp: Fix multiple wraparounds of sk->sk_rmem_alloc.
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Date: Mon, 24 Mar 2025 10:59:49 -0400
> Kuniyuki Iwashima wrote:
> > __udp_enqueue_schedule_skb() has the following condition:
> >
> > if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> > goto drop;
> >
> > sk->sk_rcvbuf is initialised by net.core.rmem_default and later can
> > be configured by SO_RCVBUF, which is limited by net.core.rmem_max,
> > or SO_RCVBUFFORCE.
> >
> > If we set INT_MAX to sk->sk_rcvbuf, the condition is always false
> > as sk->sk_rmem_alloc is also signed int.
> >
> > Then, the size of the incoming skb is added to sk->sk_rmem_alloc
> > unconditionally.
> >
> > This results in integer overflow (possibly multiple times) on
> > sk->sk_rmem_alloc and allows a single socket to have skb up to
> > net.core.udp_mem[1].
> >
> > For example, if we set a large value to udp_mem[1] and INT_MAX to
> > sk->sk_rcvbuf and flood packets to the socket, we can see multiple
> > overflows:
> >
> > # cat /proc/net/sockstat | grep UDP:
> > UDP: inuse 3 mem 7956736 <-- (7956736 << 12) bytes > INT_MAX * 15
> >                                          ^- PAGE_SHIFT
> >     (7956736 pages << 12 is ~32.6 GB, while INT_MAX * 15 is ~32.2 GB)
> > # ss -uam
> > State Recv-Q ...
> > UNCONN -1757018048 ... <-- flipping the sign repeatedly
> > skmem:(r2537949248,rb2147483646,t0,tb212992,f1984,w0,o0,bl0,d0)
> >
> > Previously, we had a boundary check for INT_MAX, which was removed by
> > commit 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc").
> >
> > A complete fix would be to revert it and cap the right operand by
> > INT_MAX:
> >
> > rmem = atomic_add_return(size, &sk->sk_rmem_alloc);
> > if (rmem > min(size + (unsigned int)sk->sk_rcvbuf, INT_MAX))
> > goto uncharge_drop;
> >
> > but we do not want to add the expensive atomic_add_return() back just
> > for the corner case.
> >
> > So, let's perform the first check as unsigned int to detect the
> > integer overflow.
> >
> > Note that we still allow a single wraparound, which can be observed
> > from userspace, but it's acceptable: it's unlikely that recv() goes
> > uncalled for a long period, and the negative value will soon flip
> > back to positive after a few recv() calls.
>
> Can we do better than this?
Another approach I had in mind was to restore the original validation
under the recvq lock but without atomic ops, e.g.:
1. add another u32 as a union member of sk_rmem_alloc (only for UDP)
2. access it with READ_ONCE() or under the recvq lock
3. perform the validation under the recvq lock
But it requires more changes around the error queue handling and
the general socket implementation, so it would be too invasive for
net.git, but maybe worth a try for net-next?  A rough sketch below.
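
A very rough userspace mock of that shape, just to illustrate the
idea; the type and function names are invented, the recvq lock is
reduced to a comment, and error queue handling is ignored:

	#include <stdint.h>
	#include <stdio.h>

	typedef struct { int counter; } atomic_t;  /* stand-in for the kernel type */

	struct mock_sock {
		union {
			atomic_t sk_rmem_alloc;	/* generic paths keep using atomics */
			uint32_t udp_rmem;	/* UDP-only alias, recvq lock only */
		};
		uint32_t sk_rcvbuf;
	};

	/* Caller holds the receive queue lock, so a plain read-check-write
	 * on udp_rmem cannot race and no atomic RMW is needed. */
	static int udp_charge_locked(struct mock_sock *sk, uint32_t truesize)
	{
		if (sk->udp_rmem + truesize > sk->sk_rcvbuf)
			return -1;	/* drop: the counter can never wrap */
		sk->udp_rmem += truesize;
		return 0;
	}

	int main(void)
	{
		struct mock_sock sk = { .sk_rcvbuf = 0x7fffffff };	/* INT_MAX */

		printf("charge: %d\n", udp_charge_locked(&sk, 2048));	/* 0 */
		return 0;
	}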
> Is this because of the "Always allow at least one packet" below, and
> due to testing the value of the counter without skb->truesize added?
Yes, that's the reason, although we never receive a single packet
larger than INT_MAX, so the counter can only overshoot INT_MAX by at
most one skb->truesize before the unsigned check starts dropping.
>
> /* Immediately drop when the receive queue is full.
> * Always allow at least one packet.
> */
> rmem = atomic_read(&sk->sk_rmem_alloc);
> rcvbuf = READ_ONCE(sk->sk_rcvbuf);
> if (rmem > rcvbuf)
> goto drop;
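
For completeness, a minimal userspace demo (not kernel code, and the
values are made up) of why the old signed comparison never fires once
sk_rcvbuf is INT_MAX, while the same values compared as unsigned int
catch the wrapped counter:

	#include <limits.h>
	#include <stdio.h>

	int main(void)
	{
		/* unsigned arithmetic avoids signed-overflow UB in the demo;
		 * the kernel's atomic_t wraps the same way in practice */
		unsigned int rmem = INT_MAX - 1000;	/* sk_rmem_alloc, almost full */
		unsigned int rcvbuf = INT_MAX;		/* set via SO_RCVBUFFORCE */
		unsigned int truesize = 2048;		/* made-up skb->truesize */

		/* old check on signed int: an int can never be > INT_MAX,
		 * so this never drops and the skb is always charged */
		printf("signed check drops:   %d\n", (int)rmem > (int)rcvbuf);

		rmem += truesize;	/* charge the skb: wraps past INT_MAX */
		printf("counter seen by ss:   %d\n", (int)rmem);	/* negative */

		/* check on unsigned int: the wrapped value compares as
		 * > INT_MAX >= rcvbuf, so the next packet is dropped
		 * after a single wraparound at most */
		printf("unsigned check drops: %d\n", rmem > rcvbuf);
		return 0;
	}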