Message-ID: <67e1b628df780_35010c2948d@willemb.c.googlers.com.notmuch>
Date: Mon, 24 Mar 2025 15:44:40 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>,
willemdebruijn.kernel@...il.com
Cc: davem@...emloft.net,
dsahern@...nel.org,
edumazet@...gle.com,
horms@...nel.org,
kuba@...nel.org,
kuni1840@...il.com,
kuniyu@...zon.com,
netdev@...r.kernel.org,
pabeni@...hat.com
Subject: Re: [PATCH v1 net 1/3] udp: Fix multiple wraparounds of
sk->sk_rmem_alloc.
Kuniyuki Iwashima wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
> Date: Mon, 24 Mar 2025 10:59:49 -0400
> > Kuniyuki Iwashima wrote:
> > > __udp_enqueue_schedule_skb() has the following condition:
> > >
> > > 	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
> > > 		goto drop;
> > >
> > > sk->sk_rcvbuf is initialised by net.core.rmem_default and later can
> > > be configured by SO_RCVBUF, which is limited by net.core.rmem_max,
> > > or SO_RCVBUFFORCE.
> > >
> > > If we set INT_MAX to sk->sk_rcvbuf, the condition is always false
> > > as sk->sk_rmem_alloc is also signed int.
> > >
> > > Then, the size of the incoming skb is added to sk->sk_rmem_alloc
> > > unconditionally.
> > >
> > > This results in integer overflow (possibly multiple times) on
> > > sk->sk_rmem_alloc and allows a single socket to have skb up to
> > > net.ipv4.udp_mem[1].
> > >
> > > For example, if we set a large value to udp_mem[1] and INT_MAX to
> > > sk->sk_rcvbuf and flood packets to the socket, we can see multiple
> > > overflows:
> > >
> > > # cat /proc/net/sockstat | grep UDP:
> > > UDP: inuse 3 mem 7956736 <-- (7956736 << 12) bytes > INT_MAX * 15
> > >                                          ^- PAGE_SHIFT
> > > # ss -uam
> > > State Recv-Q ...
> > > UNCONN -1757018048 ... <-- flipping the sign repeatedly
> > > skmem:(r2537949248,rb2147483646,t0,tb212992,f1984,w0,o0,bl0,d0)
> > >
> > > Previously, we had a boundary check for INT_MAX, which was removed by
> > > commit 6a1f12dd85a8 ("udp: relax atomic operation on sk->sk_rmem_alloc").
> > >
> > > A complete fix would be to revert it and cap the right operand by
> > > INT_MAX:
> > >
> > > 	rmem = atomic_add_return(size, &sk->sk_rmem_alloc);
> > > 	if (rmem > min(size + (unsigned int)sk->sk_rcvbuf, INT_MAX))
> > > 		goto uncharge_drop;
> > >
> > > but we do not want to add the expensive atomic_add_return() back just
> > > for the corner case.
> > >
> > > So, let's perform the first check as unsigned int to detect the
> > > integer overflow.
> > >
> > > Note that we still allow a single wraparound, which can be observed
> > > from userspace, but it's acceptable considering it's unlikely that
> > > no recv() is called for a long period, and the negative value will
> > > soon flip back to positive after a few recv() calls.
> >
> > Can we do better than this?
>
> Another approach I had in mind was to restore the original validation
> under the recvq lock but without atomic ops like
>
> 1. add another u32 as union of sk_rmem_alloc (only for UDP)
> 2. access it with READ_ONCE() or under the recvq lock
> 3. perform the validation under the lock
>
> But it requires more changes around the error queue handling and
> the general socket impl, so will be too invasive for net.git but
> maybe worth a try for net-next ?

Definitely not net material. Adding more complexity here
would also need some convincing benchmark data probably.

>
> > Is this because of the "Always allow at least one packet" below, and
> > due to testing the value of the counter without skb->truesize added?
>
> Yes, that's the reason although we don't receive a single >INT_MAX
> packet.

I was surprised that we don't take the current skb size into
account when doing this calculation.

Turns out that this code used to do that.

commit 363dc73acacb ("udp: be less conservative with sock rmem
accounting") made this change:

-	if (rmem && (rmem + size > sk->sk_rcvbuf))
+	if (rmem > sk->sk_rcvbuf)
 		goto drop;

The special consideration to allow one packet is to avoid starvation
with small rcvbuf, judging also from this review comment:

https://lore.kernel.org/netdev/1476938622.5650.111.camel@edumazet-glaptop3.roam.corp.google.com/

That clearly doesn't apply when rcvbuf is near INT_MAX.

Can we separate the tiny-budget case from the normal case, and for
normal buffer sizes do a hard drop that includes skb->truesize?
>
> >
> > 	/* Immediately drop when the receive queue is full.
> > 	 * Always allow at least one packet.
> > 	 */
> > 	rmem = atomic_read(&sk->sk_rmem_alloc);
> > 	rcvbuf = READ_ONCE(sk->sk_rcvbuf);
> > 	if (rmem > rcvbuf)
> > 		goto drop;