Message-ID: <93633df1-fa0c-49d8-b7e9-32ca2761e63f@redhat.com>
Date: Tue, 24 Jun 2025 09:55:15 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>, jbaron@...mai.com
Cc: davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
kuniyu@...gle.com, netdev@...r.kernel.org,
Kuniyuki Iwashima <kuni1840@...il.com>
Subject: Re: [PATCH net-next v2 3/3] netlink: Fix wraparound of
sk->sk_rmem_alloc

On 6/24/25 1:35 AM, Jakub Kicinski wrote:
> On Wed, 18 Jun 2025 23:13:02 -0700 Kuniyuki Iwashima wrote:
>> From: Jason Baron <jbaron@...mai.com>
>> Date: Wed, 18 Jun 2025 19:13:23 -0400
>>> For netlink sockets, when comparing allocated rmem memory with the
>>> rcvbuf limit, the comparison is done using signed values. This means
>>> that if rcvbuf is near INT_MAX, then sk->sk_rmem_alloc may become
>>> negative in the comparison with rcvbuf, which will yield incorrect
>>> results.
>>>
>>> This can be reproduced by using the program from SOCK_DIAG(7) with
>>> some slight modifications: first, set sk->sk_rcvbuf to INT_MAX
>>> using SO_RCVBUFFORCE, and then run "send_query()" in a loop
>>> without calling "receive_responses()". In this case, the value of
>>> sk->sk_rmem_alloc will continuously wrap around, and thus more
>>> memory is allocated than the sk->sk_rcvbuf limit allows. This will
>>> eventually exhaust memory, leading to an out-of-memory condition
>>> with skbs filling up the slab.
>>>
>>> Let's fix this in a similar manner to:
>>> commit 5a465a0da13e ("udp: Fix multiple wraparounds of sk->sk_rmem_alloc.")
>>>
>>> As noted in that fix, if there are multiple threads writing to a
>>> netlink socket it's possible to slightly exceed the rcvbuf value.
>>> But, as noted there, this avoids an expensive 'atomic_add_return()'
>>> for the common case.
>>
>> This was because the UDP RX path is a fast path, but netlink's isn't.
>> Also, it's common for UDP that multiple packets for the same socket
>> are processed concurrently, and 850cbaddb52d dropped lock_sock from
>> that path.
>
> To be clear -- are you saying we should fix this differently?
> Or perhaps that the problem doesn't exist? The change doesn't
> seem very intrusive..
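
Side note, for anyone following along: the wraparound described above
boils down to the signed check going negative. A minimal userspace
model with made-up numbers (not the kernel code, just the shape of the
comparison):

#include <limits.h>
#include <stdio.h>

/* Model of the signed rmem check; values are invented for illustration. */
int main(void)
{
	int rcvbuf = INT_MAX;			/* set via SO_RCVBUFFORCE */
	int rmem_alloc = INT_MAX - 1000;	/* receive queue nearly full */
	unsigned int truesize = 4096;		/* next skb to account */

	/* The kernel's atomic_add() wraps; model the wrap explicitly
	 * here to keep the demo free of signed-overflow UB. */
	rmem_alloc = (int)((unsigned int)rmem_alloc + truesize);

	/* Signed comparison: rmem_alloc is now a large negative value,
	 * so it still looks "below" rcvbuf and the skb is accepted. */
	if (rmem_alloc > rcvbuf)
		printf("would drop, rmem_alloc=%d\n", rmem_alloc);
	else
		printf("still accepted, rmem_alloc=%d\n", rmem_alloc);

	return 0;
}

With rcvbuf pinned at INT_MAX the signed check can never fail, so the
queue keeps growing and wrapping exactly as described above.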
AFAICS the race is possible even with netlink, as netlink_unicast()
also runs without the socket lock.

The point is that for UDP, the scenario with multiple threads enqueuing
packets into the same socket is a critical path, so optimizing for
performance and tolerating some memory accounting inaccuracy makes
sense.

For netlink sockets, that scenario looks like a pathological one, and I
think we should prefer accuracy over optimization.
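
Roughly something along these lines -- an untested sketch with a
made-up helper name, only to show the "reserve first, back out on
failure" shape, not an actual patch:

/* Hypothetical helper: charge the skb up front with
 * atomic_add_return() and undo the charge when the (unsigned) total
 * exceeds rcvbuf, so an skb is only left accounted when it fits
 * within the limit, even with concurrent writers. */
static bool netlink_rmem_charge(struct sock *sk, struct sk_buff *skb)
{
	unsigned int rmem, rcvbuf = READ_ONCE(sk->sk_rcvbuf);

	rmem = atomic_add_return(skb->truesize, &sk->sk_rmem_alloc);
	if (rmem > rcvbuf) {
		atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
		return false;
	}
	return true;
}

The unconditional atomic_add_return() costs a bit more than the
optimistic read, but for netlink traffic that should be in the noise.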
Thanks,
Paolo