lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <01d4264c-e54a-4b21-9dbf-6e31ff6c782f@gmail.com>
Date: Mon, 8 Apr 2024 16:06:25 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, dsahern@...nel.org,
 pabeni@...hat.com, kuba@...nel.org
Subject: Re: [PATCH net-next v3] net: enable SOCK_NOSPACE for UDP

On 4/8/24 15:31, Eric Dumazet wrote:
> On Mon, Apr 8, 2024 at 4:16 PM Pavel Begunkov <asml.silence@...il.com> wrote:
>>
>> wake_up_poll() and variants can be expensive even if they don't actually
>> wake anything up as it involves disabling irqs, taking a spinlock and
>> walking through the poll list, which is fraught with cache bounces.
>> That might happen when someone waits for POLLOUT or even POLLIN as the
>> waitqueue is shared, even though we should be able to skip these
>> false positive calls when the tx queue is not full.
>>
>> Add support for SOCK_NOSPACE for UDP sockets. The udp_poll() change is
>> straightforward and repeats after tcp_poll() and others. In sock_wfree()
>> it's done as an optional feature because it requires support from the
>> poll handlers, however there are users of sock_wfree() that might be
>> unprepared to that.
>>
>> Note, it optimises the sock_wfree() path but not sock_def_write_space().
>> That's fine because it leads to more false positive wake ups, which is
>> tolerable and not performance critical.
>>
>> It wins +5% to throughput testing with a CPU bound tx only io_uring
>> based benchmark and showed 0.5-3% in more realistic workloads.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
>> ---
>>
>> v3: fix a race in udp_poll() (Eric)
>>      clear SOCK_NOSPACE in sock_wfree()
>>
>> v2: implement it in sock_wfree instead of adding a UDP specific
>>      free callback.
>>
>>   include/net/sock.h |  1 +
>>   net/core/sock.c    |  9 +++++++++
>>   net/ipv4/udp.c     | 15 ++++++++++++++-
>>   3 files changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 2253eefe2848..027a398471c4 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -944,6 +944,7 @@ enum sock_flags {
>>          SOCK_XDP, /* XDP is attached */
>>          SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
>>          SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
>> +       SOCK_NOSPACE_SUPPORTED, /* socket supports the SOCK_NOSPACE flag */
>>   };
>>
>>   #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 5ed411231fc7..ae7446570726 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3393,6 +3393,15 @@ static void sock_def_write_space_wfree(struct sock *sk)
>>
>>                  /* rely on refcount_sub from sock_wfree() */
>>                  smp_mb__after_atomic();
>> +
>> +               if (sock_flag(sk, SOCK_NOSPACE_SUPPORTED)) {
>> +                       struct socket *sock = sk->sk_socket;
> 
> It seems sk->sk_socket could be NULL, according to similar helpers
> like sk_stream_write_space()
> 
> udp_lib_close() -> sk_common_release() -> sock_orphan() ...

Yeah, thanks. So sk_socket stays alive even after it's removed,
makes sense, but I wonder why there is no READ_ONCE in likes of
sk_stream_write_space() as seems sk_socket can just randomly
change?

-- 
Pavel Begunkov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ