[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iLcSxm0j3J6n1DAJZ9Atv3CUEuw3Yf5qh2CW+tfN70x6Q@mail.gmail.com>
Date: Mon, 8 Apr 2024 18:05:07 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Pavel Begunkov <asml.silence@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, dsahern@...nel.org,
pabeni@...hat.com, kuba@...nel.org
Subject: Re: [PATCH net-next v3] net: enable SOCK_NOSPACE for UDP
On Mon, Apr 8, 2024 at 5:06 PM Pavel Begunkov <asml.silence@...il.com> wrote:
>
> On 4/8/24 15:31, Eric Dumazet wrote:
> > On Mon, Apr 8, 2024 at 4:16 PM Pavel Begunkov <asml.silence@...il.com> wrote:
> >>
> >> wake_up_poll() and variants can be expensive even if they don't actually
> >> wake anything up as it involves disabling irqs, taking a spinlock and
> >> walking through the poll list, which is fraught with cache bounces.
> >> That might happen when someone waits for POLLOUT or even POLLIN as the
> >> waitqueue is shared, even though we should be able to skip these
> >> false positive calls when the tx queue is not full.
> >>
> >> Add support for SOCK_NOSPACE for UDP sockets. The udp_poll() change is
> >> straightforward and repeats after tcp_poll() and others. In sock_wfree()
> >> it's done as an optional feature because it requires support from the
> >> poll handlers, however there are users of sock_wfree() that might be
> >> unprepared to that.
> >>
> >> Note, it optimises the sock_wfree() path but not sock_def_write_space().
> >> That's fine because it leads to more false positive wake ups, which is
> >> tolerable and not performance critical.
> >>
> >> It wins +5% to throughput testing with a CPU bound tx only io_uring
> >> based benchmark and showed 0.5-3% in more realistic workloads.
> >>
> >> Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
> >> ---
> >>
> >> v3: fix a race in udp_poll() (Eric)
> >> clear SOCK_NOSPACE in sock_wfree()
> >>
> >> v2: implement it in sock_wfree instead of adding a UDP specific
> >> free callback.
> >>
> >> include/net/sock.h | 1 +
> >> net/core/sock.c | 9 +++++++++
> >> net/ipv4/udp.c | 15 ++++++++++++++-
> >> 3 files changed, 24 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/net/sock.h b/include/net/sock.h
> >> index 2253eefe2848..027a398471c4 100644
> >> --- a/include/net/sock.h
> >> +++ b/include/net/sock.h
> >> @@ -944,6 +944,7 @@ enum sock_flags {
> >> SOCK_XDP, /* XDP is attached */
> >> SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
> >> SOCK_RCVMARK, /* Receive SO_MARK ancillary data with packet */
> >> + SOCK_NOSPACE_SUPPORTED, /* socket supports the SOCK_NOSPACE flag */
> >> };
> >>
> >> #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
> >> diff --git a/net/core/sock.c b/net/core/sock.c
> >> index 5ed411231fc7..ae7446570726 100644
> >> --- a/net/core/sock.c
> >> +++ b/net/core/sock.c
> >> @@ -3393,6 +3393,15 @@ static void sock_def_write_space_wfree(struct sock *sk)
> >>
> >> /* rely on refcount_sub from sock_wfree() */
> >> smp_mb__after_atomic();
> >> +
> >> + if (sock_flag(sk, SOCK_NOSPACE_SUPPORTED)) {
> >> + struct socket *sock = sk->sk_socket;
> >
> > It seems sk->sk_socket could be NULL, according to similar helpers
> > like sk_stream_write_space()
> >
> > udp_lib_close() -> sk_common_release() -> sock_orphan() ...
>
> Yeah, thanks. So sk_socket stays alive even after it's removed,
> makes sense, but I wonder why there is no READ_ONCE in likes of
> sk_stream_write_space() as seems sk_socket can just randomly
> change?
sk_stream_write_space() is called with socket lock held (so far)
But for lockless UDP tx path, things are a bit different.
Powered by blists - more mailing lists