[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <455a75bb-f3ea-4cbf-830f-a443a7e0d71a@gmail.com>
Date: Fri, 5 Apr 2024 12:45:47 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, dsahern@...nel.org,
pabeni@...hat.com, kuba@...nel.org
Subject: Re: [PATCH net-next v2] net: enable SOCK_NOSPACE for UDP
On 4/5/24 10:11, Eric Dumazet wrote:
> On Fri, Apr 5, 2024 at 1:37 AM Pavel Begunkov <asml.silence@...il.com> wrote:
>>
>> wake_up_poll() and variants can be expensive even if they don't actually
>> wake anything up as it involves disabling irqs, taking a spinlock and
>> walking through the poll list, which is fraught with cache bounces.
>> That might happen when someone waits for POLLOUT or even POLLIN as the
>> waitqueue is shared, even though we should be able to skip these
>> false positive calls when the tx queue is not full.
>>
>> Add support for SOCK_NOSPACE for UDP sockets. The udp_poll() change is
>> straightforward and repeats after tcp_poll() and others. However, for
>> sock_wfree() it's done as an optional feature flagged by
>> SOCK_SUPPORT_NOSPACE, because the feature requires support from the
>> corresponding poll handler but there are many users of sock_wfree()
>> that might be not prepared.
>>
>> Note, it optimises the sock_wfree() path but not sock_def_write_space().
>> That's fine because it leads to more false positive wake ups, which is
>> tolerable and not performance critical.
>>
>> It wins +5% to throughput testing with a CPU bound tx only io_uring
>> based benchmark and showed 0.5-3% in more realistic workloads.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
>> ---
>> include/net/sock.h | 1 +
>> net/core/sock.c | 5 +++++
>> net/ipv4/udp.c | 15 ++++++++++++++-
>> 3 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 2253eefe2848..027a398471c4 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -944,6 +944,7 @@ enum sock_flags {
>> SOCK_XDP, /* XDP is attached */
>> SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
>> SOCK_RCVMARK, /* Receive SO_MARK ancillary data with packet */
>> + SOCK_NOSPACE_SUPPORTED, /* socket supports the SOCK_NOSPACE flag */
>> };
>>
>> #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 5ed411231fc7..e4f486e9296a 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3393,6 +3393,11 @@ static void sock_def_write_space_wfree(struct sock *sk)
>>
>> /* rely on refcount_sub from sock_wfree() */
>> smp_mb__after_atomic();
>> +
>> + if (sock_flag(sk, SOCK_NOSPACE_SUPPORTED) &&
>> + !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
>> + return;
>> +
>> if (wq && waitqueue_active(&wq->wait))
>> wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
>> EPOLLWRNORM | EPOLLWRBAND);
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 11460d751e73..309fa96e9020 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -342,6 +342,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>> hslot2->count++;
>> spin_unlock(&hslot2->lock);
>> }
>> + sock_set_flag(sk, SOCK_NOSPACE_SUPPORTED);
>> sock_set_flag(sk, SOCK_RCU_FREE);
>> error = 0;
>> fail_unlock:
>> @@ -2885,8 +2886,20 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
>> /* psock ingress_msg queue should not contain any bad checksum frames */
>> if (sk_is_readable(sk))
>> mask |= EPOLLIN | EPOLLRDNORM;
>> - return mask;
>>
>> + if (!sock_writeable(sk)) {
>
> I think there is a race that you could avoid here by using
>
> if (!(mask & (EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND))) {
Seems like it, I'll add the change, thanks
> (We called datagram_poll() at the beginning of udp_poll())
--
Pavel Begunkov
Powered by blists - more mailing lists