Message-ID: <74feda24-37fd-11ea-af0e-1eff9ed4941e@kernel.dk>
Date: Mon, 21 Nov 2022 16:59:26 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Stefan Roesch <shr@...kernel.io>, kernel-team@...com
Cc: olivier@...llion01.com, netdev@...r.kernel.org,
io-uring@...r.kernel.org, kuba@...nel.org
Subject: Re: [PATCH v5 1/3] io_uring: add napi busy polling support
On 11/21/22 12:45 PM, Jens Axboe wrote:
> On 11/21/22 12:14 PM, Stefan Roesch wrote:
>> +/*
>> + * io_napi_add() - Add napi id to the busy poll list
>> + * @file: file pointer for socket
>> + * @ctx: io-uring context
>> + *
>> + * Add the napi id of the socket to the napi busy poll list.
>> + */
>> +void io_napi_add(struct file *file, struct io_ring_ctx *ctx)
>> +{
>> + unsigned int napi_id;
>> + struct socket *sock;
>> + struct sock *sk;
>> + struct io_napi_entry *ne;
>> +
>> + if (!io_napi_busy_loop_on(ctx))
>> + return;
>> +
>> + sock = sock_from_file(file);
>> + if (!sock)
>> + return;
>> +
>> + sk = sock->sk;
>> + if (!sk)
>> + return;
>> +
>> + napi_id = READ_ONCE(sk->sk_napi_id);
>> +
>> + /* Non-NAPI IDs can be rejected */
>> + if (napi_id < MIN_NAPI_ID)
>> + return;
>> +
>> + spin_lock(&ctx->napi_lock);
>> + list_for_each_entry(ne, &ctx->napi_list, list) {
>> + if (ne->napi_id == napi_id) {
>> + ne->timeout = jiffies + NAPI_TIMEOUT;
>> + goto out;
>> + }
>> + }
>> +
>> + ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
>> + if (!ne)
>> + goto out;
>> +
>> + ne->napi_id = napi_id;
>> + ne->timeout = jiffies + NAPI_TIMEOUT;
>> + list_add_tail(&ne->list, &ctx->napi_list);
>> +
>> +out:
>> + spin_unlock(&ctx->napi_lock);
>> +}
>
> I think this all looks good now, just one minor comment on the above. Is
> the expectation here that we'll basically always add to the napi list?
> If so, then I think allocating 'ne' outside the spinlock would be a lot
> saner, and then just kfree() it for the unlikely case where we find a
> duplicate.
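
Something along these lines, just sketching the tail of the quoted function
(untested, and 'e' here is a second struct io_napi_entry iterator; error
paths otherwise kept as-is):

    /* allocate before taking the lock, the common case is a new id */
    ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
    if (!ne)
        return;
    ne->napi_id = napi_id;
    ne->timeout = jiffies + NAPI_TIMEOUT;

    spin_lock(&ctx->napi_lock);
    list_for_each_entry(e, &ctx->napi_list, list) {
        if (e->napi_id == napi_id) {
            /* duplicate, just refresh the timeout and toss ours */
            e->timeout = jiffies + NAPI_TIMEOUT;
            spin_unlock(&ctx->napi_lock);
            kfree(ne);
            return;
        }
    }
    list_add_tail(&ne->list, &ctx->napi_list);
    spin_unlock(&ctx->napi_lock);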
After thinking about this a bit more, I don't think this is done in an
optimal fashion. If the list is longer than a few entries, this check
(or check-alloc-insert) is pretty expensive, and it'll add substantial
overhead to the poll path for sockets if napi is enabled.
I think we should do something along these lines (rough sketch below the list):
1) When arming poll AND napi has been enabled for the ring, then
alloc io_napi_entry upfront and store it in ->async_data.
2) Maintain the state in the io_napi_entry. If we're on the list,
that can be checked with just list_empty(), for example. If not
on the list, assign timeout and add.
3) Have regular request cleanup free it.
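
Rough sketch of what I mean, with made-up helper context (the entry hangs
off req->async_data, allocated when poll is armed on a napi-enabled ring;
locking vs. stale-entry reaping is hand-waved here):

    /* poll arm, slow path: allocate and init the entry up front */
    ne = kmalloc(sizeof(*ne), GFP_KERNEL);
    if (ne) {
        ne->napi_id = READ_ONCE(sock->sk->sk_napi_id);
        INIT_LIST_HEAD(&ne->list);
        req->async_data = ne;
    }

    /* fast path: called on every poll wakeup */
    static void io_napi_add(struct io_kiocb *req, struct io_ring_ctx *ctx)
    {
        struct io_napi_entry *ne = req->async_data;

        if (!ne)
            return;

        ne->timeout = jiffies + NAPI_TIMEOUT;
        if (list_empty(&ne->list)) {
            spin_lock(&ctx->napi_lock);
            list_add_tail(&ne->list, &ctx->napi_list);
            spin_unlock(&ctx->napi_lock);
        }
    }

    /* regular request cleanup: list_del() under the lock + kfree() */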
This could be combined with an alloc cache, though I would not do that for
the first iteration.
This would make io_napi_add() cheap - no more list iteration, and no
more allocations. And that is arguably the most important part, as that
is called every time the poll is woken up. Particularly for multishot,
that makes a big difference.
It's also designed much better imho, moving the more expensive bits to
the setup side.
--
Jens Axboe