Date:   Mon, 21 Nov 2022 16:59:26 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     Stefan Roesch <shr@...kernel.io>, kernel-team@...com
Cc:     olivier@...llion01.com, netdev@...r.kernel.org,
        io-uring@...r.kernel.org, kuba@...nel.org
Subject: Re: [PATCH v5 1/3] io_uring: add napi busy polling support

On 11/21/22 12:45 PM, Jens Axboe wrote:
> On 11/21/22 12:14 PM, Stefan Roesch wrote:
>> +/*
>> + * io_napi_add() - Add napi id to the busy poll list
>> + * @file: file pointer for socket
>> + * @ctx:  io-uring context
>> + *
>> + * Add the napi id of the socket to the napi busy poll list.
>> + */
>> +void io_napi_add(struct file *file, struct io_ring_ctx *ctx)
>> +{
>> +	unsigned int napi_id;
>> +	struct socket *sock;
>> +	struct sock *sk;
>> +	struct io_napi_entry *ne;
>> +
>> +	if (!io_napi_busy_loop_on(ctx))
>> +		return;
>> +
>> +	sock = sock_from_file(file);
>> +	if (!sock)
>> +		return;
>> +
>> +	sk = sock->sk;
>> +	if (!sk)
>> +		return;
>> +
>> +	napi_id = READ_ONCE(sk->sk_napi_id);
>> +
>> +	/* Non-NAPI IDs can be rejected */
>> +	if (napi_id < MIN_NAPI_ID)
>> +		return;
>> +
>> +	spin_lock(&ctx->napi_lock);
>> +	list_for_each_entry(ne, &ctx->napi_list, list) {
>> +		if (ne->napi_id == napi_id) {
>> +			ne->timeout = jiffies + NAPI_TIMEOUT;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
>> +	if (!ne)
>> +		goto out;
>> +
>> +	ne->napi_id = napi_id;
>> +	ne->timeout = jiffies + NAPI_TIMEOUT;
>> +	list_add_tail(&ne->list, &ctx->napi_list);
>> +
>> +out:
>> +	spin_unlock(&ctx->napi_lock);
>> +}
> 
> I think this all looks good now, just one minor comment on the above. Is
> the expectation here that we'll basically always add to the napi list?
> If so, then I think allocating 'ne' outside the spinlock would be a lot
> saner, and then just kfree() it for the unlikely case where we find a
> duplicate.

After thinking about this a bit more, I don't think this is done in an
optimal fashion. If the list is longer than a few entries, this check
(or check-alloc-insert) is pretty expensive and it'll add substantial
overhead to the poll path for sockets if napi is enabled.

I think we should do something along the lines of:

1) When arming poll AND napi has been enabled for the ring, then
    alloc io_napi_entry upfront and store it in ->async_data.

2) Maintain the state in the io_napi_entry. If we're on the list,
    that can be checked with just list_empty(), for example. If not
    on the list, assign timeout and add.

3) Have regular request cleanup free it.

This could be combined with an alloc cache, I would not do that for the
first iteration though.

This would make io_napi_add() cheap - no more list iteration, and no
more allocations. And that is arguably the most important part, as that
is called every time the poll is woken up. Particularly for multishot
that makes a big difference.

It's also designed much better imho, moving the more expensive bits to
the setup side.
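
Very roughly, just to show the shape of it (helper names and signatures
are made up, reusing io_napi_entry and the ctx->napi_list / napi_lock
bits from the patch):

/* 1) at poll arm time, if napi busy polling is enabled for the ring */
static int io_napi_alloc(struct io_kiocb *req)
{
	struct io_napi_entry *ne;

	ne = kmalloc(sizeof(*ne), GFP_KERNEL);
	if (!ne)
		return -ENOMEM;
	INIT_LIST_HEAD(&ne->list);
	req->async_data = ne;
	return 0;
}

/* 2) on each poll wakeup, no iteration and no allocation */
static void io_napi_add(struct io_kiocb *req, struct io_ring_ctx *ctx,
			unsigned int napi_id)
{
	struct io_napi_entry *ne = req->async_data;

	/* already on the list, nothing to do */
	if (!list_empty(&ne->list))
		return;

	ne->napi_id = napi_id;
	ne->timeout = jiffies + NAPI_TIMEOUT;
	spin_lock(&ctx->napi_lock);
	list_add_tail(&ne->list, &ctx->napi_list);
	spin_unlock(&ctx->napi_lock);
}

/*
 * 3) regular request cleanup then removes the entry from ctx->napi_list
 *    under napi_lock (if it's on it) and frees req->async_data.
 */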

-- 
Jens Axboe
