Message-ID: <3e56c92b-567c-7bb4-2644-dc1ad1d8c3ae@samba.org>
Date: Fri, 21 Oct 2022 11:36:38 +0200
From: Stefan Metzmacher <metze@...ba.org>
To: Pavel Begunkov <asml.silence@...il.com>,
	io-uring <io-uring@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>
Cc: Jakub Kicinski <kuba@...nel.org>,
	netdev <netdev@...r.kernel.org>,
	Dylan Yudaken <dylany@...com>
Subject: Re: IORING_SEND_NOTIF_REPORT_USAGE (was Re: IORING_CQE_F_COPIED)

Hi Pavel,

>>>> So far I came up with a IORING_SEND_NOTIF_REPORT_USAGE opt-in flag
>>>> and the reporting is done in cqe.res with IORING_NOTIF_USAGE_ZC_USED (0x00000001)
>>>> and/or IORING_NOTIF_USAGE_ZC_COPIED (0x8000000). So the caller is also
>>>> able to notice that some parts were able to use zero copy, while other
>>>> fragments were copied.
>>>
>>> Are we really interested in multihoming and probably some very edge cases?
>>> I'd argue we're not and it should be a single bool hint indicating whether
>>> zc is viable or not. It can do more complex calculations _if_ needed, e.g.
>>> looking inside skb's and figure out how many bytes were copied but as for me
>>> it should better be turned into a single bool in the end. Could also be the
>>> number of bytes copied, but I don't think we can't have the accuracy for
>>> that (e.g. what we're going to return if some protocol duplicates an skb
>>> and sends to 2 different devices or is processing it in a pipeline?)
>>>
>>> So the question is what is the use case for having 2 flags?
>>
>> It's mostly for debugging.
>
> Ok, than it sounds like we don't need it.

Maybe I could add some trace points to the callback?

>>> btw, now we've got another example why the report flag is a good idea,
>>
>> I don't understand that line...
>
> I'm just telling that IORING_SEND_NOTIF_* instead of unconditional reporting
> is more flexible and extendible from the uapi perspective.

ok

>>> we can't use cqe.res unconditionally because we want to have a "one CQE
>>> per request" mode, but it's fine if we make it and the report flag
>>> mutually exclusive.
>>
>> You mean we can add an optimized case where SEND[MSG]_ZC would not
>> generate F_MORE and skips F_NOTIF, because we copied or the transmission
>> path was really fast?
>
> It is rather about optionally omitting the first (aka completion) cqe and
> posting only the notification cqe, which makes a lot of sense for UDP and
> some TCP use cases.

OK.

>> Then I'd move to IORING_CQE_F_COPIED again...
> [...]
>>>> -struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
>>>> +static void __io_notif_complete_tw_report_usage(struct io_kiocb *notif, bool *locked)
>>>
>>> Just shove all that into __io_notif_complete_tw().
>>
>> Ok, and then optimze later?
>
> Right, I'm just tired of back porting patches by hand :)

ok, I just assumed it would be 6.1 only.

>> Otherwise we could have IORING_CQE_F_COPIED by default without opt-in
>> flag...

Do you still want an opt-in flag to get IORING_CQE_F_COPIED?
If so what name do you want it to be?

>>>> +static void io_uring_tx_zerocopy_callback_report_usage(struct sk_buff *skb,
>>>> +						struct ubuf_info *uarg,
>>>> +						bool success)
>>>> +{
>>>> +	struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
>>>> +
>>>> +	if (success && !nd->zc_used && skb)
>>>> +		nd->zc_used = true;
>>>> +	else if (unlikely(!success && !nd->zc_copied))
>>>> +		nd->zc_copied = true;
>>>
>>> It's fine but racy, so let's WRITE_ONCE() to indicate it.
>>
>> I don't see how this could be a problem, but I can add it.
>
> It's not a problem, but better to be a little be more explicit
> about parallel writes.

ok.

>>>> diff --git a/io_uring/notif.h b/io_uring/notif.h
>>>> index 5b4d710c8ca5..5ac7a2745e52 100644
>>>> --- a/io_uring/notif.h
>>>> +++ b/io_uring/notif.h
>>>> @@ -13,10 +13,12 @@ struct io_notif_data {
>>>>  	struct file		*file;
>>>>  	struct ubuf_info	uarg;
>>>>  	unsigned long		account_pages;
>>>> +	bool			zc_used;
>>>> +	bool			zc_copied;
>>>
>>> IIRC io_notif_data is fully packed in 6.1, so placing zc_{used,copied}
>>> there might complicate backporting (if any). We can place them in io_kiocb
>>> directly and move in 6.2. Alternatively account_pages doesn't have to be
>>> long.
>>
>> As far as I can see kernel-dk-block/io_uring-6.1 alread has your
>> shrink patches included...
>
> Sorry, I mean 6.0

So you want to backport to 6.0?

Find the current version below, sizeof(struct io_kiocb) will grow from
3*64 + 24 to 3*64 + 32 (on x86_64) so it stays within 4 cache lines.

I tried this first:

	union {
		u8	iopoll_completed;
		struct {
			u8	zc_used:1;
			u8	zc_copied:1;
		};
	};

But then WRITE_ONCE() complains about a bitfield write.

So let me know about the opt-in flag and I'll prepare real commits
including a patch that moves from struct io_kiocb to struct io_notif_data
on top.

metze

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index f5b687a787a3..189152ad78d6 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -515,6 +515,9 @@ struct io_kiocb {
 	u8				opcode;
 	/* polled IO has completed */
 	u8				iopoll_completed;
+	/* these will be moved to struct io_notif_data in 6.1 */
+	bool				zc_used;
+	bool				zc_copied;
 	/*
 	 * Can be either a fixed buffer index, or used with provided buffers.
 	 * For the latter, before issue it points to the buffer group ID,
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index ab7458033ee3..738d6234d1d9 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -350,6 +350,7 @@ struct io_uring_cqe {
 #define IORING_CQE_F_MORE		(1U << 1)
 #define IORING_CQE_F_SOCK_NONEMPTY	(1U << 2)
 #define IORING_CQE_F_NOTIF		(1U << 3)
+#define IORING_CQE_F_COPIED		(1U << 4)
 
 enum {
 	IORING_CQE_BUFFER_SHIFT		= 16,
diff --git a/io_uring/notif.c b/io_uring/notif.c
index e37c6569d82e..033aca064b10 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -18,6 +18,10 @@ static void __io_notif_complete_tw(struct io_kiocb *notif, bool *locked)
 		__io_unaccount_mem(ctx->user, nd->account_pages);
 		nd->account_pages = 0;
 	}
+
+	if (notif->zc_copied || !notif->zc_used)
+		notif->cqe.flags |= IORING_CQE_F_COPIED;
+
 	io_req_task_complete(notif, locked);
 }
 
@@ -28,6 +32,11 @@ static void io_uring_tx_zerocopy_callback(struct sk_buff *skb,
 	struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg);
 	struct io_kiocb *notif = cmd_to_io_kiocb(nd);
 
+	if (success && !notif->zc_used && skb)
+		WRITE_ONCE(notif->zc_used, true);
+	else if (!success && !notif->zc_copied)
+		WRITE_ONCE(notif->zc_copied, true);
+
 	if (refcount_dec_and_test(&uarg->refcnt)) {
 		notif->io_task_work.func = __io_notif_complete_tw;
 		io_req_task_work_add(notif);
@@ -55,6 +64,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
 	nd->account_pages = 0;
 	nd->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
 	nd->uarg.callback = io_uring_tx_zerocopy_callback;
+	notif->zc_used = notif->zc_copied = false;
 	refcount_set(&nd->uarg.refcnt, 1);
 	return notif;
 }
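
As a usage illustration (not part of the patch): a minimal userspace sketch of
how a SEND_ZC caller could consume the proposed IORING_CQE_F_COPIED flag on the
notification CQE. It assumes liburing's io_uring_prep_send_zc() helper, a single
request in flight on the ring, and a kernel with the uapi change above; the
fallback define only mirrors the value proposed in the patch. The function name
send_zc_once is made up for the example.

#include <errno.h>
#include <stdbool.h>
#include <liburing.h>

#ifndef IORING_CQE_F_COPIED
#define IORING_CQE_F_COPIED	(1U << 4)	/* value proposed in the patch above */
#endif

/*
 * Send one buffer with IORING_OP_SEND_ZC and report whether the kernel
 * ended up copying it. Returns bytes sent or -errno.
 */
static int send_zc_once(struct io_uring *ring, int sockfd,
			const void *buf, size_t len, bool *copied)
{
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret, sent;

	*copied = false;

	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		return -EBUSY;
	io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
	io_uring_submit(ring);

	/* first CQE: completion, carries the byte count (and F_MORE) */
	ret = io_uring_wait_cqe(ring, &cqe);
	if (ret < 0)
		return ret;
	sent = cqe->res;
	if (!(cqe->flags & IORING_CQE_F_MORE)) {
		/* no notification will follow (e.g. the send failed early) */
		io_uring_cqe_seen(ring, cqe);
		return sent;
	}
	io_uring_cqe_seen(ring, cqe);

	/* second CQE: notification (F_NOTIF); buffer is reusable afterwards */
	ret = io_uring_wait_cqe(ring, &cqe);
	if (ret < 0)
		return ret;
	if (cqe->flags & IORING_CQE_F_COPIED)
		*copied = true;	/* zero copy was not (fully) used */
	io_uring_cqe_seen(ring, cqe);

	return sent;
}

A caller could keep a running count of notifications with the flag set and
switch back to plain send(2) when zero copy is clearly not being used, which
is exactly the single bool hint discussed above.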