Message-ID: <7c09f6af-653f-db3f-2378-02dca2bc07f7@gmail.com>
Date: Wed, 15 Jul 2020 22:42:04 +0300
From: Pavel Begunkov <asml.silence@...il.com>
To: Matthew Wilcox <willy@...radead.org>,
Andy Lutomirski <luto@...capital.net>
Cc: Stefano Garzarella <sgarzare@...hat.com>,
Miklos Szeredi <miklos@...redi.hu>,
Kees Cook <keescook@...omium.org>,
Christian Brauner <christian.brauner@...ntu.com>,
strace-devel@...ts.strace.io, io-uring@...r.kernel.org,
Linux API <linux-api@...r.kernel.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: strace of io_uring events?
On 15/07/2020 20:11, Matthew Wilcox wrote:
> On Wed, Jul 15, 2020 at 07:35:50AM -0700, Andy Lutomirski wrote:
>>> On Jul 15, 2020, at 4:12 AM, Miklos Szeredi <miklos@...redi.hu> wrote:
>>>
>>> <feff>Hi,
>
> feff? Are we doing WTF-16 in email now? ;-)
>
>>>
>>> This thread is to discuss the possibility of stracing requests
>>> submitted through io_uring. I'm not directly involved in io_uring
>>> development, so I'm posting this out of interest in using strace on
>>> processes utilizing io_uring.
>>>
>>> io_uring gives the developer a way to bypass the syscall interface,
>>> which results in loss of information when tracing. This is a strace
>>> fragment on "io_uring-cp" from liburing:
>>>
>>> io_uring_enter(5, 40, 0, 0, NULL, 8) = 40
>>> io_uring_enter(5, 1, 0, 0, NULL, 8) = 1
>>> io_uring_enter(5, 1, 0, 0, NULL, 8) = 1
>>> ...
>>>
>>> What really happens are read + write requests. Without that
>>> information the strace output is mostly useless.
>>>
>>> This loss of information is not new, e.g. calls through the vdso or
>>> futex fast paths are also invisible to strace. But losing filesystem
>>> I/O calls is a major blow, imo.
To clarify the details for those who are not familiar with io_uring:
io_uring has a pair of queues, a submission queue (SQ) and a completion
queue (CQ), both shared between the kernel and userspace. Userspace submits
requests by filling in a chunk of memory in the SQ. The kernel picks up SQ
entries either synchronously (in the io_uring_enter syscall) or
asynchronously by polling the SQ.
CQ entries are filled in by the kernel completely asynchronously and
in parallel. Some users just poll the CQ to get them, but there is also a
way to wait for them.
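As a rough illustration of the model described above, here is a toy
userspace sketch in Python. It is not the kernel ABI and not liburing:
ToyRing, Sqe, Cqe and all their fields are names made up for the example,
and every request is assumed to succeed in full. It only shows why a
syscall tracer sees io_uring_enter (if anything at all) while the actual
read/write requests travel through shared memory.

```python
from collections import deque
from dataclasses import dataclass

# Toy model only: names and fields are hypothetical, not the kernel ABI.


@dataclass
class Sqe:
    opcode: str     # e.g. "read" or "write"
    fd: int
    length: int
    user_data: int  # tag echoed back in the matching completion


@dataclass
class Cqe:
    user_data: int
    res: int


class ToyRing:
    def __init__(self):
        self.sq = deque()      # submission queue, "shared" memory
        self.cq = deque()      # completion queue, "shared" memory
        self.syscall_log = []  # the only thing a syscall tracer would see

    def queue(self, sqe):
        # Userspace fills an SQ entry: a plain memory write, no syscall.
        self.sq.append(sqe)

    def enter(self, to_submit):
        # Stand-in for io_uring_enter(): the kernel consumes SQ entries
        # inside the syscall, so the tracer sees one call and a count.
        n = self._consume(to_submit)
        self.syscall_log.append(f"io_uring_enter(to_submit={to_submit}) = {n}")
        return n

    def sq_poll(self):
        # Stand-in for kernel-side SQ polling: entries are consumed
        # without any syscall, so nothing is logged.
        return self._consume(len(self.sq))

    def _consume(self, limit):
        n = 0
        while self.sq and n < limit:
            sqe = self.sq.popleft()
            # Assumption: the request completes immediately and fully.
            self.cq.append(Cqe(sqe.user_data, sqe.length))
            n += 1
        return n

    def reap(self):
        # Userspace polls the CQ: again a plain memory read, no syscall.
        done = list(self.cq)
        self.cq.clear()
        return done


if __name__ == "__main__":
    ring = ToyRing()
    ring.queue(Sqe("read", 3, 4096, 1))
    ring.queue(Sqe("write", 4, 4096, 2))
    ring.enter(2)                        # visible to a syscall tracer
    ring.queue(Sqe("read", 3, 4096, 3))
    ring.sq_poll()                       # invisible to a syscall tracer
    print(ring.syscall_log)
    print([c.user_data for c in ring.reap()])
```

In this sketch three requests complete, but the log holds a single
io_uring_enter line with no hint of the opcodes involved, and the third
request leaves no trace at the syscall level at all.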
>>>
>>> What do people think?
>>>
>>> From what I can tell, listing the submitted requests on
>>> io_uring_enter() would not be hard. Request completion is
>>> asynchronous, however, and may not require an io_uring_enter() syscall.
>>> Am I correct?
Neither the submission nor the completion side necessarily requires a syscall.
>>>
>>> Is there some existing tracing infrastructure that strace could use to
>>> get async completion events? Should we be introducing one?
There are static tracepoints covering all needs.
And when not in use, the whole thing has to be zero-overhead. Otherwise,
given that there is perf, which is zero-overhead, this IMHO won't fly.
>>
>> Let’s add some seccomp folks. We probably also want to be able to run
>> seccomp-like filters on io_uring requests. So maybe io_uring should
>> call into seccomp-and-tracing code for each action.
>
> Adding Stefano since he had a complementary proposal for io_uring
> restrictions that weren't exactly seccomp.
>
--
Pavel Begunkov