Message-ID: <fbcd759e-2453-4570-a2a0-c9ad67ae9277@gmail.com>
Date: Thu, 20 Mar 2025 10:46:35 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Stefan Metzmacher <metze@...ba.org>, Jens Axboe <axboe@...nel.dk>,
 Joe Damato <jdamato@...tly.com>, Christoph Hellwig <hch@...radead.org>,
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
 horms@...nel.org, linux-api@...r.kernel.org, linux-arch@...r.kernel.org,
 viro@...iv.linux.org.uk, jack@...e.cz, kuba@...nel.org, shuah@...nel.org,
 sdf@...ichev.me, mingo@...hat.com, arnd@...db.de, brauner@...nel.org,
 akpm@...ux-foundation.org, tglx@...utronix.de, jolsa@...nel.org,
 linux-kselftest@...r.kernel.org
Cc: David Wei <dw@...idwei.uk>
Subject: Re: [RFC -next 00/10] Add ZC notifications to splice and sendfile

On 3/19/25 19:15, Stefan Metzmacher wrote:
> Am 19.03.25 um 19:37 schrieb Jens Axboe:
>> On 3/19/25 11:45 AM, Joe Damato wrote:
>>> On Wed, Mar 19, 2025 at 11:20:50AM -0600, Jens Axboe wrote:
...
>> My argument would be the same as for other features - if you can do it
>> simpler this other way, why not consider that? The end result would be
>> the same, you can do fast sendfile() with sane buffer reuse. But the
>> kernel side would be simpler, which is always a kernel main goal for
>> those of us that have to maintain it.
>>
>> Just adding sendfile2() works in the sense that it's an easier drop in
>> replacement for an app, though the error queue side does mean it needs
>> to change anyway - it's not just replacing one syscall with another. And
>> if we want to be lazy, sure that's fine. I just don't think it's the
>> best way to do it when we literally have a mechanism that's designed for
>> this and works with reuse already with normal send zc (and receive side
>> too, in the next kernel).
> 
> A few months (or even years) back, Pavel came up with an idea
> to implement some kind of splice into a fixed buffer, if that
> would be implemented I guess it would help me in Samba too.
> My first usage was on the receive side (from the network).

I did it as a testing ground for infra needed for ublk zerocopy,
but if that's of interest I can resurrect the patches and see
where it goes, especially since the aforementioned infra just got
queued.

> But the other side might also be possible now we have RWF_DONTCACHE.
> Instead of dropping the pages from the page cache, it might
> be possible to move them to a fixed buffer instead.
> It would mean the pages would be 'stable' when they are
> no longer part of the pagecache.
> But maybe my assumption for that is too naive...

That's an interesting idea.

> Anyway, that splice into a fixed buffer would be great to have,
> as the new IORING_OP_RECV_ZC requires control over the
> hardware queues of the NIC and only allows a single process

Right, it basically borrows a hardware rx queue and that
needs CAP_NET_ADMIN, and the user also has to set up steering
rules.

> to provide buffers for that receive queue (at least that's how
> I understand it). And that's not possible for multiple processes
> (maybe not belonging to the same high level application and likely

It's up to the user to decide who returns buffers back (and how to
synchronise that), as the API is just a user-mapped ring. Regardless,
it's not a finished project; David and I have looked at features we want
to add to make life easier for multithreaded apps that can't dedicate
that many queues. I see your point though.

> non-root applications). So it would be great to have splice into
> a fixed buffer as an alternative to IORING_OP_SPLICE/IORING_OP_TEE,
> as it would be more flexible to use in combination with
> IORING_OP_SENDMSG_ZC as well as IORING_OP_WRITE[V]_FIXED with RWF_DONTCACHE.
> 
> I guess such a splice into fixed buffer linked to IORING_OP_SENDMSG_ZC
> would be the way to simulate the sendfile2() in userspace?

Right, and that approach allows handling intermediate errors,
which is why it doesn't need to put restrictions on the input
file.

-- 
Pavel Begunkov

