linux-kernel - Re: [PATCHSET v8 0/12] Uncached buffered IO

Message-Id: <20250107193532.f8518eb71a469b023b6a9220@linux-foundation.org>
Date: Tue, 7 Jan 2025 19:35:32 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Jens Axboe <axboe@...nel.dk>
Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org, hannes@...xchg.org,
 clm@...a.com, linux-kernel@...r.kernel.org, willy@...radead.org,
 kirill@...temov.name, bfoster@...hat.com
Subject: Re: [PATCHSET v8 0/12] Uncached buffered IO

On Fri, 20 Dec 2024 08:47:38 -0700 Jens Axboe <axboe@...nel.dk> wrote:

> So here's a new approach to the same concept, but using the page cache
> as synchronization. Due to excessive bike shedding on the naming, this
> is now named RWF_DONTCACHE, and is less special in that it's just page
> cache IO, except it prunes the ranges once IO is completed.
> 
> Why do this, you may ask? The tldr is that device speeds are only
> getting faster, while reclaim is not. Doing normal buffered IO can be
> very unpredictable, and suck up a lot of resources on the reclaim side.
> This leads people to use O_DIRECT as a work-around, which has its own
> set of restrictions in terms of size, offset, and length of IO. It's
> also inherently synchronous, and now you need async IO as well. While
> the latter isn't necessarily a big problem as we have good options
> available there, it also should not be a requirement when all you want
> to do is read or write some data without caching.

Of course, we're doing something here which userspace could itself do:
drop the pagecache after reading it (with appropriate chunk sizing) and
for writes, sync the written area then invalidate it.  There are
possible added benefits from using separate threads for this.
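
To make the userspace alternative concrete, here is a minimal sketch of
drop-behind done from the application side.  It is an illustration only,
not anything from the patchset: the function names and the 1 MiB chunk
size are made up, and it assumes Linux with Python's os.posix_fadvise()
available.  Python's os module does not expose sync_file_range(), so the
write path falls back to fdatasync(), which flushes the whole file
rather than just the written area.

```python
import os

CHUNK = 1 << 20  # 1 MiB; arbitrary chunk size chosen for illustration


def read_dropbehind(path, chunk=CHUNK):
    """Read a file via the page cache, dropping each chunk once consumed.

    Hypothetical helper. Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            data = os.pread(fd, chunk, offset)
            if not data:
                break
            total += len(data)
            # Advisory hint: the kernel may drop the cached pages for
            # this range. Clean pages are usually dropped; nothing is
            # guaranteed.
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return total


def write_dropbehind(path, payload, chunk=CHUNK):
    """Write payload in chunks, syncing and then invalidating each chunk.

    Hypothetical helper. Returns the number of bytes written.
    """
    view = memoryview(payload)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        while offset < len(view):
            n = os.pwrite(fd, view[offset:offset + chunk], offset)
            # Dirty pages cannot be dropped, so flush them first.
            # fdatasync() covers the whole file, not only this range.
            os.fdatasync(fd)
            os.posix_fadvise(fd, offset, n, os.POSIX_FADV_DONTNEED)
            offset += n
    finally:
        os.close(fd)
    return offset
```

Note that this sketch drops the range unconditionally, including any
pages that were already cached before the read, and that the sync in
the write loop blocks the caller unless it is pushed to another thread.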

I suggest that diligence requires that we at least justify an in-kernel
approach at this time, please.

And there's a possible middle-ground implementation where the kernel
itself kicks off threads to do the drop-behind just before the read or
write syscall returns, which will probably be simpler.  Can we please
describe why this also isn't acceptable?

Also, it seems wrong for a read(RWF_DONTCACHE) to drop cache if it was
already present.  Because it was presumably present for a reason.  Does
this implementation already take care of this?  To make an application
which does read(/etc/passwd, RWF_DONTCACHE) less annoying?

Also, consuming a new page flag isn't a minor thing.  It would be nice
to see some justification around this, and some description of how many
we have left.