Message-Id: <20250107193532.f8518eb71a469b023b6a9220@linux-foundation.org>
Date: Tue, 7 Jan 2025 19:35:32 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Jens Axboe <axboe@...nel.dk>
Cc: linux-mm@...ck.org, linux-fsdevel@...r.kernel.org, hannes@...xchg.org,
clm@...a.com, linux-kernel@...r.kernel.org, willy@...radead.org,
kirill@...temov.name, bfoster@...hat.com
Subject: Re: [PATCHSET v8 0/12] Uncached buffered IO

On Fri, 20 Dec 2024 08:47:38 -0700 Jens Axboe <axboe@...nel.dk> wrote:
> So here's a new approach to the same concept, but using the page cache
> as synchronization. Due to excessive bike shedding on the naming, this
> is now named RWF_DONTCACHE, and is less special in that it's just page
> cache IO, except it prunes the ranges once IO is completed.
>
> Why do this, you may ask? The tldr is that device speeds are only
> getting faster, while reclaim is not. Doing normal buffered IO can be
> very unpredictable, and suck up a lot of resources on the reclaim side.
> This leads people to use O_DIRECT as a work-around, which has its own
> set of restrictions in terms of size, offset, and length of IO. It's
> also inherently synchronous, and now you need async IO as well. While
> the latter isn't necessarily a big problem as we have good options
> available there, it also should not be a requirement when all you want
> to do is read or write some data without caching.

Of course, we're doing something here which userspace could itself do:
drop the pagecache after reading it (with appropriate chunk sizing) and,
for writes, sync the written area then invalidate it. There are possible
added benefits from using separate threads for this.
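
For illustration only (this is not code from the patchset or the thread),
the userspace variant described above might look roughly like the sketch
below. The helper names and the 1MB DROP_CHUNK size are hypothetical; the
sketch assumes Linux, uses posix_fadvise(POSIX_FADV_DONTNEED) as the
invalidation step, and treats both the range sync and the fadvise as
best-effort.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for sync_file_range(), which is Linux-specific */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size; "appropriate chunk sizing" is workload dependent. */
#define DROP_CHUNK ((size_t)1 << 20)

/* Read up to len bytes at off, dropping the page cache behind each chunk. */
ssize_t read_dropbehind(int fd, void *buf, size_t len, off_t off)
{
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < DROP_CHUNK ? len - done : DROP_CHUNK;
		ssize_t n = pread(fd, (char *)buf + done, want, off + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)	/* EOF */
			break;
		/* Advisory only: the kernel may keep pages it cannot drop. */
		(void)posix_fadvise(fd, off + (off_t)done, n, POSIX_FADV_DONTNEED);
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/* Write len bytes at off; sync each written range, then invalidate it. */
ssize_t write_dropbehind(int fd, const void *buf, size_t len, off_t off)
{
	size_t done = 0;

	while (done < len) {
		size_t want = len - done < DROP_CHUNK ? len - done : DROP_CHUNK;
		off_t pos = off + (off_t)done;
		ssize_t n = pwrite(fd, (const char *)buf + done, want, pos);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* Dirty pages are not dropped by DONTNEED, so write them back first. */
		if (sync_file_range(fd, pos, n,
				    SYNC_FILE_RANGE_WAIT_BEFORE |
				    SYNC_FILE_RANGE_WRITE |
				    SYNC_FILE_RANGE_WAIT_AFTER) < 0) {
			/* Fall back to a data sync if the range sync is unsupported. */
			if (fdatasync(fd) < 0)
				return -1;
		}
		(void)posix_fadvise(fd, pos, n, POSIX_FADV_DONTNEED);
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

Note that this sketch invalidates the range whether or not it was cached
before the call, and it does the sync inline rather than in a separate
thread.
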

I suggest that diligence requires that we at least justify an in-kernel
approach at this time, please.

And there's a possible middle-ground implementation where the kernel
itself kicks off threads to do the drop-behind just before the read or
write syscall returns, which will probably be simpler. Can we please
describe why this also isn't acceptable?

Also, it seems wrong for a read(RWF_DONTCACHE) to drop cache if it was
already present. Because it was presumably present for a reason. Does
this implementation already take care of this? To make an application
which does read(/etc/passwd, RWF_DONTCACHE) less annoying?

Also, consuming a new page flag isn't a minor thing. It would be nice
to see some justification around this, and some description of how many
we have left.