[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2sjhov4poma4o4efvwe2xk474iorxwvf4ifqa5oee74744ke2e@lipjana3f5ti>
Date: Tue, 12 Nov 2024 11:50:46 +0200
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Dave Chinner <david@...morbit.com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, hannes@...xchg.org, clm@...a.com, linux-kernel@...r.kernel.org,
willy@...radead.org, linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-xfs@...r.kernel.org
Subject: Re: [PATCH 10/16] mm/filemap: make buffered writes work with
RWF_UNCACHED
On Tue, Nov 12, 2024 at 07:02:33PM +1100, Dave Chinner wrote:
> I think the post-IO invalidation that these IOs do is largely
> irrelevant to how the page cache processes the write. Indeed,
> from userspace, the functionality in this patchset would be
> implemented like this:
>
> oneshot_data_write(fd, buf, len, off)
> {
> /* write into page cache */
> pwrite(fd, buf, len, off);
>
> /* force the write through the page cache */
> sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
>
> /* Invalidate the single use data in the cache now it is on disk */
> posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
> }
>
> Allowing the application to control writeback and invalidation
> granularity is a much more flexible solution to the problem here;
> when IO is sequential, delayed allocation will be allowed to ensure
> large contiguous extents are created and that will greatly reduce
> file fragmentation on XFS, btrfs, bcachefs and ext4. For random
> writes, it'll submit async IOs in batches...
>
> Given that io_uring already supports sync_file_range() and
> posix_fadvise(), I'm wondering why we need an new IO API to perform
> this specific write-through behaviour in a way that is less flexible
> than what applications can already implement through existing
> APIs....
Attaching the hint to the IO operation allows kernel to keep the data in
page cache if it is there for other reason. You cannot do it with a
separate syscall.
Consider a scenario of a nightly backup of the data. The same data is in
cache because the actual workload needs it. You don't want backup task to
invalidate the data from cache. Your snippet would do that.
--
Kiryl Shutsemau / Kirill A. Shutemov
Powered by blists - more mailing lists