lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2sjhov4poma4o4efvwe2xk474iorxwvf4ifqa5oee74744ke2e@lipjana3f5ti>
Date: Tue, 12 Nov 2024 11:50:46 +0200
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Dave Chinner <david@...morbit.com>
Cc: Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org, 
	linux-fsdevel@...r.kernel.org, hannes@...xchg.org, clm@...a.com, linux-kernel@...r.kernel.org, 
	willy@...radead.org, linux-btrfs@...r.kernel.org, linux-ext4@...r.kernel.org, 
	linux-xfs@...r.kernel.org
Subject: Re: [PATCH 10/16] mm/filemap: make buffered writes work with
 RWF_UNCACHED

On Tue, Nov 12, 2024 at 07:02:33PM +1100, Dave Chinner wrote:
> I think the post-IO invalidation that these IOs do is largely
> irrelevant to how the page cache processes the write. Indeed,
> from userspace, the functionality in this patchset would be
> implemented like this:
> 
> oneshot_data_write(fd, buf, len, off)
> {
> 	/* write into page cache */
> 	pwrite(fd, buf, len, off);
> 
> 	/* force the write through the page cache */
> 	sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
> 
> 	/* Invalidate the single use data in the cache now it is on disk */
> 	posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
> }
> 
> Allowing the application to control writeback and invalidation
> granularity is a much more flexible solution to the problem here;
> when IO is sequential, delayed allocation will be allowed to ensure
> large contiguous extents are created and that will greatly reduce
> file fragmentation on XFS, btrfs, bcachefs and ext4. For random
> writes, it'll submit async IOs in batches...
> 
> Given that io_uring already supports sync_file_range() and
> posix_fadvise(), I'm wondering why we need an new IO API to perform
> this specific write-through behaviour in a way that is less flexible
> than what applications can already implement through existing
> APIs....

Attaching the hint to the IO operation allows kernel to keep the data in
page cache if it is there for other reason. You cannot do it with a
separate syscall.

Consider a scenario of a nightly backup of the data. The same data is in
cache because the actual workload needs it. You don't want backup task to
invalidate the data from cache. Your snippet would do that.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ