lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 22 Apr 2024 16:03:56 +0100
From: Matthew Wilcox <willy@...radead.org>
To: John Garry <john.g.garry@...cle.com>
Cc: axboe@...nel.dk, brauner@...nel.org, djwong@...nel.org,
	viro@...iv.linux.org.uk, jack@...e.cz, akpm@...ux-foundation.org,
	dchinner@...hat.com, tytso@....edu, hch@....de,
	martin.petersen@...cle.com, nilay@...ux.ibm.com,
	ritesh.list@...il.com, mcgrof@...nel.org,
	linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-mm@...ck.org, ojaswin@...ux.ibm.com, p.raghav@...sung.com,
	jbongio@...gle.com, okiselev@...zon.com
Subject: Re: [PATCH RFC 5/7] fs: iomap: buffered atomic write support

On Mon, Apr 22, 2024 at 02:39:21PM +0000, John Garry wrote:
> Add special handling of PG_atomic flag to iomap buffered write path.
> 
> To flag an iomap iter for an atomic write, set IOMAP_ATOMIC.
> 
> For a folio associated with a write which has IOMAP_ATOMIC set, set
> PG_atomic.
> 
> Otherwise, when IOMAP_ATOMIC is unset, clear PG_atomic.
> 
> This means that for an "atomic" folio which has not been written back, it
> loses it "atomicity". So if userspace issues a write with RWF_ATOMIC set
> and another write with RWF_ATOMIC unset and which fully or partially
> overwrites that same region as the first write, that folio is not written
> back atomically. For such a scenario to occur, it would be considered a
> userspace usage error.
> 
> To ensure that a buffered atomic write is written back atomically when
> the write syscall returns, RWF_SYNC or similar needs to be used (in
> conjunction with RWF_ATOMIC).
> 
> As a safety check, when getting a folio for an atomic write in
> iomap_get_folio(), ensure that the length matches the inode mapping folio
> order-limit.
> 
> Only a single BIO should ever be submitted for an atomic write. So modify
> iomap_add_to_ioend() to ensure that we don't try to write back an atomic
> folio as part of a larger mixed-atomicity BIO.
> 
> In iomap_alloc_ioend(), handle an atomic write by setting REQ_ATOMIC for
> the allocated BIO.
> 
> When a folio is written back, again clear PG_atomic, as it is no longer
> required. I assume it will not be needlessly written back a second time...

I'm not taking a position on the mechanism yet; need to think about it
some more.  But there's a hole here I also don't have a solution to,
so we can all start thinking about it.

In iomap_write_iter(), we call copy_folio_from_iter_atomic().  Through no
fault of the application, if the range crosses a page boundary, we might
partially copy the bytes from the first page, then take a page fault on
the second page, hence doing a short write into the folio.  And there's
nothing preventing writeback from writing back a partially copied folio.

Now, if it's not dirty, then it can't be written back.  So if we're
doing an atomic write, we could clear the dirty bit after calling
iomap_write_begin() (given the usage scenarios we've discussed, it should
always be clear ...)

We need to prevent the "fall back to a short copy" logic in
iomap_write_iter() as well.  But then we also need to make sure we don't
get stuck in a loop, so maybe go three times around, and if it's still
not readable as a chunk, -EFAULT?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ