[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241107100105.tktkxs5qhkjwkckg@quack3>
Date: Thu, 7 Nov 2024 11:01:05 +0100
From: Jan Kara <jack@...e.cz>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Jan Kara <jack@...e.cz>, Asahi Lina <lina@...hilina.net>,
Dave Chinner <david@...morbit.com>,
Matthew Wilcox <willy@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Sergio Lopez Pascual <slp@...hat.com>,
linux-fsdevel@...r.kernel.org, nvdimm@...ts.linux.dev,
linux-kernel@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH] dax: Allow block size > PAGE_SIZE
On Wed 06-11-24 11:59:44, Dan Williams wrote:
> Jan Kara wrote:
> [..]
> > > This WARN still feels like the wrong thing, though. Right now it is the
> > > only thing in DAX code complaining on a page size/block size mismatch
> > > (at least for virtiofs). If this is so important, I feel like there
> > > should be a higher level check elsewhere, like something happening at
> > > mount time or on file open. It should actually cause the operations to
> > > fail cleanly.
> >
> > That's a fair point. Currently filesystems supporting DAX check for this in
> > their mount code because there isn't really a DAX code that would get
> > called during mount and would have enough information to perform the check.
> > I'm not sure adding a new call just for this check makes a lot of sense.
> > But if you have some good place in mind, please tell me.
>
> Is not the reason that dax_writeback_mapping_range() the only thing
> checking ->i_blkbits because 'struct writeback_control' does writeback
> in terms of page-index ranges?
To be fair, I don't remember why we've put the assertion specifically into
dax_writeback_mapping_range(). But as Dave explained there's much more to
this blocksize == pagesize limitation in DAX than just doing writeback in
terms of page-index ranges. The whole DAX entry tracking in xarray would
have to be modified to properly support other entry sizes than just PTE &
PMD sizes because otherwise the entry locking just doesn't provide the
guarantees that are expected from filesystems (e.g. you could have parallel
modifications happening to a single fs block in pagesize < blocksize case).
> All other dax entry points are filesystem controlled that know the
> block-to-pfn-to-mapping relationship.
>
> Recall that dax_writeback_mapping_range() is historically for pmem
> persistence guarantees to make sure that applications write through CPU
> cache to media.
Correct.
> Presumably there are no cache coherency concerns with fuse and dax
> writes from the guest side are not a risk of being stranded in CPU
> cache. Host side filesystem writeback will take care of them when / if
> the guest triggers a storage device cache flush, not a guest page cache
> writeback.
I'm not so sure. When you call fsync(2) in the guest on virtiofs file, it
should provide persistency guarantees on the file contents even in case of
*host* power failure. So if the guest is directly mapping host's page cache
pages through virtiofs, filemap_fdatawrite() call in the guest must result
in fsync(2) on the host to persist those pages. And as far as I vaguely
remember that happens by KVM catching the arch_wb_cache_pmem() calls and
issuing fsync(2) on the host. But I could be totally wrong here.
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists