[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7f0c0a15-8847-4266-974e-c3567df1c25a@asahilina.net>
Date: Tue, 5 Nov 2024 00:31:22 +0900
From: Asahi Lina <lina@...hilina.net>
To: Jan Kara <jack@...e.cz>
Cc: Dan Williams <dan.j.williams@...el.com>,
Matthew Wilcox <willy@...radead.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Sergio Lopez Pascual
<slp@...hat.com>, linux-fsdevel@...r.kernel.org, nvdimm@...ts.linux.dev,
linux-kernel@...r.kernel.org, asahi@...ts.linux.dev
Subject: Re: [PATCH] dax: Allow block size > PAGE_SIZE
On 11/4/24 7:57 PM, Jan Kara wrote:
> On Fri 01-11-24 21:22:31, Asahi Lina wrote:
>> For virtio-dax, the file/FS blocksize is irrelevant. FUSE always uses
>> large DAX blocks (2MiB), which will work with all host page sizes. Since
>> we are mapping files into the DAX window on the host, the underlying
>> block size of the filesystem and its block device (if any) are
>> meaningless.
>>
>> For real devices with DAX, the only requirement should be that the FS
>> block size is *at least* as large as PAGE_SIZE, to ensure that at least
>> whole pages can be mapped out of the device contiguously.
>>
>> Fixes warning when using virtio-dax on a 4K guest with a 16K host,
>> backed by tmpfs (which sets blksz == PAGE_SIZE on the host).
>>
>> Signed-off-by: Asahi Lina <lina@...hilina.net>
>> ---
>> fs/dax.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Well, I don't quite understand how just relaxing the check is enough. I
> guess it may work with virtiofs (I don't know enough about virtiofs to
> really tell either way) but for ordinary DAX filesystem it would be
> seriously wrong if DAX was used with blocksize > pagesize as multiple
> mapping entries could be pointing to the same PFN which is going to have
> weird results.
Isn't that generally possible by just mapping the same file multiple
times? Why would that be an issue?
Of course having a block size smaller than the page size is never going
to work because you would not be able to map single blocks out of files
directly. But I don't see why a larger block size would cause any
issues. You'd just use several pages to map a single filesystem block.
For example, if the block size is 16K and the page size is 4K, then a
single file block would be DAX mapped as four contiguous 4K pages in
both physical and virtual memory.
> If virtiofs can actually map 4k subpages out of 16k page on
> host (and generally perform 4k granular tracking etc.), it would seem more
> appropriate if virtiofs actually exposed the filesystem 4k block size instead
> of 16k blocksize? Or am I missing something?
virtiofs itself on the guest does 2MiB mappings into the SHM region, and
then the guest is free to map blocks out of those mappings. So as long
as the guest page size is less than 2MiB, it doesn't matter, since all
files will be aligned in physical memory to that block size. It behaves
as if the filesystem block size is 2MiB from the point of view of the
guest regardless of the actual block size. For example, if the host page
size is 16K, the guest will request a 2MiB mapping of a file, which the
VMM will satisfy by mmapping 128 16K pages from its page cache (at
arbitrary physical memory addresses) into guest "physical" memory as one
contiguous block. Then the guest will see the whole 2MiB mapping as
contiguous, even though it isn't in physical RAM, and it can use any
page granularity it wants (that is supported by the architecture) to map
it to a userland process.
>
> Honza
>
>> diff --git a/fs/dax.c b/fs/dax.c
>> index c62acd2812f8d4981aaba82acfeaf972f555362a..406fb75bdbe9d17a6e4bf3d4cb92683e90f05910 100644
>> --- a/fs/dax.c
>> +++ b/fs/dax.c
>> @@ -1032,7 +1032,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
>> int ret = 0;
>> unsigned int scanned = 0;
>>
>> - if (WARN_ON_ONCE(inode->i_blkbits != PAGE_SHIFT))
>> + if (WARN_ON_ONCE(inode->i_blkbits < PAGE_SHIFT))
>> return -EIO;
>>
>> if (mapping_empty(mapping) || wbc->sync_mode != WB_SYNC_ALL)
>>
>> ---
>> base-commit: 81983758430957d9a5cb3333fe324fd70cf63e7e
>> change-id: 20241101-dax-page-size-83a1073b4e1b
>>
>> Cheers,
>> ~~ Lina
>>
~~ Lina
Powered by blists - more mailing lists