Message-ID: <d41be32e-4d17-40b2-9dc1-950cdfe32556@asahilina.net>
Date: Fri, 22 Nov 2024 20:31:06 +0900
From: Asahi Lina <lina@...hilina.net>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: Dan Williams <dan.j.williams@...el.com>, Jan Kara <jack@...e.cz>,
Alexander Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>, Matthew Wilcox
<willy@...radead.org>, Sergio Lopez Pascual <slp@...hat.com>,
asahi@...ts.linux.dev, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Vivek Goyal <vgoyal@...hat.com>,
linux-mips@...r.kernel.org
Subject: Re: [PATCH] fuse: dax: No-op writepages callback

On 11/14/24 12:17 AM, Asahi Lina wrote:
>
>
> On 11/13/24 7:48 PM, Miklos Szeredi wrote:
>> On Tue, 12 Nov 2024 at 20:55, Asahi Lina <lina@...hilina.net> wrote:
>>>
>>> When using FUSE DAX with virtiofs, cache coherency is managed by the
>>> host. Disk persistence is handled via fsync() and friends, which are
>>> passed directly via the FUSE layer to the host. Therefore, there's no
>>> need to do dax_writeback_mapping_range(). All that ends up doing is a
>>> cache flush operation, which is not caught by KVM and doesn't do much,
>>> since the host and guest are already cache-coherent.
>>
>> The conclusion seems convincing. But adding Vivek, who originally
>> added this in commit 9483e7d5809a ("virtiofs: define dax address space
>> operations").
>>
>> What I'm not clearly seeing is how virtually aliased CPU caches
>> interact with this. In mm/filemap.c I see the flush_dcache_folio()
>> calls which deal with the kernel mapping of a page being in a
>> different cacheline as the user mapping. How does that work in the
>> virt environment?
>>
>
> Oof, I forgot those architectures existed...
>
> The only architecture that has both a KVM implementation and selects
> ARCH_HAS_CPU_CACHE_ALIASING is mips. Is it possible that no MIPS
> implementations with virtualization also have cache aliasing, and we can
> just not care about this?

I think this either isn't a problem, or it's already broken anyway. The
way Linux deals with cache aliasing for mmap is page coloring, which
forces mmap virtual addresses to keep a fixed color relationship so that
userspace mappings of the same page don't alias. Since virtiofs uses
aligned 2MiB blocks (larger than any L1 dcache size), *as long as* the
SHM window is suitably aligned by the host VMM it should map without
aliasing in guest-physical space (if it isn't aligned, the mmap will
fail in the host anyway). Making sure the alignment is sufficient would
be the responsibility of the host VMM (qemu/libkrun/whatever). That
ensures coherency between host userspace and guest kernel mappings.
(There is no coherency with host kernel mappings, since the direct map
addresses won't be colored properly, but that is what the
flush_dcache_folio() stuff in the host kernel takes care of.)

As long as the cache info is passed to the guest properly, the guest
should in turn do the right alignment for mmap. That makes userspace on
the guest and userspace on the host coherent.
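
To make the coloring argument concrete, here is a rough sketch
(illustrative Python, not kernel code). The 4 KiB page size and 32 KiB
cache way size are assumed values, and the helper names are made up for
this example. It models a virtually indexed cache where the color is the
set of index bits above the page offset, and checks whether two mappings
of the same 2MiB block can alias.

```python
# Sketch of virtual cache coloring. All sizes are assumptions chosen
# for illustration; real values depend on the CPU.
PAGE_SIZE = 4 * 1024          # assumed page size
WAY_SIZE = 32 * 1024          # assumed size of one way of the L1 dcache
BLOCK_SIZE = 2 * 1024 * 1024  # virtiofs DAX block size (2MiB)

# Number of distinct page colors in this model.
NUM_COLORS = WAY_SIZE // PAGE_SIZE


def cache_color(addr: int) -> int:
    """Return the color of an address: the cache index bits above the page offset."""
    return (addr // PAGE_SIZE) % NUM_COLORS


def may_alias(addr_a: int, addr_b: int) -> bool:
    """Two virtual mappings of the same physical page can alias if their colors differ."""
    return cache_color(addr_a) != cache_color(addr_b)


def block_mappings_alias(base_a: int, base_b: int) -> bool:
    """Check every page of one 2MiB block mapped at two different virtual bases."""
    return any(
        may_alias(base_a + off, base_b + off)
        for off in range(0, BLOCK_SIZE, PAGE_SIZE)
    )
```

With these assumed sizes, two 2MiB-aligned bases give every page the
same color in both mappings, while shifting one base by a single page
breaks that.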

Put another way: if this doesn't work without flushing, it's already
broken. The way Linux deals with dcache aliasing assumes that all
userspace mappings are coherent with each other, and that the kernel
only needs to deal with coherency between its own direct-map view and
userspace mappings.

With a DAX mapping, arbitrary processes *outside* the guest can have
maps of the page and mutate them under the guest kernel, so if that
isn't coherent, it's already broken. There's no possible codepath for
the guest kernel to request flushing the dcache for userspace processes
on the host. Indeed, since it's supposed to be coherent, and userspace
reads/writes on host and guest (or other guests) cannot be made to
perform cache maintenance, no cache-flushing solution can work at all.

CCing linux-mips in case they know more.

~~ Lina