Message-ID: <fed3341a-365f-4099-b58d-8687732d193f@linux.alibaba.com>
Date: Tue, 27 Jan 2026 04:13:31 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: Cong Wang <cwang@...tikernel.io>, Matthew Wilcox <willy@...radead.org>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Cong Wang <xiyou.wangcong@...il.com>, multikernel@...ts.linux.dev
Subject: Re: [ANNOUNCE] DAXFS: A zero-copy, dmabuf-friendly filesystem for
shared memory
On 2026/1/27 03:48, Cong Wang wrote:
> On Mon, Jan 26, 2026 at 11:16 AM Matthew Wilcox <willy@...radead.org> wrote:
>>
>> On Mon, Jan 26, 2026 at 09:38:23AM -0800, Cong Wang wrote:
>>> If you are interested in adding multikernel support to EROFS, here is
>>> the codebase you could start with:
>>> https://github.com/multikernel/linux. PR is always welcome.
>>
>> I think the onus is rather the other way around. Adding a new filesystem
>> to Linux has a high bar to clear because it becomes a maintenance burden
>> to the rest of us. Convince us that what you're doing here *can't*
>> be done better by modifying erofs.
>>
>> Before I saw the email from Gao Xiang, I was also going to suggest that
>> using erofs would be a better idea than supporting your own filesystem.
>> Writing a new filesystem is a lot of fun. Supporting a new filesystem
>> and making it production-quality is a whole lot of pain. It's much
>> better if you can leverage other people's work. That's why DAX is a
>> support layer for filesystems rather than its own filesystem.
>
> Great question.
>
> The core reason is that multikernel assumes little to no compatibility.
>
> Specifically for this scenario, struct inode is not compatible. This
> could rule out a lot of existing filesystems, except read-only ones.
I don't quite get the point here, assuming you know filesystems.
>
> Now back to EROFS: it is still based on a block device, which
> itself can't be shared among different kernels. ramdax is actually
> a perfect example here: its label_area can't be shared among
> different kernels.
>
> Let's take one step back: even if we really could share a device
> among multiple kernels, the memory footprint still could not be
> shared. With DAX + EROFS, we would still get:
> 1) Each kernel creates its own DAX mappings
> 2) And faults pages independently
>
> There is no cross-kernel page sharing accounting.
>
> I hope this makes sense.
No, the EROFS on-disk format is designed for any backend, so you
could use this format backed by:
1) a raw block device
2) a file
3) a pure ramdaxfs (still WIP)
Why? Because an ordinary container image user doesn't assume a
filesystem tied to a particular type of device, especially for
golden-image usage.
You cannot say: oh, I built an image, but you may have to use it
only for ramdax; oh, it's backed by a file on a block device, so
you have to convert it to another format before use. The EROFS
on-disk format should work with _all device backends_.
At a quick glance at your code, it seems quite premature and
inefficient, because subdirectories are just a link chain; maybe
that is only somewhat reasonable for ramdax usage, but it's still
_not_ cache-friendly.
The reason it doesn't work for you is that _multikernel_ isn't an
official upstream requirement; all upstream virtualization users
directly use virtio-pmem now.
I think for the upstream kernels, you'd want to make multikernel
an official upstream requirement first; then there will be drivers
for you to do multikernel ramdax, rather than raw use of
1) memremap
2) vmf_insert_mixed
in the filesystem drivers. I do think those are a _red line_ for
any new filesystem drivers (as opposed to the legacy cramfs MTD
XIP code).
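(Editor's sketch, not part of the original mail: the cross-kernel
page-sharing gap quoted earlier can be contrasted with what a single
kernel already provides. Two MAP_SHARED mappings of the same file are
backed by the same physical pages, so there is one memory footprint no
matter how many mappings exist; separate kernels faulting DAX pages
independently have no such common layer. A minimal userspace Python
illustration, assuming Linux mmap semantics; the temp file is purely
illustrative:)

```python
# Sketch: within ONE kernel, two MAP_SHARED mappings of the same file
# alias the same pages, so a store through one view is immediately
# visible through the other -- one footprint, shared accounting. That
# common layer is what independent kernels lack.
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.truncate(mmap.PAGESIZE)  # one page of file-backed memory

    view_a = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)
    view_b = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED)

    view_a[:5] = b"hello"          # store through the first mapping
    print(view_b[:5].decode())     # visible through the second mapping

    view_a.close()
    view_b.close()
```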
Anyway, I really think your current use cases have already been
covered by EROFS for many years.
Thanks,
Gao Xiang
>
> Regards,
> Cong