Message-ID: <CAGHCLaSA9SnM+rtURV=U=hJ4kxpqUim6t7SgvxMNnAed0XaHMg@mail.gmail.com>
Date: Mon, 26 Jan 2026 16:02:48 -0800
From: Cong Wang <cwang@...tikernel.io>
To: Matthew Wilcox <willy@...radead.org>
Cc: Gao Xiang <hsiangkao@...ux.alibaba.com>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Cong Wang <xiyou.wangcong@...il.com>,
multikernel@...ts.linux.dev
Subject: Re: [ANNOUNCE] DAXFS: A zero-copy, dmabuf-friendly filesystem for
shared memory
On Mon, Jan 26, 2026 at 12:40 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Mon, Jan 26, 2026 at 11:48:20AM -0800, Cong Wang wrote:
> > Specifically for this scenario, struct inode is not compatible. This
> > could rule out a lot of existing filesystems, except read-only ones.
>
> I don't think you understand that there's a difference between *on disk*
> inode and *in core* inode. Compare and contrast struct ext2_inode and
> struct inode.
>
> > Now back to EROFS, it is still based on a block device, which
> > itself can't be shared among different kernels. ramdax is actually
> > a perfect example here, its label_area can't be shared among
> > different kernels.
> >
> > Let's take one step back: even if we really could share a device
> > with multiple kernels, it still could not share the memory footprint,
> > with DAX + EROFS, we would still get:
> > 1) Each kernel creates its own DAX mappings
> > 2) And faults pages independently
> >
> > There is no cross-kernel page sharing accounting.
> >
> > I hope this makes sense.
>
> No, it doesn't. I'm not suggesting that you use erofs unchanged, I'm
> suggesting that you modify erofs to support your needs.
I just tried:
https://github.com/multikernel/linux/commit/a6dc3351e78fc2028e4ca0ea02e781ca0bfefea3
Unfortunately, the multi-kernel derivation is still there and is probably
hard to eliminate without re-architecting EROFS. Here is why:
DAXFS Inode (lines 202-216):
struct daxfs_base_inode {
        __le32 ino;
        __le32 mode;
        ...
        __le64 size;
        __le64 data_offset;  /* ← INTRINSIC: stored directly in inode */
        ...
};
DAXFS Read Path:
// Pseudocode - what DAXFS does
void *data = base + inode->data_offset + file_offset;
copy_to_iter(data, len, to);
// DONE. No metadata parsing, no derivation.
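To make the direct path concrete, here is a minimal userspace sketch. It is
not the actual DAXFS code: struct demo_inode and demo_direct_read() are
hypothetical names, the fields are host-endian rather than __le64, and
memcpy() stands in for copy_to_iter(). The only point is that the source
address comes from one addition on a field stored in the inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified inode: only the fields this sketch needs. */
struct demo_inode {
	uint64_t size;        /* file size in bytes */
	uint64_t data_offset; /* offset of the file data from the image base */
};

/*
 * Copy up to len bytes starting at file offset pos into dst.
 * Returns the number of bytes copied (0 at or past end of file).
 */
static size_t demo_direct_read(const uint8_t *base,
			       const struct demo_inode *ino,
			       uint64_t pos, void *dst, size_t len)
{
	assert(base != NULL && ino != NULL && dst != NULL);

	if (pos >= ino->size)
		return 0;
	if (len > ino->size - pos)
		len = (size_t)(ino->size - pos); /* clamp to end of file */

	/* No lookup step: the address is base + data_offset + pos. */
	memcpy(dst, base + ino->data_offset + pos, len);
	return len;
}
```

Any reader that maps the same image can compute the same address from the
inode alone, without building per-kernel mapping state first.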
EROFS Read Path:
// What EROFS does (even in memory mode)
struct erofs_map_blocks map = { .m_la = pos };
erofs_map_blocks(inode, &map); // ← DERIVES physical address
// Inside erofs_map_blocks():
// - Check inode layout type (compact? extended? chunk-indexed?)
// - For chunk-indexed: walk chunk table
// - For plain: compute from inode
// - Handle inline data, holes, compression...
src = base + map.m_pa;
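A rough userspace sketch of what I mean by "derives" is below. It is not
EROFS code and does not reflect the real EROFS on-disk format: the struct,
the two layouts, the 4096-byte chunk size, and demo_map() are all
hypothetical. It only models the two cases listed above (plain and
chunk-indexed) and leaves out inline data and compression entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CHUNK_SIZE 4096u      /* assumed chunk size for the sketch */
#define DEMO_HOLE       UINT64_MAX /* chunk table entry meaning "no data" */

enum demo_layout { DEMO_PLAIN, DEMO_CHUNKED };

/* Hypothetical inode whose data location depends on its layout type. */
struct demo_mapped_inode {
	enum demo_layout layout;
	uint64_t size;          /* file size in bytes */
	uint64_t start;         /* DEMO_PLAIN: physical offset of byte 0 */
	const uint64_t *chunks; /* DEMO_CHUNKED: physical offset per chunk */
	size_t nr_chunks;
};

/*
 * Derive the physical offset for logical offset pos.
 * Returns 0 and sets *pa on success, -1 for out-of-range or a hole.
 */
static int demo_map(const struct demo_mapped_inode *ino, uint64_t pos,
		    uint64_t *pa)
{
	assert(ino != NULL && pa != NULL);

	if (pos >= ino->size)
		return -1;

	switch (ino->layout) {
	case DEMO_PLAIN:
		/* Plain: computed from a field in the inode. */
		*pa = ino->start + pos;
		return 0;
	case DEMO_CHUNKED: {
		/* Chunked: index the chunk table, then add the remainder. */
		uint64_t idx = pos / DEMO_CHUNK_SIZE;

		if (idx >= ino->nr_chunks || ino->chunks[idx] == DEMO_HOLE)
			return -1;
		*pa = ino->chunks[idx] + pos % DEMO_CHUNK_SIZE;
		return 0;
	}
	}
	return -1;
}
```

Even in this toy form, every read goes through a layout check and possibly
a table lookup before the source address is known.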
Please let me know if I missed anything here.
Speculative branching support is also harder for EROFS;
please see my updated README here:
https://github.com/multikernel/daxfs/blob/main/README.md
(Skip to the Branching section.)
Thanks.
Cong Wang