Message-ID: <CAGHCLaQbr2Q1KwEJhsZGuaFV=m6WEkxsgurg30+pjSQ4dHQ_1Q@mail.gmail.com>
Date: Mon, 26 Jan 2026 09:38:23 -0800
From: Cong Wang <cwang@...tikernel.io>
To: Gao Xiang <hsiangkao@...ux.alibaba.com>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Cong Wang <xiyou.wangcong@...il.com>, multikernel@...ts.linux.dev
Subject: Re: [ANNOUNCE] DAXFS: A zero-copy, dmabuf-friendly filesystem for
shared memory

Hi Xiang,

On Sun, Jan 25, 2026 at 8:04 PM Gao Xiang <hsiangkao@...ux.alibaba.com> wrote:
>
> Hi Cong,
>
> On 2026/1/25 01:10, Cong Wang wrote:
> > Hello,
> >
> > I would like to introduce DAXFS, a simple read-only filesystem
> > designed to operate directly on shared physical memory via DAX
> > (Direct Access).
> >
> > Unlike ramfs or tmpfs, which operate within the kernel’s page cache
> > and result in fragmented, per-instance memory allocation, DAXFS
> > provides a mechanism for zero-copy reads from contiguous memory
> > regions. It bypasses the traditional block I/O stack, buffer heads,
> > and page cache entirely.
> >
> > Key Features
> > - Zero-Copy Efficiency: File reads resolve to direct memory loads,
> > eliminating page cache duplication and CPU-driven copies.
> > - True Physical Sharing: By mapping a contiguous physical address or a
> > dma-buf, multiple kernel instances or containers can share the same
> > physical pages.
> > - Hardware Integration: Supports mounting memory exported by GPUs,
> > FPGAs, or CXL devices via the dma-buf API.
> > - Simplicity: Uses a self-contained, read-only image format with no
> > runtime allocation or complex device management.
> >
> > Primary Use Cases
> > - Multikernel Environments: Sharing a common Docker image across
> > independent kernel instances via shared memory.
> > - CXL Memory Pooling: Accessing read-only data across multiple hosts
> > without network I/O.
> > - Container Rootfs Sharing: Using a single DAXFS base image for
> > multiple containers (via OverlayFS) to save physical RAM.
> > - Accelerator Data: Zero-copy access to model weights or lookup tables
> > stored in device memory.
>
> Actually, EROFS DAX is already used in this way by various users,
> covering all of the use cases above.
>
> Could you explain why EROFS doesn't suit your use cases?

EROFS does not support direct physical memory operations. As you
mentioned, it relies on other layers, such as ramdax, to function in
these scenarios.
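
To make "direct physical memory operations" concrete, here is a rough
sketch of the kind of mapping DAXFS does at mount time (the struct and
function names below are only illustrative, not the actual DAXFS code):

#include <linux/io.h>
#include <linux/types.h>

/* Illustrative only; these names are not from the actual DAXFS code. */
struct daxfs_sb_info {
        phys_addr_t phys_base;  /* physical base passed at mount time */
        size_t image_size;      /* size of the read-only image */
        void *image;            /* kernel mapping of the shared image */
};

static int daxfs_map_image(struct daxfs_sb_info *sbi)
{
        /*
         * Map the shared physical region once; file reads then resolve
         * to plain loads from sbi->image + offset, with no block I/O,
         * no buffer heads, and no page cache copy.
         */
        sbi->image = memremap(sbi->phys_base, sbi->image_size, MEMREMAP_WB);
        if (!sbi->image)
                return -ENOMEM;
        return 0;
}

Nothing in that path needs a block device or an nvdimm/ramdax layer
underneath; a physical range (or a dma-buf) is enough to back the mount.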

I have looked into ramdax, and it does not seem suitable for the
multikernel use case. Specifically, the trailing 128K label area is
shared across multiple kernels, which would cause significant issues.
For reference:
 87         dimm->label_area = memremap(start + size - LABEL_AREA_SIZE,
 88                                     LABEL_AREA_SIZE, MEMREMAP_WB);
...
154 static int ramdax_set_config_data(struct nvdimm *nvdimm, int buf_len,
155                                   struct nd_cmd_set_config_hdr *cmd)
156 {
157         struct ramdax_dimm *dimm = nvdimm_provider_data(nvdimm);
158
159         if (sizeof(*cmd) > buf_len)
160                 return -EINVAL;
161         if (struct_size(cmd, in_buf, cmd->in_length) > buf_len)
162                 return -EINVAL;
163         if (size_add(cmd->in_offset, cmd->in_length) > LABEL_AREA_SIZE)
164                 return -EINVAL;
165
166         memcpy(dimm->label_area + cmd->in_offset, cmd->in_buf, cmd->in_length);
167
168         return 0;
169 }
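
Just to spell the conflict out: every kernel instance handed the same
physical window derives the identical trailing 128K mapping, so a label
update from one kernel silently overwrites what another kernel reads
back. A minimal sketch of the arithmetic (mine, not code from either
tree):

#include <linux/io.h>
#include <linux/sizes.h>
#include <linux/types.h>

#define LABEL_AREA_SIZE SZ_128K

/*
 * Illustration only: with kernels A and B both given [start, start + size),
 * both mappings below alias the same physical 128K, so
 * ramdax_set_config_data() in one kernel clobbers the other's labels.
 */
static void *map_shared_label_area(phys_addr_t start, resource_size_t size)
{
        return memremap(start + size - LABEL_AREA_SIZE,
                        LABEL_AREA_SIZE, MEMREMAP_WB);
}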

Not to mention other cases like GPUs, SmartNICs, etc.

If you are interested in adding multikernel support to EROFS, here is
the codebase you could start with:
https://github.com/multikernel/linux. PRs are always welcome.

Thanks,
Cong Wang