Message-ID: <17c39e1d-2376-c90f-5e87-ed1982a7cff9@linux.alibaba.com>
Date:   Wed, 24 May 2023 08:56:12 +0800
From:   Gao Xiang <hsiangkao@...ux.alibaba.com>
To:     Mike Snitzer <snitzer@...nel.org>, Du Rui <durui@...ux.alibaba.com>
Cc:     Giuseppe Scrivano <gscrivan@...hat.com>, dm-devel@...hat.com,
        linux-kernel@...r.kernel.org, Alasdair Kergon <agk@...hat.com>,
        Alexander Larsson <alexl@...hat.com>,
        Joseph Qi <joseph.qi@...ux.alibaba.com>
Subject: Re: [dm-devel] dm overlaybd: targets mapping OverlayBD image

Hi Mike,

On 2023/5/24 10:28, Mike Snitzer wrote:
> On Fri, May 19 2023 at  6:27P -0400,
> Du Rui <durui@...ux.alibaba.com> wrote:
> 
>> OverlayBD is a novel layering block-level image format, which is designed
>> for containers and secure containers and is applicable to virtual machines,
>> published in USENIX ATC '20
>> https://www.usenix.org/system/files/atc20-li-huiba.pdf
>>
>> OverlayBD already has a ContainerD non-core sub-project implementation
>> in userspace, as an accelerated container image service
>> https://github.com/containerd/accelerated-container-image
>>
>> It could be much more efficient to do the decompression and mapping work
>> in the kernel with the device-mapper framework in many circumstances,
>> such as secure container runtimes, mobile devices, etc.
>>
>> This patch contains a module, dm-overlaybd, which provides two kinds of
>> targets, dm-zfile and dm-lsmt, to expose a group of block devices
>> containing an OverlayBD image as an overlaid read-only block device.
>>
>> Signed-off-by: Du Rui <durui@...ux.alibaba.com>
> 
> <snip, original patch here: [1] >
> 
> I appreciate that this work is being done with an eye toward
> containerd "community" and standardization but based on my limited
> research it appears that this format of OCI image storage/use is only
> used by Alibaba? (but I could be wrong...)

Not necessarily only Alibaba. The OverlayBD solution is actually
open-sourced under containerd (at least I think it's an open-source
project), and I've seen some Microsoft Azure guys working on it as well.

> 
> But you'd do well to explain why the userspace solution isn't
> acceptable. Are there security issues that moving the implementation
> to kernel addresses?

The OverlayBD user-space solution was actually the original Alibaba
solution and is widely used inside Alibaba; Nydus might be the other one
(used, but to a limited extent; Ant Group and ByteDance use Nydus more
widely). Since Alibaba Group is a big company, it's pretty normal to have
two similar competing solutions side by side.

After I joined Alibaba, I personally tried to persuade the OverlayBD guys
to switch from their stacked storage solution to a simple fs solution,
because:

  - It allows a simple on-disk format rather than a long storage stack
    with a random fs on top; such a stack increases the overall attack
    surface, which I think was already discussed at this year's LSF/MM;

  - Different random fses cannot share page cache across images. IOW,
    many in-kernel fses actually don't suit container image use
    cases;

Also, regarding this patch:
  - Setting aside the details of the on-disk design, this attempt is just
    a read-only solution without 1) on-demand loading; 2) write support;

  - It is very similar to existing approaches:
    dm-qcow2  https://lore.kernel.org/r/164846619932.251310.3668540533992131988.stgit@pro/
    dm-vdo    https://lore.kernel.org/r/20230523214539.226387-1-corwin@redhat.com/

I also persuaded the Nydus guys to move from their own format to the EROFS
format, but I failed to persuade the OverlayBD guys.

> 
> I also have doubts that this solution is _actually_ more performant
> than a proper filesystem based solution that allows page cache sharing
> of container image data across multiple containers.

Agreed.

> 
> There is an active discussion about, and active development effort
> for, using overlayfs + erofs for container images.  I'm reluctant to
> merge this DM based container image approach without wider consensus
> from other container stakeholders.

I'm too tired of these different container image solutions.  I will
go on improving EROFS, and hopefully it will eventually be useful to
everyone.

Thanks,
Gao Xiang

> 
> But short of reaching wider consensus on the need for these DM
> targets: there is nothing preventing you from carrying these changes
> in your alibaba kernel.
> 
> Mike
> 
> [1]: https://patchwork.kernel.org/project/dm-devel/patch/9505927dabc3b6695d62dfe1be371b12f5bdebf7.1684491648.git.durui@linux.alibaba.com/
> 
