netdev - Re: [PATCH 2/8] loopfs: implement loopfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CADyDSO54FV7OaVwWremmnNbTkvw6hQ-KTLJdEg3V5rfBi8n3Yw@mail.gmail.com>
Date:   Sun, 12 Apr 2020 12:38:54 +0200
From:   David Rheinsberg <david.rheinsberg@...il.com>
To:     Christian Brauner <christian.brauner@...ntu.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        lkml <linux-kernel@...r.kernel.org>, linux-block@...r.kernel.org,
        linux-api@...r.kernel.org, Jonathan Corbet <corbet@....net>,
        Serge Hallyn <serge@...lyn.com>,
        "Rafael J. Wysocki" <rafael@...nel.org>, Tejun Heo <tj@...nel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Saravana Kannan <saravanak@...gle.com>,
        Jan Kara <jack@...e.cz>, David Howells <dhowells@...hat.com>,
        Seth Forshee <seth.forshee@...onical.com>,
        Tom Gundersen <teg@...m.no>,
        Christian Kellner <ckellner@...hat.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Stéphane Graber <stgraber@...ntu.com>,
        linux-doc@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH 2/8] loopfs: implement loopfs

Hey

On Thu, Apr 9, 2020 at 10:27 AM Christian Brauner
<christian.brauner@...ntu.com> wrote:
> On Thu, Apr 09, 2020 at 07:39:18AM +0200, David Rheinsberg wrote:
> > With loopfs in place, any process can create its own user_ns, mount
> > their private loopfs and create as many loop-devices as they want.
> > Hence, this limit does not serve as an effective global
> > resource-control. Secondly, anyone with access to `loop-control` can
> > now create loop instances until this limit is hit, thus causing anyone
> > else to be unable to create more. This effectively prevents you from
> > sharing a loopfs between non-trusting parties. I am unsure where that
> > limit would actually be used?
>
> Restricting it globally indeed wasn't the intended use-case for it. This
> was more so that you can specify an instance limit, bind-mount that
> instance into several places and sufficiently locked down users cannot
> exceed the instance limit.

But then these users can each exhaust the limit individually. As such,
you cannot share this instance across users that have no
trust-relationship. Fine with me, but I still don't understand in
which scenario the limit would be useful. Anyone can create a user-ns,
create a new loopfs mount, and just happily create more loop-devices.
So what is so special that you want to restrict the devices on a
_single_ mount instance?

> I don't think we'd be getting much out of a global limit per se I think
> the initial namespace being able to reserve a bunch of devices
> they can always rely on being able create when they need them is more
> interesting. This is similat to what devpts implements with the
> "reserved" mount option and what I initially proposed for binderfs. For
> the latter it was deemed unnecessary by others so I dropped it from
> loopfs too.

The `reserve` of devpts has a fixed 2-tier system: A global limit, and
a init-ns reserve. This does nothing to protect one container from
another.

Furthermore, how do you intend to limit user-space from creating an
unbound amount of loop devices? Unless I am mistaken, with your
proposal *any* process can create a new loopfs with a basically
unlimited amount of loop-devices, thus easily triggering unbound
kernel allocations. I think this needs to be accounted. The classic
way is to put a per-uid limit into `struct user_struct` (done by
pipes, mlock, epoll, mq, etc.). An alternative is `struct ucount`,
which allows hierarchical management (inotify uses that, as an
example).

> I also expect most users to pre-create devices in the initial namespace
> instance they need (e.g. similar to what binderfs does or what loop
> devices currently have). Does that make sense to you?

Our use-case is to get programmatic access to loop-devices, so we can
build customer images on request (especially to create XFS images,
since mkfs.xfs cannot write them, IIRC). We would be perfectly happy
with a kernel-interface that takes a file-descriptor to a regular file
and returns us a file-descriptor to a newly created block device
(which is automatically destroyed when the last file-descriptor to it
is closed). This would be ideal *to us*, since it would do automatic
cleanup on crashes.

We don't need any representation of the loop-device in the
file-system, as long as we can somehow mount it (either by passing the
bdev-FD to the new mount-api, or by using /proc/self/fd/ as
mount-source).

With your proposed loop-fs we could achieve something close to it:
Mount a private loopfs, create a loop-device, and rely on automatic
cleanup when the mount-namespace is destroyed.

Thanks
David