linux-kernel - Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary


Message-ID: <20200806080540.GA18865@gmail.com>
Date:   Thu, 6 Aug 2020 01:05:40 -0700
From:   Andrei Vagin <avagin@...il.com>
To:     Kirill Tkhai <ktkhai@...tuozzo.com>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        viro@...iv.linux.org.uk, adobriyan@...il.com, davem@...emloft.net,
        akpm@...ux-foundation.org, christian.brauner@...ntu.com,
        areber@...hat.com, serge@...lyn.com, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org,
        Pavel Tikhomirov <ptikhomirov@...tuozzo.com>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to
 expose namespaces lineary

On Mon, Aug 03, 2020 at 01:03:17PM +0300, Kirill Tkhai wrote:
> On 31.07.2020 01:13, Eric W. Biederman wrote:
> > Kirill Tkhai <ktkhai@...tuozzo.com> writes:
> > 
> >> On 30.07.2020 17:34, Eric W. Biederman wrote:
> >>> Kirill Tkhai <ktkhai@...tuozzo.com> writes:
> >>>
> >>>> Currently, there is no way to list or iterate over all, or a subset, of the
> >>>> namespaces in the system. Some namespaces are exposed in /proc/[pid]/ns/
> >>>> directories, but others may exist only as open files that are not attached
> >>>> to a process. When an open namespace fd is sent over a unix socket and then
> >>>> closed, it is impossible to know whether the namespace still exists.
> >>>>
> >>>> Also, even if a namespace is exposed as attached to a process or as an open
> >>>> file, iterating over /proc/*/ns/* or /proc/*/fd/* is not fast, because the
> >>>> cost grows with the number of tasks and fds.
> >>>
> >>> I am very dubious about this.
> >>>
> >>> I have been avoiding exactly this kind of interface because it can
> >>> create rather fundamental problems with checkpoint restart.
> >>
> >> restart/restore :)
> >>
> >>> You do have some filtering and the filtering is not based on current.
> >>> Which is good.
> >>>
> >>> A view that is relative to a user namespace might be ok. It almost
> >>> certainly does better as its own little filesystem than as an extension
> >>> to proc though.
> >>>
> >>> The big thing we want to ensure is that if you migrate you can restore
> >>> everything.  I don't see how you will be able to restore these files
> >>> after migration.  Anything like this without having a complete
> >>> checkpoint/restore story is a non-starter.
> >>
> >> There is no difference between files in /proc/namespaces/ directory and /proc/[pid]/ns/.
> >>
> >> CRIU can restore open files in /proc/[pid]/ns, the same will be with /proc/namespaces/ files.
> >> As a person who worked deeply for pid_ns and user_ns support in CRIU, I don't see any
> >> problem here.
> > 
> > An obvious difference is that you are adding the inode to the file
> > name.  Which means that now you really do have to preserve the
> > inode numbers during process migration.
> >
> > Which means now we have to do all of the work to make inode number
> > restoration possible.  Which means now we need to have multiple
> > instances of nsfs so that we can restore inode numbers.
> > 
> > I think this is still possible but we have been delaying figuring out
> > how to restore inode numbers long enough that there may be actual
> > technical problems making it happen.
> 
> Yeah, this matters. But it does not look like a dead end. We just need
> to change the names under which the namespaces are exported to a particular
> fs, and to support rename().
> 
> Before introducing a fundamentally new filesystem type for this, can't
> this be solved in the current /proc?

Do you mean introducing names for namespaces that users will be able
to change? By default, the name could be a UUID.

And I have a suggestion about the structure of /proc/namespaces/.

Each namespace is owned by a user namespace. Maybe it makes sense
to group namespaces by their owning user namespaces?

/proc/namespaces/
                 user
                 mnt-X
                 mnt-Y
                 pid-X
                 uts-Z
                 user-X/
                        user
                        mnt-A
                        mnt-B
                        user-C
                        user-C/
                               user
                 user-Y/
                        user
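
As a rough illustration of the grouping sketched above, the listing can be
derived from a plain mapping of each namespace to its owning user namespace.
This is only a userspace sketch, not kernel code: the labels, the `owner_of`
mapping, the root label "user-init", and both function names are hypothetical,
and each nested user namespace is shown only as a directory for simplicity.

```python
from collections import defaultdict


def group_by_owner(owner_of):
    """Map each user namespace label to the labels of the namespaces it owns."""
    children = defaultdict(list)
    for label, owner in owner_of.items():
        if owner is not None:  # the root user namespace has no owner
            children[owner].append(label)
    # Plain namespaces first, nested user namespaces last, as in the listing.
    return {
        owner: sorted(labels, key=lambda name: (name.startswith("user-"), name))
        for owner, labels in children.items()
    }


def render(owner_of, root, indent=0, children=None):
    """Return the lines of a directory-style listing rooted at `root`."""
    if children is None:
        children = group_by_owner(owner_of)
    pad = " " * indent
    lines = [pad + "user"]  # the user namespace this directory stands for
    for label in children.get(root, []):
        if label.startswith("user-"):
            # A nested user namespace becomes a subdirectory.
            lines.append(pad + label + "/")
            lines.extend(render(owner_of, label, indent + 4, children))
        else:
            lines.append(pad + label)
    return lines


if __name__ == "__main__":
    # Hypothetical ownership data mirroring the example hierarchy.
    owner_of = {
        "user-init": None,
        "mnt-X": "user-init", "mnt-Y": "user-init",
        "pid-X": "user-init", "uts-Z": "user-init",
        "user-X": "user-init", "user-Y": "user-init",
        "mnt-A": "user-X", "mnt-B": "user-X", "user-C": "user-X",
    }
    print("\n".join(render(owner_of, "user-init")))
```

On a live system the ownership relation itself could presumably be obtained
from a namespace fd with the NS_GET_USERNS ioctl; the sketch leaves that step
out and only shows the grouping.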

Are we trying to invent a cgroupfs for namespaces?

Thanks,
Andrei