Message-ID: <20140902214035.GT4462@madcap2.tricolour.ca>
Date: Tue, 2 Sep 2014 17:40:35 -0400
From: Richard Guy Briggs <rgb@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: linux-api@...r.kernel.org, containers@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org,
Andy Lutomirski <luto@...capital.net>, linux-audit@...hat.com,
serge@...lyn.com
Subject: Re: [PATCH V4 1/8] namespaces: assign each namespace instance a
serial number

On 14/08/28, Eric W. Biederman wrote:
> Richard Guy Briggs <rgb@...hat.com> writes:
> > On 14/08/23, Eric W. Biederman wrote:
> >> Richard Guy Briggs <rgb@...hat.com> writes:
> >>
> >> > Generate and assign a serial number per namespace instance since boot.
> >> >
> >> > Use a serial number per namespace (unique across one boot of one kernel)
> >> > instead of the inode number (which is claimed to have had the right to change
> >> > reserved and is not necessarily unique if there is more than one proc fs) to
> >> > uniquely identify it per kernel boot.
> >>
> >> This approach is just broken.
> >>
> >> For this to work with migration (aka criu) you need to implement a
> >> namespace of namespaces. You haven't done this, and therefore
> >> such an interface will break existing userspace.
> >>
> >> Inside of audit I can understand not caring about these issues,
> >> but you go forward and expose these serial numbers in proc,
> >> and generally make this infrastructure available to others.
> >>
> >> The deep issue with migration is that we move tasks from one machine
> >> to another and on the destination machine we need to have all of the
> >> same global identifiers for software to function properly.
> >>
> >> My weasel words around the proc inode numbers are there to preserve
> >> room for us to restore those ids if it ever becomes relevant for
> >> migration.
> >
> > What do you do if the inode number is already in use on the target
> > host?
>
> Since the inode numbers are relative to a superblock or a pid namespace
> the numbers that are in use can be restored on the target system
> by creating them in the appropriate namespace.

So you seem to be advocating for a namespace of namespaces, since
neither host can create a new namespace without consulting the others in
its pool for a new free number.

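To make the collision concern concrete, here is a minimal userspace sketch,
not kernel code and not part of the patch. The `IdSpace` class and its
`allocate`/`restore` methods are hypothetical names invented for this
illustration. It contrasts one identifier space per host, where a migrated
number may already be taken, with identifiers that are relative to a
context, where the number can be restored into a fresh context on the
target.

```python
class IdSpace:
    """Hypothetical allocator of namespace identifiers within one context."""

    def __init__(self):
        self._in_use = set()
        self._next = 1

    def allocate(self):
        """Hand out the next identifier that is not currently in use."""
        while self._next in self._in_use:
            self._next += 1
        ident = self._next
        self._in_use.add(ident)
        self._next += 1
        return ident

    def restore(self, ident):
        """Claim a specific identifier, as a restore after migration would."""
        if ident in self._in_use:
            raise ValueError(f"identifier {ident} already in use")
        self._in_use.add(ident)
        return ident


# One identifier space per host: a migrated identifier can collide.
source_host = IdSpace()
target_host = IdSpace()
migrated = source_host.allocate()   # identifier assigned on the source host
target_host.allocate()              # target independently used the same number
try:
    target_host.restore(migrated)
    global_collision = False
except ValueError:
    global_collision = True

# Identifiers relative to a context: restore into a fresh context instead.
fresh_context = IdSpace()
restored = fresh_context.restore(migrated)
```

In this sketch the per-host space reports a collision, while the fresh
context accepts the migrated number unchanged. Avoiding the collision
without such a context would require coordinating allocation across hosts.
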
> The support does not exist in the kernel today for doing that because no
> one has cared but as architected the support can be added if needed to
> support migration.
>
> >> That is the proc inode numbers (technically) live in a pid namespace,
> >> (aka a mount of proc). So depending on the pid namespace you are in
> >> or the mount of proc you look in the numbers could change.
> >>
> >> Qualifications like that must exist to have a prayer of ever supporting
> >> process migration in the crazy corner cases where people start caring
> >> about inode numbers.
> >>
> >> We currently don't and inode numbers for a namespace will never change
> >> after a namespace is created. So I think you really are ok using the
> >> proc inode numbers. I am happy declaring by fiat that the inode numbers
> >> that audit uses are the numbers connected to the initial pid namespace.
> >
> > But once a namespace/container is migrated, it is a different audit that
> > is looking at it (unless we create an audit manager or entity that
> > functions at the level of a container manager), so audit should not care.
>
> These numbers were exported to everyone as a general purpose facility in
> proc. If audit is global and audit doesn't migrate you are right it
> doesn't matter. However if these numbers are used by anyone else for
> anything else it causes a problem.

So let us restrict their use to audit, by removing them from
/proc/<pid>/ns/ and only exposing them via netlink calls to audit gated
by CAP_AUDIT_WRITE or CAP_AUDIT_CONTROL.

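For reference, this is roughly how any process can read a namespace's inode
number from /proc/<pid>/ns/ today. It is a minimal sketch, and the helper
name `namespace_id` is made up for this example. The device number is
returned alongside the inode number because an inode number is only
meaningful relative to its superblock. The helper returns None where the
path is unavailable.

```python
import os


def namespace_id(pid="self", ns="pid"):
    """Return (st_dev, st_ino) for /proc/<pid>/ns/<ns>, or None if unavailable."""
    path = f"/proc/{pid}/ns/{ns}"
    try:
        # os.stat() follows the symlink to the namespace object itself.
        st = os.stat(path)
    except OSError:
        return None
    return (st.st_dev, st.st_ino)


ident = namespace_id()
```

Two tasks in the same pid namespace, looking through the same mount of
proc, would be expected to see the same pair.
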
> Further given that people run entire distributions in containers we may
> reach the point where we wish to run auditd in a container in the
> future. I would hate to paint ourselves into a corner with a design
> that could never allow audit to migrate. Supporting that case someday
> seems a valid naive desire.

Agreed. That is an option we do not want to rule out at this point.
I'll need to think about this one more.

> >> At a fairly basic level anything that is used to identify namespaces for
> >> any general purpose use needs to have most if not all of the same
> >> properties of the proc inode numbers. The most important of which is
> >> being tied to some context/namespace so there is an ability if we ever
> >> need it to migrate those numbers from one machine to another.
> >
> > Sooo... does it make any sense to have those inode or serial numbers be
> > blank inside the namespace/container itself, but only visible to its
> > manager outside the container (unless it is the initial namespace)?
>
> Mostly I think it makes sense to use the inode numbers from the initial
> pid namespace. They already exist. They already are unique. (Which
> means I don't need to maintain more code and more special cases). And
> they do what you need now.

Will inode numbers never be re-used once they are freed? Guaranteed?

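The reason re-use matters for audit can be shown with a small sketch. The
`RecyclingAllocator` below is a hypothetical stand-in, assumed only for
illustration, for any allocator that hands freed numbers back out. Whether
proc inode numbers behave this way is exactly the question above. It is
compared with a monotonic serial counter that never repeats within one run.

```python
import heapq
import itertools


class RecyclingAllocator:
    """Hypothetical allocator that hands back the lowest freed number first."""

    def __init__(self):
        self._free = []   # min-heap of numbers that were released
        self._next = 1    # next never-used number

    def allocate(self):
        if self._free:
            return heapq.heappop(self._free)
        number = self._next
        self._next += 1
        return number

    def free(self, number):
        heapq.heappush(self._free, number)


recycler = RecyclingAllocator()
first = recycler.allocate()
recycler.free(first)            # the first namespace goes away
second = recycler.allocate()    # a new namespace receives the same number

serial = itertools.count(1)     # monotonic: a number is never handed out twice
serial_first = next(serial)
serial_second = next(serial)
```

With the recycling allocator, two distinct namespace lifetimes end up with
the same identifier in the log. With the serial counter they do not.
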
> I probably haven't followed closely enough but I don't see what makes
> inode numbers undesirable.

This posting:
https://www.redhat.com/archives/linux-audit/2013-March/msg00032.html

> Eric
- RGB
--
Richard Guy Briggs <rbriggs@...hat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545