Message-ID: <87tx4wmlcj.fsf@x220.int.ebiederm.org>
Date: Thu, 28 Aug 2014 15:05:00 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Richard Guy Briggs <rgb@...hat.com>
Cc: linux-api@...r.kernel.org, containers@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org,
Andy Lutomirski <luto@...capital.net>, linux-audit@...hat.com,
serge@...lyn.com
Subject: Re: [PATCH V4 1/8] namespaces: assign each namespace instance a serial number
Richard Guy Briggs <rgb@...hat.com> writes:
> On 14/08/23, Eric W. Biederman wrote:
>> Richard Guy Briggs <rgb@...hat.com> writes:
>>
>> > Generate and assign a serial number per namespace instance since boot.
>> >
>> > Use a serial number per namespace (unique across one boot of one kernel)
>> > instead of the inode number (which the kernel has reserved the right to
>> > change, and which is not necessarily unique if there is more than one
>> > proc fs) to uniquely identify it per kernel boot.
>>
>> This approach is just broken.
>>
>> For this to work with migration (aka criu) you need to implement a
>> namespace of namespaces. You haven't done this, and therefore
>> such an interface will break existing userspace.
>>
>> Inside of audit I can understand not caring about these issues,
>> but you go forward and expose these serial numbers in proc,
>> and generally make this infrastructure available to others.
>>
>> The deep issue with migration is that we move tasks from one machine
>> to another, and on the destination machine we need to have all of the
>> same global identifiers for software to function properly.
>>
>> My weasel words around the proc inode numbers are there to preserve
>> room for us to restore those ids if it ever becomes relevant for
>> migration.
>
> What do you do if the inode number is already in use on the target
> host?
Since the inode numbers are relative to a superblock or a pid namespace,
the numbers that are in use can be restored on the target system
by creating them in the appropriate namespace.
The support for doing that does not exist in the kernel today because no
one has cared, but as architected the support can be added if it is ever
needed for migration.
>> That is, the proc inode numbers (technically) live in a pid namespace
>> (aka a mount of proc). So depending on the pid namespace you are in,
>> or the mount of proc you look in, the numbers could change.
>>
>> Qualifications like that must exist to have a prayer of ever supporting
>> process migration in the crazy corner cases where people start caring
>> about inode numbers.
>>
>> We currently don't, and inode numbers for a namespace will never change
>> after a namespace is created. So I think you really are ok using the
>> proc inode numbers. I am happy declaring by fiat that the inode numbers
>> that audit uses are the numbers connected to the initial pid namespace.
>
> But once a namespace/container is migrated, it is a different audit that
> is looking at it (unless we create an audit manager or entity that
> functions at the level of a container manager), so audit should not care.
These numbers were exported to everyone as a general purpose facility in
proc. If audit is global and audit doesn't migrate, you are right that it
doesn't matter. However, if these numbers are used by anyone else for
anything else, it causes a problem.
Further, given that people run entire distributions in containers, we may
reach the point where we wish to run auditd in a container in the
future. I would hate to paint ourselves into a corner with a design
that could never allow audit to migrate. Supporting that case someday
seems a valid, if naive, desire.
>> At a fairly basic level, anything that is used to identify namespaces for
>> any general purpose use needs to have most if not all of the same
>> properties of the proc inode numbers. The most important of which is
>> being tied to some context/namespace, so there is an ability, if we ever
>> need it, to migrate those numbers from one machine to another.
>
> Sooo... does it make any sense to have those inode or serial numbers be
> blank inside the namespace/container itself, but only visible to its
> manager outside the container (unless it is the initial namespace)?
Mostly I think it makes sense to use the inode numbers from the initial
pid namespace. They already exist. They are already unique. (Which
means I don't need to maintain more code and more special cases.) And
they do what you need now.
I probably haven't followed closely enough, but I don't see what makes
inode numbers undesirable.
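For reference, userspace already sees these inode numbers: readlink(2) on
/proc/<pid>/ns/<type> returns a target of the form type:[inode], e.g.
pid:[4026531836]. A minimal parsing sketch, assuming exactly that link
format; struct ns_link, ns_parse_link() and ns_read() are hypothetical
helpers, not a kernel or libc API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical type: parsed form of a link target such as "pid:[4026531836]". */
struct ns_link {
    char type[32];          /* namespace type, e.g. "pid", "net", "uts" */
    unsigned long long ino; /* inode number identifying the namespace */
};

/* Parse "type:[inode]". Returns 0 on success, -1 if the text does not
 * have exactly that shape. */
static int ns_parse_link(const char *text, struct ns_link *out)
{
    int end = -1;

    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    /* %31[^:] reads the type, the literal ":[" must follow, then the
     * number and a literal "]"; %n records how far parsing got. */
    if (sscanf(text, "%31[^:]:[%llu]%n", out->type, &out->ino, &end) != 2)
        return -1;
    if (end < 0 || text[end] != '\0')
        return -1;              /* missing "]" or trailing garbage */
    return 0;
}

/* readlink(2) the namespace file and parse its target.
 * Returns 0 on success, -1 on error. */
static int ns_read(const char *path, struct ns_link *out)
{
    char buf[64];
    ssize_t len = readlink(path, buf, sizeof buf - 1);

    if (len < 0)
        return -1;
    buf[len] = '\0';            /* readlink does not NUL-terminate */
    return ns_parse_link(buf, out);
}
```

The number obtained this way is whatever the proc mount being read
reports; the sketch does not attempt to translate it into any other
namespace's view.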
Eric