[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87o8zr8cz3.fsf@x220.int.ebiederm.org>
Date: Wed, 11 Sep 2019 03:16:16 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Yonghong Song <yhs@...com>,
Carlos Antonio Neira Bustos <cneirabustos@...il.com>,
"netdev\@vger.kernel.org" <netdev@...r.kernel.org>,
"brouer\@redhat.com" <brouer@...hat.com>,
"bpf\@vger.kernel.org" <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
Al Viro <viro@...iv.linux.org.uk> writes:
> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>>
>> Carlos,
>>
>> Discussed with Eric today for what is the best way to get
>> the device number for a namespace. The following patch seems
>> a reasonable start although Eric would like to see
>> how the helper is used in order to decide whether the
>> interface looks right.
>>
>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>> Author: Yonghong Song <yhs@...com>
>> Date: Mon Sep 9 21:50:51 2019 -0700
>>
>> nsfs: add an interface function ns_get_inum_dev()
>>
>> This patch added an interface function
>> ns_get_inum_dev(). Given a ns_common structure,
>> the function returns the inode and device
>> numbers. The function will be used later
>> by a newly added bpf helper.
>>
>> Signed-off-by: Yonghong Song <yhs@...com>
>>
>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>> index a0431642c6b5..a603c6fc3f54 100644
>> --- a/fs/nsfs.c
>> +++ b/fs/nsfs.c
>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>> return ERR_PTR(-EINVAL);
>> }
>>
>> +/* Get the device number for the current task pidns.
>> + */
>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>> +{
>> + *inum = ns->inum;
>> + *dev = nsfs_mnt->mnt_sb->s_dev;
>> +}
>
> Umm... Where would it get the device number once we get (hell knows
> what for) multiple nsfs instances? I still don't understand what
> would that be about, TBH... Is it really per-userns? Or something
> else entirely? Eric, could you give some context?
My goal is not to paint things into a corner, with future changes.
Right now it is possible to stat a namespace file descriptor and
get a device and inode number. Then compare that.
I don't want people using the inode number in nsfd as some magic
namespace id.
We have had times in the past where there was more than one superblock
and thus more than one device number. Further if userspace ever uses
this heavily there may be times in the future where for
checkpoint/restart purposes we will want multiple nsfd's so we can
preserve the inode number accross a migration.
Realistically there will probably just some kind of hotplug notification
to userspace to say we have hotplugged your operatining system as
a migration notification.
Now the halway discussion did not quite capture everything I was trying
to say but it at least got to the right ballpark.
The helper in fs/nsfs.c should be:
bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
{
return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
}
That way if/when there are multiple inodes identifying the same
namespace the bpf programs don't need to change.
Up farther in the stack it should be something like:
> BPF_CALL_2(bpf_current_pidns_match, dev_t *dev, ino_t *ino)
> {
> return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
> }
>
> const struct bpf_func_proto bpf_current_pidns_match_proto = {
> .func = bpf_current_pins_match,
> .gpl_only = true,
> .ret_type = RET_INTEGER
> .arg1_type = ARG_PTR_TO_DEVICE_NUMBER,
> .arg2_type = ARG_PTR_TO_INODE_NUMBER,
> };
That allows comparing what the bpf came up with with whatever value
userspace generated by stating the file descriptor.
That is the least bad suggestion I currently have for that
functionality. It really would be better to not have that filter in the
bpf program itself but in the infrastructure that binds a program to a
set of tasks.
The problem with this approach is whatever device/inode you have when
the namespace they refer to exits there is the possibility that the
inode will be reused. So your filter will eventually start matching on
the wrong thing.
Eric
Powered by blists - more mailing lists