[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7b0a325e-9187-702f-eba7-bfcc7e3f7eb4@fb.com>
Date: Thu, 12 Sep 2019 05:49:00 +0000
From: Yonghong Song <yhs@...com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>,
Al Viro <viro@...iv.linux.org.uk>
CC: Carlos Antonio Neira Bustos <cneirabustos@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"brouer@...hat.com" <brouer@...hat.com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data
from current task New bpf helper bpf_get_current_pidns_info.
On 9/11/19 9:16 AM, Eric W. Biederman wrote:
> Al Viro <viro@...iv.linux.org.uk> writes:
>
>> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>>>
>>> Carlos,
>>>
>>> Discussed with Eric today for what is the best way to get
>>> the device number for a namespace. The following patch seems
>>> a reasonable start although Eric would like to see
>>> how the helper is used in order to decide whether the
>>> interface looks right.
>>>
>>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>>> Author: Yonghong Song <yhs@...com>
>>> Date: Mon Sep 9 21:50:51 2019 -0700
>>>
>>> nsfs: add an interface function ns_get_inum_dev()
>>>
>>> This patch added an interface function
>>> ns_get_inum_dev(). Given a ns_common structure,
>>> the function returns the inode and device
>>> numbers. The function will be used later
>>> by a newly added bpf helper.
>>>
>>> Signed-off-by: Yonghong Song <yhs@...com>
>>>
>>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>>> index a0431642c6b5..a603c6fc3f54 100644
>>> --- a/fs/nsfs.c
>>> +++ b/fs/nsfs.c
>>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>>> return ERR_PTR(-EINVAL);
>>> }
>>>
>>> +/* Get the device number for the current task pidns.
>>> + */
>>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>>> +{
>>> + *inum = ns->inum;
>>> + *dev = nsfs_mnt->mnt_sb->s_dev;
>>> +}
>>
>> Umm... Where would it get the device number once we get (hell knows
>> what for) multiple nsfs instances? I still don't understand what
>> would that be about, TBH... Is it really per-userns? Or something
>> else entirely? Eric, could you give some context?
>
> My goal is not to paint things into a corner, with future changes.
> Right now it is possible to stat a namespace file descriptor and
> get a device and inode number. Then compare that.
>
> I don't want people using the inode number in nsfd as some magic
> namespace id.
>
> We have had times in the past where there was more than one superblock
> and thus more than one device number. Further if userspace ever uses
> this heavily there may be times in the future where for
> checkpoint/restart purposes we will want multiple nsfd's so we can
> preserve the inode number accross a migration.
>
> Realistically there will probably just some kind of hotplug notification
> to userspace to say we have hotplugged your operatining system as
> a migration notification.
>
> Now the halway discussion did not quite capture everything I was trying
> to say but it at least got to the right ballpark.
>
> The helper in fs/nsfs.c should be:
>
> bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
> {
> return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
> }
>
> That way if/when there are multiple inodes identifying the same
> namespace the bpf programs don't need to change.
Thanks, Eric. This is indeed better. The bpf helper should focus
on comparing dev/ino, instead of return the dev/ino to bpf program.
So overall, nsfs related change will look like:
diff --git a/fs/nsfs.c b/fs/nsfs.c
index a0431642c6b5..7e78d89c2172 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -245,6 +245,11 @@ struct file *proc_ns_fget(int fd)
return ERR_PTR(-EINVAL);
}
+bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
+{
+ return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
+}
+
static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
{
struct inode *inode = d_inode(dentry);
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index d31cb6215905..79639807e960 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -81,6 +81,7 @@ extern void *ns_get_path(struct path *path, struct
task_struct *task,
typedef struct ns_common *ns_get_path_helper_t(void *);
extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t
ns_get_cb,
void *private_data);
+extern bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino);
extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
const struct proc_ns_operations *ns_ops);
>
> Up farther in the stack it should be something like:
>
>> BPF_CALL_2(bpf_current_pidns_match, dev_t *dev, ino_t *ino)
>> {
>> return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
>> }
>>
>> const struct bpf_func_proto bpf_current_pidns_match_proto = {
>> .func = bpf_current_pins_match,
>> .gpl_only = true,
>> .ret_type = RET_INTEGER
>> .arg1_type = ARG_PTR_TO_DEVICE_NUMBER,
>> .arg2_type = ARG_PTR_TO_INODE_NUMBER,
>> };
>
> That allows comparing what the bpf came up with with whatever value
> userspace generated by stating the file descriptor.
>
>
> That is the least bad suggestion I currently have for that
> functionality. It really would be better to not have that filter in the
> bpf program itself but in the infrastructure that binds a program to a
> set of tasks.
>
> The problem with this approach is whatever device/inode you have when
> the namespace they refer to exits there is the possibility that the
> inode will be reused. So your filter will eventually start matching on
> the wrong thing.
I come up with a differeent helper definition, which is much more
similar to existing bpf_get_current_pid_tgid() and helper definition
much more conforms to bpf convention.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5e28718928ca..bc26903c80c7 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -11,6 +11,8 @@
#include <linux/uidgid.h>
#include <linux/filter.h>
#include <linux/ctype.h>
+#include <linux/pid_namespace.h>
+#include <linux/proc_ns.h>
#include "../../lib/kstrtox.h"
@@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = {
.arg4_type = ARG_PTR_TO_LONG,
};
#endif
+
+BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u32, inum)
+{
+ struct task_struct *task = current;
+ struct pid_namespace *pidns;
+ pid_t pid, tgid;
+
+ if (unlikely(!task))
+ return -EINVAL;
+
+
+ pidns = task_active_pid_ns(task);
+ if (unlikely(!pidns))
+ return -ENOENT;
+
+ if (!ns_match(&pidns->ns, (dev_t)dev, inum))
+ return -EINVAL;
+
+ pid = task_pid_nr_ns(task, pidns);
+ tgid = task_tgid_nr_ns(task, pidns);
+
+ return (u64) tgid << 32 | pid;
+}
+
+const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = {
+ .func = bpf_get_ns_current_pid_tgid,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_ANYTHING,
+ .arg2_type = ARG_ANYTHING,
+};
Existing usage of bpf_get_current_pid_tgid() can be converted
to bpf_get_ns_current_pid_tgid() if ns dev/inode number
is supplied. For bpf_get_ns_current_pid_tgid(), checking
return value ( < 0 or not) is needed.
>
> Eric
>
Powered by blists - more mailing lists