Message-ID: <c0e67fc7-be66-c4c6-6aad-316cbba18757@fb.com>
Date: Fri, 6 Sep 2019 23:21:14 +0000
From: Yonghong Song <yhs@...com>
To: Al Viro <viro@...iv.linux.org.uk>,
Carlos Neira <cneirabustos@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"ebiederm@...ssion.com" <ebiederm@...ssion.com>,
"brouer@...hat.com" <brouer@...hat.com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data
from current task New bpf helper bpf_get_current_pidns_info.
On 9/6/19 9:00 AM, Al Viro wrote:
> On Fri, Sep 06, 2019 at 04:46:47PM +0100, Al Viro wrote:
>
>>> Where do I begin?
>>> * getname_kernel() is there for purpose
>>> * so's kern_path(), damnit
>>
>> Oh, and filename_lookup() *CAN* sleep, obviously. So that
>> GFP_ATOMIC above is completely pointless.
>>
>>>> +
>>>> + inode = d_backing_inode(kp.dentry);
>>>> + pidns_info->dev = (u32)inode->i_rdev;
>
> In the original variant of patchset it used to be ->i_sb->s_dev,
> which is also bloody strange - you are not asking filename_lookup()
> to follow symlinks, so you'd get that of whatever filesystem
> /proc/self/ns resides on.
>
> ->i_rdev use makes no sense whatsoever - it's a symlink and
> neither it nor its target are device nodes; ->i_rdev will be
> left zero for both.
>
> What data are you really trying to get there?
Let me explain a bit of background here.
The ultimate goal is for a bpf program to filter on
(pid_namespace, tgid/pid inside pid_namespace)
so that bpf-based tools can run inside a container.
Typically, the pid namespace is identified by looking at
/proc/self/ns/pid:
-bash-4.4$ lsns
        NS TYPE   NPROCS  PID USER COMMAND
4026531835 cgroup     44 8261 yhs  /usr/lib/systemd/systemd --user
4026531836 pid        44 8261 yhs  /usr/lib/systemd/systemd --user
4026531837 user       44 8261 yhs  /usr/lib/systemd/systemd --user
4026531838 uts        44 8261 yhs  /usr/lib/systemd/systemd --user
4026531839 ipc        44 8261 yhs  /usr/lib/systemd/systemd --user
4026531840 mnt        44 8261 yhs  /usr/lib/systemd/systemd --user
4026532008 net        44 8261 yhs  /usr/lib/systemd/systemd --user
-bash-4.4$ readlink /proc/self/ns/pid
pid:[4026531836]
-bash-4.4$ stat /proc/self/ns/pid
File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
Size: 0 Blocks: 0 IO Block: 1024 symbolic link
Device: 4h/4d Inode: 344795989 Links: 1
Access: (0777/lrwxrwxrwx) Uid: (128203/ yhs) Gid: ( 100/ users)
Context: user_u:base_r:base_t
Access: 2019-09-06 16:06:09.431616380 -0700
Modify: 2019-09-06 16:06:09.431616380 -0700
Change: 2019-09-06 16:06:09.431616380 -0700
Birth: -
-bash-4.4$
Based on a discussion with Eric Biederman at Linux Plumbers
2018, Eric suggested that to uniquely identify a
namespace, the device id (major/minor) number should also
be included. Although today's kernel implementation
has the same device for all namespace pseudo files,
from a uapi perspective the device id should be included.
That is why we try to get the device id of the filesystem
that holds the pid namespace pseudo file.
Do you have a better suggestion on how to get
the device id for the 'current' pid namespace? Or, by design,
should we not care about the device id at all?