netdev - Re: [PATCH net] bpf: expose netns inode to bpf programs

Message-ID: <8737fvt25a.fsf@xmission.com>
Date:   Sat, 04 Feb 2017 10:06:41 +1300
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Alexei Starovoitov <ast@...com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "David S . Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Ahern <dsa@...ulusnetworks.com>,
        Tejun Heo <tj@...nel.org>, Thomas Graf <tgraf@...g.ch>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net] bpf: expose netns inode to bpf programs

Andy Lutomirski <luto@...capital.net> writes:

> On Thu, Feb 2, 2017 at 8:33 PM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
>> Alexei Starovoitov <ast@...com> writes:
>>
>>> On 1/26/17 11:07 AM, Andy Lutomirski wrote:
>>>> On Thu, Jan 26, 2017 at 10:32 AM, Alexei Starovoitov <ast@...com> wrote:
>>>>> On 1/26/17 10:12 AM, Andy Lutomirski wrote:
>>>>>>
>>>>>> On Thu, Jan 26, 2017 at 9:46 AM, Alexei Starovoitov <ast@...com> wrote:
>>>>>>>
>>>>>>> On 1/26/17 8:37 AM, Andy Lutomirski wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Think of bpf programs as safe kernel modules. They don't have
>>>>>>>>> confined boundaries and program authors, if not careful, can shoot
>>>>>>>>> themselves in the foot. We're not trying to prevent that because
>>>>>>>>> it's impossible to check that the program is sane, just like
>>>>>>>>> it's impossible to check that a kernel module is sane.
>>>>>>>>> But in the case of bpf we check that the program is _safe_ from the
>>>>>>>>> kernel's point of view. If it's doing some garbage, that's the
>>>>>>>>> program's business.
>>>>>>>>> Does it make more sense now?
>>>>>>>>>
>>>>>>>>
>>>>>>>> With all due respect, I think this is not an acceptable way to think
>>>>>>>> about BPF at all.  If you think of BPF this way, I think there needs
>>>>>>>> to be a real discussion at KS or similar as to whether this is okay.
>>>>>>>> The reason is simple: the kernel promises a stable ABI to userspace
>>>>>>>> but not to kernel modules.  By thinking of BPF as more like a module,
>>>>>>>> you're taking a big shortcut that will either result in ABI breakage
>>>>>>>> down the road or in committing to a problematic stable ABI.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> you misunderstood the analogy.
>>>>>>> bpf abi is certainly stable. that's why we were careful not to
>>>>>>> expose anything to it that is not already stable.
>>>>>>>
>>>>>>
>>>>>> In that case I don't understand what you're trying to say.  Eric
>>>>>> thinks your patch exposes a bad interface.  A bad interface for
>>>>>> userspace is a very different thing from a bad interface available to
>>>>>> kernel modules.  Are you saying that BPF is kernel-module-like in that
>>>>>> the ABI exposed to BPF programs doesn't need to meet the same quality
>>>>>> standards as userspace ABIs?
>>>>>
>>>>>
>>>>> of course not.
>>>>> ns.inum is already exposed to user space as a value.
>>>>> This patch exposes it to bpf program in a convenient and stable way,
>>>>
>>>> Here's what I'm imagining Eric is thinking:
>>>>
>>>> ns.inum is currently exposed to userspace via procfs.  In principle,
>>>> the value could be local to a namespace, though, which would enable
>>>> CRIU to be able to preserve namespace inode numbers across a
>>>> checkpoint+restore operation.  If this happened, the contained and
>>>> restored procfs would see a different inode number than the outermost
>>>> procfs.
>>>
>>> sure. there are many different ways for the program to see an inode
>>> that has either already been reused or disappeared.
>>> What I'm saying is that this is expected. We cannot prevent that from
>>> the bpf side, just as the ifindex value read by the program can be
>>> bogus, as in the example I just provided.
>>
>> The point is that we can make the inode number stable across migration
>> and the user space API for namespaces has been designed with that
>> possibility in mind.
>
> How does it help if BPF starts exposing both inode number and device
> number?

Adding the device number comparison helps in that it makes explicit what
is being compared against.  That gives me at least a bit of a namespace
for the namespaces, and a program from a sufficiently wrong context will
have its comparisons fail rather than produce a match.

I think the operation that is exported to BPF should be a full
comparison of device and inode number, so that it can be optimized or
compiled to something else depending upon the context.

That is, compiling the bpf program would give us the opportunity to
remove the namespace dependency and make the program work in a global
context, so we don't have to carry namespace information around at run
time.

> ISTM any ability to migrate namespaces and to migrate eBPF programs
> that know about namespaces needs to have the eBPF program firmly
> rooted in some namespace (or perhaps cgroup in this case) so that it
> can see a namespaced view of the world.  For this to work, presumably
> we need to make sure that eBPF programs that are installed by programs
> that are in a container don't see traffic that isn't in that
> container.  This is part of why I think that we should consider
> preventing programs that aren't in the root namespace (perhaps *all*
> the root namespaces) from installing bpf+cgroup programs in the first
> place until there's a clearer understanding of how this all fits
> together.

Andy, I agree, at least to the extent that those programs are reading
attributes that live in a namespace.  That should be straightforward to
verify in the bpf checker when installing the program.

Eric