netdev - Re: [PATCH v2] bpf: Restrict cgroup bpf hooks to the init netns

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <242993d5-e27c-7177-a021-610d16b150e4@cumulusnetworks.com>
Date:   Tue, 24 Jan 2017 13:29:07 -0700
From:   David Ahern <dsa@...ulusnetworks.com>
To:     Andy Lutomirski <luto@...capital.net>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Tejun Heo <tj@...nel.org>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH v2] bpf: Restrict cgroup bpf hooks to the init netns

On 1/24/17 11:54 AM, Andy Lutomirski wrote:
>>> My point is that, in 4.10-rc, it doesn't work right, and I doubt this
>>> problem is restricted to just 'ip vrf'.  Without some kind of change
>>> to the way that netns and cgroup+bpf interact, anything that uses
>>> sk_bound_dev_if or reads the ifindex on an skb will be subject to a
>>> huge footgun that unprivileged programs can trigger and any future
>>> attempt to make the cgroup+bpf work for unprivileged users is going to
>>> be more complicated than it deserves to be.
>>
>> For n-th time, the current BPF_PROG_TYPE_CGROUP* is root only and
>> speculation about unprivileged usage are not helping the discussion.
> 
> ...which has nothing to do with my example of how it's broken.  I used
> 'ip vrf' the way it was intended to be used and then I ran code in it
> that uses only APIs that predate eBPF.  The result was that the
> program obtained a broken socket that had sk_bound_dev_if filled in
> with a nonsense index.  There's no speculation -- it's just broken.

Again, there are many ways to arrive at this point where a socket is bound to a device index that is invalid. Nothing stops you from downloading a bpf program with an invalid ifindex. That is up to the user.

Users do not run around exec'ing commands in random network contexts (namespace, vrf, device, whatever) and expect them to just work.

> 
> Maybe you can argue that this is a missing feature in cgroup+bpf (no
> API to query which netns is in use) and a corresponding bug in 'ip
> vrf', but I see this as evidence that cgroup+bpf as it exists in 4.10
> is not carefully enough throught through.  The only non-example user
> of it that I can find (ip vrf) is buggy and can't really be fixed
> using mechanisms that exist in 4.10-rc.

The argument is that cgroups and namespaces are completely disjoint infrastructure and that users need to know what they are doing. If one uses a management tool (ip in this case) in a consistent manner it will work. If 'ip netns' is used to change namespaces, vrf is reset on the switch. And, I have a patch now to properly nest namespaces and vrfs so that mgmt vrf in multiple namespaces will work properly.

> 
>>
>>> things up so that unshare will malfunction.  It should avoid
>>> malfunctioning when running Linux programs that are unaware of it.
>>
>> I agree that for VRF use case it will help to make programs netns
>> aware by adding new bpf_get_current_netns_id() or something helper,
>> but it's up to the program to function properly or be broken.
> 
> This will cause David's code to run slower, and I think he wants very
> high performance.

This is a socket create path not a packet path. While overhead should be contained, a few extra cycles should be fine.

Adding the capability to allow users to check the netns id would offer a solution to the namespace problem, but there is nothing that *requires* a bpf program to do it.

Who's to say an admin does not *want* all processes in a group to have sockets bound to a non-existent device? 'ip vrf' restricts the device index to a VRF device because as a management tool I want it to be user friendly, but generically the BPF code does not have any restrictions. ifindex can be any u32 value.