netdev - Re: [PATCH v2] bpf: Restrict cgroup bpf hooks to the init netns

Message-ID: <CALCETrXzh-7BJ4Q_oG0hBkTdK+XgDWUYe0=gLwkqG6sncskv3g@mail.gmail.com>
Date:   Tue, 24 Jan 2017 13:24:54 -0800
From:   Andy Lutomirski <luto@...capital.net>
To:     David Ahern <dsa@...ulusnetworks.com>
Cc:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Tejun Heo <tj@...nel.org>, Andy Lutomirski <luto@...nel.org>,
        Network Development <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH v2] bpf: Restrict cgroup bpf hooks to the init netns

On Tue, Jan 24, 2017 at 12:29 PM, David Ahern <dsa@...ulusnetworks.com> wrote:
>
> Users do not run around exec'ing commands in random network contexts (namespace, vrf, device, whatever) and expect them to just work.

I worked on some code once (deployed in production, even!) that calls
unshare() to make a netns and creates an interface and runs code in
there and expects it to just work.  It wouldn't work if the outer
program were run under current ip vrf.
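
For illustration only, here is a minimal sketch of the pattern described above: a process moves itself into a fresh network namespace with unshare(CLONE_NEWNET). It is not the production code mentioned, and the helper names current_netns_inode() and enter_new_netns() are hypothetical. It assumes /proc is mounted and that the caller has CAP_SYS_ADMIN (otherwise unshare() fails, typically with EPERM).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the inode number that identifies the
 * caller's current network namespace, or 0 if it cannot be read.
 */
static ino_t current_netns_inode(void)
{
    struct stat st;

    if (stat("/proc/self/ns/net", &st) != 0)
        return 0;
    return st.st_ino;
}

/*
 * Hypothetical helper: move the calling process into a new, empty
 * network namespace.  Returns 0 on success or a negative errno value.
 */
static int enter_new_netns(void)
{
    if (unshare(CLONE_NEWNET) != 0)
        return -errno;
    return 0;
}
```

A caller would invoke enter_new_netns() and then create and configure its interface inside the new namespace. The process's cgroup membership is unchanged by unshare(), so a cgroup+bpf program attached to that cgroup still runs on sockets it creates there.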

>
>>
>> Maybe you can argue that this is a missing feature in cgroup+bpf (no
>> API to query which netns is in use) and a corresponding bug in 'ip
>> vrf', but I see this as evidence that cgroup+bpf as it exists in 4.10
>> is not carefully enough thought through.  The only non-example user
>> of it that I can find (ip vrf) is buggy and can't really be fixed
>> using mechanisms that exist in 4.10-rc.
>
> The argument is that cgroups and namespaces are completely disjoint infrastructure and that users need to know what they are doing.

But perhaps they should be less disjoint.  As far as I know,
cgroup+bpf is the *only* network configuration that is being set up to
run across all network namespaces.  No one has said why this is okay,
let alone why it's preferable to making it work per-netns just like
basically all other Linux network configuration.

>
>>
>>>
>>>> things up so that unshare will malfunction.  It should avoid
>>>> malfunctioning when running Linux programs that are unaware of it.
>>>
>>> I agree that for VRF use case it will help to make programs netns
>>> aware by adding a new bpf_get_current_netns_id() or similar helper,
>>> but it's up to the program to function properly or be broken.
>>
>> This will cause David's code to run slower, and I think he wants very
>> high performance.
>
> This is a socket create path not a packet path. While overhead should be contained, a few extra cycles should be fine.
>
> Adding the capability to allow users to check the netns id would offer a solution to the namespace problem, but there is nothing that *requires* a bpf program to do it.
>
> Who's to say an admin does not *want* all processes in a group to have sockets bound to a non-existent device? 'ip vrf' restricts the device index to a VRF device because as a management tool I want it to be user friendly, but generically the BPF code does not have any restrictions. ifindex can be any u32 value.

I was hoping for an actual likely use case for the bpf hooks to be run
in all namespaces.  You're arguing that iproute2 can be made to work
mostly okay if bpf hooks can run in all namespaces, but the use case
of intentionally making sk_bound_dev_if invalid across all namespaces
seems dubious.

But all of what you're suggesting doing would still work fine and
would result in less kernel code *and* less eBPF code if the hooks
were per netns.

--Andy