netdev - Re: [PATCH net] bpf: expose netns inode to bpf programs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 26 Jan 2017 08:37:30 -0800
From:   Andy Lutomirski <luto@...capital.net>
To:     Alexei Starovoitov <ast@...com>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        "David S . Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Ahern <dsa@...ulusnetworks.com>,
        Tejun Heo <tj@...nel.org>, Thomas Graf <tgraf@...g.ch>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net] bpf: expose netns inode to bpf programs

Hi Linus-

Can you weigh in here before we get stuck in a potentially unfortunate place?

On Wed, Jan 25, 2017 at 10:23 PM, Alexei Starovoitov <ast@...com> wrote:
> On 1/25/17 9:46 PM, Eric W. Biederman wrote:
>>
>> Alexei Starovoitov <ast@...com> writes:
>>

[...]

>>> Similarly TC cls_bpf/act_bpf and socket filters can do
>>> if (skb->netns_inum == expected_inode)
>>
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@...ssion.com>
>
>
> I very much value your opinion, but your ack or nack doesn't apply here.

I'm confused.  Eric, the namespace maintainer, says nack.  This is
namespace code regardless of what file it lives in.

>> The only sane thing is to interpret whatever your bpf program in the
>> context of the program which installs it.
>
>
> that's impossible. The programs are operating in the context that
> is disjoined from the app that installs it.
>
>> If you can't do that you have a very broken piece of userspace
>> interface.  Which appears to be the case here.
>
>
> Call it rubbish, but this is how it is.
> cls_bpf is operating on packets. xdp_bpf is operating on raw dma buffers
> and there we might need eventually lookup sockets and net namespaces.
> Think of bpf programs as safe kernel modules. They don't have
> confined boundaries and program authors, if not careful, can shoot
> themselves in the foot. We're not trying to prevent that because
> it's impossible to check that the program is sane. Just like
> it's impossible to check that kernel module is sane.
> But in case of bpf we check that bpf program is _safe_ from the kernel
> point of view. If it's doing some garbage, it's program's business.
> Does it make more sense now?
>

With all due respect, I think this is not an acceptable way to think
about BPF at all.  If you think of BPF this way, I think there needs
to be a real discussion at KS or similar as to whether this is okay.
The reason is simple: the kernel promises a stable ABI to userspace
but not to kernel modules.  By thinking of BPF as more like a module,
you're taking a big shortcut that will either result in ABI breakage
down the road or in committing to a problematic stable ABI.

In fact, this was discussed a bit at KS in the context of tracepoints.
The general consensus seemed to be that tracepoints should not be
considered fully stable in large part because they're a debugging
feature.  But these BPF hooks are very much not a debugging feature.

Concretely, iproute2 git contains an eBPF program.  If that program
were just a "safe kernel module", then it would be totally fine
(expected, even) for future kernel versions to break it outright.  I
don't think this is how it works.

--Andy