Message-ID: <CALCETrWvDZp0TFrw88i9gab5L6OhBwrmjYL-yMDwtMHz51HZ+A@mail.gmail.com>
Date: Sat, 4 Feb 2017 20:17:57 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...com>,
"David S . Miller" <davem@...emloft.net>,
Daniel Borkmann <daniel@...earbox.net>,
David Ahern <dsa@...ulusnetworks.com>,
Tejun Heo <tj@...nel.org>,
"Eric W . Biederman" <ebiederm@...ssion.com>,
Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH v2 net] bpf: add bpf_sk_netns_id() helper

On Sat, Feb 4, 2017 at 8:05 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Sat, Feb 04, 2017 at 07:33:14PM -0800, Andy Lutomirski wrote:
>> On Sat, Feb 4, 2017 at 7:25 PM, Alexei Starovoitov
>> <alexei.starovoitov@...il.com> wrote:
>> > On Sat, Feb 04, 2017 at 09:15:10AM -0800, Andy Lutomirski wrote:
>> >> On Fri, Feb 3, 2017 at 5:22 PM, Alexei Starovoitov <ast@...com> wrote:
>> >> > Note that all bpf program types are global.
>> >>
>> >> I don't think this has a clear enough meaning to work with. In
>> >
>> > Please clarify what you mean. The quoted part says
>> > "bpf programs are global". What is not "clear enough" there?
>>
>> What does "bpf programs are global" mean? I am genuinely unable to
>> figure out what you mean. Here are some example guesses of what you
>> might mean:
>>
>> - BPF programs are compiled independently of a namespace. This is
>> certainly true, but I don't think it matters.
>>
>> - You want BPF programs to affect everything on the system. But this
>> doesn't seem right to me -- they only affect things in the relevant
>> cgroup, so they're not global in that sense.
>
> All bpf program types are global in the sense that you can
> make all of them operate across all possible scopes and namespaces.

I still don't understand what you mean here. A seccomp program runs
in the process that installs it and its children -- it does not run in
"all possible scopes". A socket filter runs on a single socket and
therefore runs in a single netns. So presumably I'm still
misunderstanding you.

> cgroup only gives a scope for the program to run, but it's
> not limited by it. The user can have the same program
> attached to two or more different cgroups, so one program
> will run across multiple cgroups.

Does this mean "BPF programs are compiled independently of a
namespace?" If so, I don't see why it's relevant at all. Sure, you
could compile a BPF program once and install it in two different
scopes, but that doesn't mean that the kernel should *run* it globally
in any sense. Can you clarify?
>
>> - The set of BPF program types and the verification rules are
>> independent of cgroup and namespace. This is true, but I don't think
>> it matters.
>
> It matters. That's actually the key thing to understand. The loading part
> verifies correctness for a particular program type.
> Afterwards the same program can be attached to any place,
> including different cgroups and different namespaces.
> The 'attach' part is like a 'switch on' that enables the program
> on a particular hook. The scope (whether it's a socket, netdev, or cgroup)
> is a scope that the program author uses to narrow down the hook,
> but it's not an ultimate restriction.
> For example the socket program can be attached to sockets and
> share information with cls_bpf program attached to netdev.
> The kprobe tracing program can peek into kernel internal data
> and share it with cls_bpf or any other type as long as
> everything is root. The information flow is global to the whole system.

Why does any of this imply that a cgroup+bpf program that is attached
once should run for all network namespaces?
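
To make the question concrete, here is a toy Python model of the
behaviour being debated. It is only a sketch and not kernel code: Sock,
Cgroup, run_hook and make_netns_filter are hypothetical names, and the
netns check stands in for what a helper like bpf_sk_netns_id() would
presumably let a program do. One program is loaded once and attached to
two cgroups; the hook is selected by cgroup membership alone, so whether
the program acts in a given netns depends on the program checking it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Sock:
    netns_id: int  # stand-in for the socket's network namespace
    cgroup: str    # name of the cgroup the creating task belongs to


# A "loaded program" is modelled as a callable: True means it acted on the socket.
Program = Callable[[Sock], bool]


@dataclass
class Cgroup:
    name: str
    programs: List[Program] = field(default_factory=list)

    def attach(self, prog: Program) -> None:
        # Attaching only records a reference; the program itself is shared.
        self.programs.append(prog)

    def run_hook(self, sk: Sock) -> bool:
        # The hook is scoped by cgroup only; the netns is never consulted here.
        return all(prog(sk) for prog in self.programs)


def make_netns_filter(allowed_netns: int) -> Program:
    # Hypothetical program that restricts itself to one netns.
    def prog(sk: Sock) -> bool:
        return sk.netns_id == allowed_netns
    return prog


prog = make_netns_filter(allowed_netns=1)  # "loaded" once
cg_a, cg_b = Cgroup("a"), Cgroup("b")
cg_a.attach(prog)                          # attached twice
cg_b.attach(prog)

same_netns = cg_a.run_hook(Sock(netns_id=1, cgroup="a"))
other_netns = cg_a.run_hook(Sock(netns_id=2, cgroup="a"))
print(same_netns, other_netns)
```

In this model the hook fires for sockets in both namespaces, and only the
program's own check keeps it from acting in the second one.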
>
>> Because we're one week or so from 4.10 final, the 4.10-rc code is
>> problematic even for ip vrf, and there isn't a clear solution yet.
>> There are a bunch of requirements that seem to conflict, and something
>> has to give.
>
> let's go back to the beginning:
> - you've identified a 'malfunction' in ip vrf. It's a valid one. Thank you.
> - can it be fixed without kernel changes? Yes. David offered to do so.

He has (I think) a somewhat kludgey fix that gets the "ip netns" case
right but not the "unshare -n" case. I think the latter can't be
fixed without kernel changes.