[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrXgBrWmWFghWxtV92RDvqYBHsT-L37ujvexyA6v+3M8sQ@mail.gmail.com>
Date: Sat, 4 Feb 2017 19:54:20 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Alexei Starovoitov <ast@...com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"David S . Miller" <davem@...emloft.net>,
Daniel Borkmann <daniel@...earbox.net>,
David Ahern <dsa@...ulusnetworks.com>,
Tejun Heo <tj@...nel.org>, Thomas Graf <tgraf@...g.ch>,
Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net] bpf: expose netns inode to bpf programs
On Sat, Feb 4, 2017 at 7:48 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Sat, Feb 04, 2017 at 07:27:01PM -0800, Andy Lutomirski wrote:
>> On Sat, Feb 4, 2017 at 7:10 PM, Alexei Starovoitov
>> <alexei.starovoitov@...il.com> wrote:
>> > On Sat, Feb 04, 2017 at 09:07:19AM -0800, Andy Lutomirski wrote:
>> >> >> can see a namespaced view of the world. For this to work, presumably
>> >> >> we need to make sure that eBPF programs that are installed by programs
>> >> >> that are in a container don't see traffic that isn't in that
>> >> >> container.
>> >> >
>> >> > such approach will break existing users.
>> >>
>> >> What existing users? The whole feature has never been in a released kernel.
>> >
>> > the v3 patch commit log explains the use cases that work across netns.
>> > And they would break if netns boundary is suddenly started to be
>> > enforced for networking program types.
>>
>> This is why these issues should be addressed carefully before this
>> feature is stabilized in 4.10, and I'm increasingly unconvinced that
>> this will happen. There may be a single week left, and the cgroup
>> override issue is still entirely unaddressed, for example.
>
> imo cgroup override issue is not an issue and the api can be extended
> at any time. I offered to change the default to disable override,
> but I need override to be there, since it's the fastest and most
> flexible way. So the sooner we can land bpf_sk_netns_id() patch
> the faster I can post override disable.
Override is barely the fastest way. Long ago in one of these threads,
I showed how the full run-all-the-in-scope-programs behavior could be
implemented with a single correctly-predicted branch worth of overhead
compared to the current behavior. The run-them-all behavior also
seems like the least surprising, most secure, and most generally
useful behavior.
I've repeatedly asked how you plan to make a "don't override" flag
have sensible semantics when someone tries to add a new flag or change
the behavior to "don't override but, rather then rejecting programs
down the hierarchy, just run them all". You haven't answered that
question.
Given that we're doing API design and it's waaaaay past the merge
window, I just don't see how any of this is ready for 4.10.
>
>> Also, the v3 commit log's example seems a bit weak, since seccomp can
>> handle that case (block non-init-netns raw sockets) and it's not clear
>> to me that a filter like that serves a purpose regardless besides
>> kernel attack surface reduction.
>
> seccomp doesn't even support extended bpf yet and that makes it
> quite inflexible.
You can trivially write a seccomp filter that blocks raw sockets.
>
>> > If you're suggesting to limit root netns for cgroup_sock program type only
>> > then David explained weeks ago why it's not acceptable for vrf use
>> > case which was the main reason to add that program type in the first place.
>> > Also it would make one specific bpf program type to be different
>> > from all others which will be confusing.
>> >
>>
>> I'm suggesting limiting it to the init netns for *all* cases.
>> (Specifically, limiting installing programs to the init netns. But
>> perhaps limiting running of programs to the netns in which they're
>> installed would be the best solution after all.) If there isn't a
>> well-understood solution that plays well with netns and that satisfies
>> ip vrf, the it's worth seriously thinking about whether ip vrf is
>> really ready for 4.10.
>
> The proposed v3 is imo solves the ip vrf 'malfunction' that you found.
> v1 was also solving it and iproute2 patch was ready, but as Eric pointed
> out just ns.inum is not enough.
> The 'malfunction' can also be addressed _without_ kernel changes,
> but the kernel can make it easier, hence v3 patch.
I don't know whether Eric will be okay with your v3. I pointed out a
possible problem.
>
> Andy, can you please reply in one thread? In particular in v3 patch?
> I'm not sure why you making more or less the same points in
> different threads. I'm sure it's hard for everyone to follow.
>
I can try. Can you try the same thing so I can reply once?
Powered by blists - more mailing lists