netdev - Re: [PATCH net] bpf: expose netns inode to bpf programs

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <CALCETrUvnfRuN8=X9d7Hi5mF95n7XS6_muh7GTuKiLO_RtGbxw@mail.gmail.com>
Date:   Sat, 4 Feb 2017 21:05:29 -0800
From:   Andy Lutomirski <luto@...capital.net>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        Alexei Starovoitov <ast@...com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "David S . Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Ahern <dsa@...ulusnetworks.com>,
        Tejun Heo <tj@...nel.org>, Thomas Graf <tgraf@...g.ch>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH net] bpf: expose netns inode to bpf programs

On Sat, Feb 4, 2017 at 8:37 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Sat, Feb 04, 2017 at 07:54:20PM -0800, Andy Lutomirski wrote:
>>
>> I've repeatedly asked how you plan to make a "don't override" flag
>> have sensible semantics when someone tries to add a new flag or change
>> the behavior to "don't override but, rather than rejecting programs
>> down the hierarchy, just run them all".  You haven't answered that
>> question.
>
> I already explained that I need to do the combining on the bpf side
> instead of running a list, since running several programs where 90%
> of the logic is the same is overhead that is not acceptable
> for a production server. It may be fine for a desktop, but not
> when every cycle matters. With one program per cgroup I can
> combine multiple programs on the bpf side. In networking, most of
> the prep work, like packet parsing and lookups, is common,
> but the actions are different, so in one part of the hierarchy
> I can have program A monitoring tcp, and in another
> part I can have program B monitoring tcp and udp.
> What you're saying is that for tcp and udp monitoring
> I should run two programs: one for udp and one for tcp.
> That is not efficient. Hence the need to combine
> the programs on the bpf side and attach only one with override.

I'm not saying that at all.  I'm saying that this use case sounds
valid, but maybe it could be solved differently.  Here are some ideas:

 - Expose the actual cgroup (relative to the hooked cgroup) to the BPF
program.  Then you could parse the headers, check what cgroup you're
in, and proceed accordingly.  This could potentially be even faster
for your use case if done carefully because it will stress the
instruction cache less.

 - Have a (non-default) flag that says "overridable".  The effect is
that, if /A and /A/B both have overridable programs attached, then
sockets in /A/B don't run /A's program.  If, however, /A has a
non-overridable program and /A/B has an overridable program, then
sockets in /A/B run both programs.  IOW overridable programs override
other overridable programs, but non-overridable programs never
override anything and are never overridden by anything.

>
> The "don't override" flag was also explained before. Here it is again:
> without breaking the above "override" scenario we can add a flag
> such that, when a program is attached with that flag to one part of
> the hierarchy, overriding it will not be allowed in the descendants.
> This extension can be done at any time in the future.
> The only question is what the default is when such a flag
> is not present. The current default is override allowed.
> You insist on changing the default right now.
> I don't mind, so I'm working on such a "don't override" flag,
> since if we need to change the default it needs to be done now,
> but if it doesn't happen for 4.10, that's absolutely fine too.
> For security use cases the flag will be added later,
> and sandboxing use cases will use that flag too.
> There is no expectation that, if a cgroup exists, the program
> attach command must always succeed. That's why we have error codes,
> and we can disallow attach based on this flag or any future
> restrictions. All of it is root-only now anyway, and sandboxing/security
> use cases need to wait until the bpf side can be made unprivileged.
> I see the current api as having a ton of room for future extensions.
>

Suppose someone wants to make CGROUP_BPF work right when a container
and a container manager both use it.  It'll have to run both programs.
What would the semantics be if this were to be added?  This is really
a question, not an indictment of your approach.  For all I know,
you're proposing exactly what I suggested above.

And sandboxing needn't, and won't, wait until unprivileged bpf
programs are added.  Plenty of sandboxes run as root.