linux-kernel - Re: [PATCH v6 net-next 1/3] filter: add Extended BPF interpreter and converter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAMEtUuyr__joDR9YYHuaqWd2fJL5-1ct5pEaQ2NRaq+G-BZK+w@mail.gmail.com>
Date:	Mon, 10 Mar 2014 13:02:10 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	David Miller <davem@...emloft.net>
Cc:	Daniel Borkmann <dborkman@...hat.com>,
	Ingo Molnar <mingo@...nel.org>, Will Drewry <wad@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"H. Peter Anvin" <hpa@...or.com>,
	Hagen Paul Pfeifer <hagen@...u.net>,
	Jesse Gross <jesse@...ira.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Tom Zanussi <tom.zanussi@...ux.intel.com>,
	Jovi Zhangwei <jovi.zhangwei@...il.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Pekka Enberg <penberg@....fi>,
	Arjan van de Ven <arjan@...radead.org>,
	Christoph Hellwig <hch@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH v6 net-next 1/3] filter: add Extended BPF interpreter and converter

On Mon, Mar 10, 2014 at 12:22 PM, David Miller <davem@...emloft.net> wrote:
> From: Alexei Starovoitov <ast@...mgrid.com>
> Date: Fri, 7 Mar 2014 14:19:39 -0800
>
>> On Fri, Mar 7, 2014 at 12:38 PM, David Miller <davem@...emloft.net> wrote:
>> 2.
>> Another alternative is to do
>> struct sk_filter {
>>   ..
>> union {
>>     unsigned int (*bpf_func)(const struct sk_buff *skb,const struct
>> sock_filter *filter);
>>     unsigned int (*bpf_func_ext)(void*, const struct sock_filter_ext *filter);
>>     unsigned int (*bpf_func_generic)(void *, void*);
>> };
>>
>> assignments into sk_filter will use correct field like:
>> fp->bpf_func_ext = sk_run_filter_ext;
>> and the hot call macro will look like:
>> #define SK_RUN_FILTER(F, CTX) (*F->bpf_func_generic)(CTX, F->insns)
>>
>> I think current typecasts are equally ugly, but 'union' style
>> will be cleaner from gcc point of view.
>
> The main point is that we should only bypass the type protection provided
> by the C language as an absolute last resort.
>
> But in some cases, we can consider making an exception.
>
> The problem with letting sk_check_filter() do the verification is that the
> type is also determined by the call site.  That requires some kind of type
> based or run time verification.

agree. To solve the call site issue, we can do 'type based' verification:
I can make sk_run_filter_ext(void *, ...) to be private in filter.c
and provide two globally visible:
sk_run_filter_ext_seccomp(struct seccomp_data*,...
sk_run_filter_ext_skb(struct sk_buff*,...

and inside them just call the private function.
Compiler will do a type verification at the call site for us.
The only problem with this approach is that tracing+ebpf, ovs+ebpf, nft+ebpf
would need their own call functions.

I feel we're overprotecting ourselves here.
Please consider making an exception for this case.

> Although it doesn't use C typing, one thing you could do is encoded the
> pointer into an unsigned long.  Then you encode the context in the lowest
> bits of that value, masking them out when derferencing at run time.
>
> So that way we can make sure seccomp _only_ runs seccomd filters.
>
> And sockets only run socket filters with SKBs in the second argument.
>
>> btw, ebpf jit coming in the next diff (it was also previewed in V1 series).
>>
>> In the 1/3 commit log of this patch I explain the current relation
>> between bpf_ext_enable and bpf_jit_enable flags.
>>
>> In the next patch with ebpf jit, single bpf_jit_enable flag will act for both:
>>   if (bpf_ext_enable) {
>>         convert to new
>>         sk_chk_filter() - check old bpf
>>         if (bpf_jit_enable)
>>              use new jit
>>         else
>>              use new interpreter
>>      } else {
>>         sk_chk_filter() - check old bpf
>>         if (bpf_jit_enable)
>>             use old jit
>>         else
>>             use old interpreter
>>      }
>>
>> I believe Daniel oked this approach, but if you object, I can do it differently.
>> Are you saying 'bpf_jit_enable' flag should mean: do old jit no matter what?
>> Then we would need another flag 'bpf_ext_jit_enable' as well?
>> Seems overkill to me.
>
> It is not OK to create a temporary regression in between patch sets.
>
> You can make JIT higher priority than EBPF in this series, then once every
> single JIT is converted to EBPF, you can just adjust things accordingly.

ok. will change priority.

Thanks
Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/