linux-kernel - Re: [PATCH v10 net-next 1/3] filter: add Extended BPF interpreter and converter

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMEtUuxd0yd81p7KPxyotMOcOxm3MS4Cj9oguS5QAzfXYChLXQ@mail.gmail.com>
Date:	Fri, 14 Mar 2014 12:51:17 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	Pablo Neira Ayuso <pablo@...filter.org>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Daniel Borkmann <dborkman@...hat.com>,
	Ingo Molnar <mingo@...nel.org>, Will Drewry <wad@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"H. Peter Anvin" <hpa@...or.com>,
	Hagen Paul Pfeifer <hagen@...u.net>,
	Jesse Gross <jesse@...ira.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Eric Dumazet <edumazet@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Pekka Enberg <penberg@....fi>,
	Arjan van de Ven <arjan@...radead.org>,
	Christoph Hellwig <hch@...radead.org>,
	Pavel Emelyanov <xemul@...allels.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH v10 net-next 1/3] filter: add Extended BPF interpreter and converter

On Fri, Mar 14, 2014 at 8:37 AM, Alexei Starovoitov <ast@...mgrid.com> wrote:
> On Fri, Mar 14, 2014 at 5:58 AM, Pablo Neira Ayuso <pablo@...filter.org> wrote:
>> On Wed, Mar 12, 2014 at 02:43:32PM -0700, Alexei Starovoitov wrote:
>>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>>> index e568c8ef896b..6e6aab5e062b 100644
>>> --- a/include/linux/filter.h
>>> +++ b/include/linux/filter.h
>>> @@ -25,20 +25,45 @@ struct sock;
>>>  struct sk_filter
>>>  {
>>>       atomic_t                refcnt;
>>> -     unsigned int            len;    /* Number of filter blocks */
>>> +     /* len - number of insns in sock_filter program
>>> +      * len_ext - number of insns in socket_filter_ext program
>>> +      * jited - true if either original or extended program was JITed
>>> +      * orig_prog - original sock_filter program if not NULL
>>> +      */
>>> +     unsigned int            len;
>>> +     unsigned int            len_ext;
>>> +     unsigned int            jited:1;
>>
>> This is consuming 4 bytes just to store the jited bit. I think you can
>> scratch that bit from len, given the maximum filter length for bpf. I
>> think the the jited bit change that David suggested have to come in
>> first place as a separated patch in the series.
>
> It was reviewed so many times that I would prefer not to break it
> apart just to split it for single 'jited' bitfield, though I agree with taking
> one bit from len.
> I actually proposed it in 'bool vs bitfield' thread few days ago.
> I think it can be done as a separate commit after this one goes in.
>
>>> +     struct sock_filter      *orig_prog;
>>
>> If your new extended filtering is not used, this consumes 8 extra
>> bytes + len_ext (bytes) in x86_64. I think a more generic way to make
>> this is that you can move the original bpf filter and its length at
>> the bottom of this structure after insns to store something like:
>>
>> struct sk_bpf_compat {
>>         struct sock_filter      *prog;
>>         unsigned int            len;
>> };
>>
>> This would be only allocated when you filtering approach is used. For
>> that you'll need some enum in sk_filter to indicate the filtering
>> approach, but we'll save 8 bytes per filter in the end with regards to
>> this current patch.
>
> this is also can be done as separate commit after this one.
> Though I don't like the idea, because access to 'prog' and 'len'
> becomes very complicated. In every place we need a helper
> function to calculate an offset to this 'sk_bpf_compat',
> then typecast that memory location, etc.
> Imo single pointer is much cleaner.
>
> Thanks
> Alexei

Hi David,

can you please explain why the status of these
patches is 'deferred' in patchwork ?
Is it because of bpf vs nft thread?
I think that's orthogonal.
First of all, ebpf and nft are not really comparable.
ebpf is a low level assembler
whereas nft is a high level state machine.
As I was saying nft can be accelerated by ebpf.
Even without accelerated nft, ebpf makes faster seccomp,
tracing filters, ovs, etc.
nft state machine is not applicable for such tasks.
It feels odd even trying to compare them.
They're serving different purpose.
general purpose assembler vs packet classifier? Just different.
I'm not sure what is the concern here.

Thanks
Alexei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/