lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 20 Aug 2020 12:37:01 +0200
From:   Lukas Wunner <lukas@...ner.de>
To:     Daniel Borkmann <daniel@...earbox.net>
Cc:     Laura Garcia <nevola@...il.com>,
        Pablo Neira Ayuso <pablo@...filter.org>,
        Jozsef Kadlecsik <kadlec@...filter.org>,
        Florian Westphal <fw@...len.de>,
        Netfilter Development Mailing list 
        <netfilter-devel@...r.kernel.org>, coreteam@...filter.org,
        netdev@...r.kernel.org, Martin Mares <mj@....cz>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Thomas Graf <tgraf@...g.ch>,
        Alexei Starovoitov <ast@...nel.org>,
        David Miller <davem@...emloft.net>
Subject: Re: [PATCH nf-next 3/3] netfilter: Introduce egress hook

On Tue, Apr 28, 2020 at 10:11:19PM +0200, Daniel Borkmann wrote:
> > https://lore.kernel.org/netdev/bbdee6355234e730ef686f9321bd072bcf4bb232.1584523237.git.daniel@iogearbox.net/
> > 
> > In the ensuing discussion it turned out that the performance argument
> > may be addressed by a rearrangement of sch_handle_egress() and
> > nf_egress() invocations.  I could look into amending the series
> > accordingly and resubmitting, though I'm currently swamped with
> > other work.
> 
> The rework of these hooks is still on my todo list, too swamped with
> other stuff as well right now, but I'll see to have a prototype this
> net-next development cycle.

Daniel, I have it running now the way you want it (I think) and am
benchmarking several variants.  I'm now faced with the following
choice:

* One variant speeds up the default case with neither tc nor nft in use
  (2241 -> 2285 Mb/sec), but tc becomes a little slower (1929 -> 1927
  Mb/sec).

* Another variant still speeds up the default case but not by as much
  (2241 -> 2274 Mb/sec) and speeds up tc as well (1929 -> 1931 Mb/sec).

The difference is that the first variant uses an outer static_key which
patches in a function containing two inner static_keys for nft + tc.
The second variant uses an outer static_key and a single inner static_key
for nft (but no static_key for tc).

nft is slower in the second variant (2231 -> 2225 Mb/sec).

I'm leaning towards the first variant, but because it incurs a small
performance degradation if tc is used, I wanted to give you a heads-up.
If this is totally unacceptable for you, I'll post the second variant
instead.

I need a few more days to update the commit messages, perform further
testing and apply polish, so I plan to dump the patches to the list
next week.  Just thought I'd ask for your opinion, I'm aware this is
difficult to judge without seeing the code.

I'm using static_keys instead of fmodify_return (which you've suggested)
because I would like to avoid a dependency on BPF as it might not be an
option for space-constrained embedded machines and a BPF JIT isn't
available on as many arches as CONFIG_JUMP_LABEL.

Thanks,

Lukas

Powered by blists - more mailing lists