Message-ID: <20250820181536.02e50df6@elisabeth>
Date: Wed, 20 Aug 2025 18:15:36 +0200
From: Stefano Brivio <sbrivio@...hat.com>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Florian Westphal <fw@...len.de>, netdev@...r.kernel.org, Paolo Abeni
<pabeni@...hat.com>, "David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
netfilter-devel@...r.kernel.org, pablo@...filter.org
Subject: Re: [PATCH net-next 5/6] netfilter: nft_set_pipapo: Store real
pointer, adjust later.
On Wed, 20 Aug 2025 18:01:14 +0200
Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> On 2025-08-20 17:44:01 [+0200], Stefano Brivio wrote:
> > On Wed, 20 Aug 2025 16:47:37 +0200
> > Florian Westphal <fw@...len.de> wrote:
> >
> > > From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> > >
> > > The struct nft_pipapo_scratch is allocated, then aligned to the required
> > > alignment and difference (in bytes) is then saved in align_off. The
> > > aligned pointer is used later.
> > > While this works, it gets complicated with all the extra checks if
> > > all members before map are larger than the required alignment.
> > >
> > > Instead of saving the aligned pointer, just save the returned pointer
> > > and align the map pointer in nft_pipapo_lookup() before using it. The
> > > alignment later on shouldn't be that expensive.
> >
> > The cost of doing the alignment later was the very reason why I added
> > this whole dance in the first place though. Did you check packet
> > matching rates before and after this?
>
> how? There was something under selftest which I used to ensure it still
> works.
tools/testing/selftests/net/netfilter/nft_concat_range.sh, you should add
"performance" to $TESTS (or just do TESTS=performance). The performance
tests are normally skipped because they take a while.
> On x86 it should be two additional opcodes (and + lea) and that might be
> interleaved.
I think so too, but I wonder if the effect on subsequent cache loads is
much bigger than the cost of those two instructions alone.
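
For reference, here is a rough userspace sketch of what "store the real
pointer, align at lookup time" amounts to. It is not the kernel code: the
struct, the helper names and the 32-byte alignment (picked with 256-bit AVX2
loads in mind) are all assumptions for illustration, and malloc() stands in
for the kernel allocator. The rounding is similar in spirit to the kernel's
PTR_ALIGN().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed alignment requirement, e.g. for 256-bit vector loads. */
#define SKETCH_ALIGN ((uintptr_t)32)

/* Hypothetical stand-in for the scratch area, not the kernel's struct. */
struct scratch_sketch {
	unsigned char map_index;
	unsigned long map[];	/* only accessed through the aligned pointer */
};

/* Round p up to the next multiple of align (align must be a power of two). */
static void *align_up(void *p, uintptr_t align)
{
	return (void *)(((uintptr_t)p + align - 1) & ~(align - 1));
}

/* Over-allocate so that the aligned map still fits inside the buffer. */
static struct scratch_sketch *scratch_alloc(size_t map_entries)
{
	return malloc(sizeof(struct scratch_sketch) +
		      map_entries * sizeof(unsigned long) + SKETCH_ALIGN - 1);
}

/* Align on every lookup instead of once at allocation time. */
static unsigned long *scratch_map(struct scratch_sketch *s)
{
	return align_up(s->map, SKETCH_ALIGN);
}
```

With this layout the pointer returned by the allocator is stored and freed
as-is, so no offset needs to be saved, and the add-and-mask in align_up() is
the extra work paid per lookup. It says nothing about the cache behaviour I'm
wondering about, which only the performance tests can show.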
> Do you remember a rule of thumb of your improvement?
I added this right away with the initial implementation of the
vectorised version, so I didn't really check the difference or record
it anywhere, but I vaguely remember having something similar to the
version with your current change in an earlier draft and it was
something like 20 cycles difference with the 'net,port' test with 1000
entries... maybe, I'm really not sure anymore.
I'm especially not sure if my old draft was equivalent to this change.
I reported the original figures (with the alignment done in advance) in
the commit message of 7400b063969b ("nft_set_pipapo: Introduce
AVX2-based lookup implementation").
> As far as I remember the alignment code expects that the "hole" at the
> beginning does not exceed a certain size and the lock there exceeds it.
I think you're right. But again, the alignment itself should be fast,
that's not what I'm concerned about.
--
Stefano