[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9a7a44a6-ec0c-e5e9-1c94-ccc0d1755560@redhat.com>
Date: Thu, 16 Feb 2023 16:17:46 +0100
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Alexander Lobakin <alexandr.lobakin@...el.com>,
Paul Menzel <pmenzel@...gen.mpg.de>
Cc: brouer@...hat.com, bpf@...r.kernel.org, xdp-hints@...-project.net,
martin.lau@...nel.org, daniel@...earbox.net,
netdev@...r.kernel.org, ast@...nel.org,
Stanislav Fomichev <sdf@...gle.com>,
yoong.siang.song@...el.com, anthony.l.nguyen@...el.com,
intel-wired-lan@...ts.osuosl.org
Subject: Re: [xdp-hints] Re: [Intel-wired-lan] [PATCH bpf-next V1] igc: enable
and fix RX hash usage by netstack
On 14/02/2023 16.13, Alexander Lobakin wrote:
> From: Paul Menzel <pmenzel@...gen.mpg.de>
> Date: Tue, 14 Feb 2023 16:00:52 +0100
>>
>> Am 10.02.23 um 16:07 schrieb Jesper Dangaard Brouer:
>>> When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372
>>> ("igc: Add transmit and receive fastpath and interrupt handlers"), the
>>> hardware wasn't configured to provide RSS hash, thus it made sense to not
>>> enable net_device NETIF_F_RXHASH feature bit.
>>>
[...]
>>
>>> hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have
>>> the effect that netstack will do a software based hash calc calling into
>>> flow_dissect, but only when code calls skb_get_hash(), which doesn't
>>> necessary happen for local delivery.
>>
>> Excuse my ignorance, but is that bug visible in practice by users
>> (performance?) or is that fix needed for future work?
>
> Hash calculation always happens when RPS or RFS is enabled. So having no
> hash in skb before hitting the netstack slows down their performance.
> Also, no hash in skb passed from the driver results in worse NAPI bucket
> distribution when there are more traffic flows than Rx queues / CPUs.
> + Netfilter needs hashes on some configurations.
>
Thanks Olek for explaining that.
My perf measurements show that the expensive part is that netstack will
call the flow_dissector code, when the hardware RX-hash is missing.
>>
>>> Fixes: 2121c2712f82 ("igc: Add multiple receive queues control
>>> supporting")
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
>
> [...]
>
> Nice to see that you also care about (not) using short types on the stack :)
As can be seen by godbolt.org exploration[0] I have done, the stack
isn't used for storing the values.
[0]
https://github.com/xdp-project/xdp-project/tree/master/areas/hints/godbolt/
I have created three files[2] with C-code that can be compiled via
https://godbolt.org/. The C-code contains a comment with the ASM code
that was generated with -02 with compiler x86-64 gcc 12.2.
The first file[01] corresponds to this patch.
[01]
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/godbolt/igc_godbolt01.c
[G01] https://godbolt.org/z/j79M9aTsn
The second file igc_godbolt02.c [02] have changes in [diff02]
[02]
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/godbolt/igc_godbolt02.c
[G02] https://godbolt.org/z/sErqe4qd5
[diff02] https://github.com/xdp-project/xdp-project/commit/1f3488a932767
The third file igc_godbolt03.c [03] have changes in [diff03]
[03]
https://github.com/xdp-project/xdp-project/blob/master/areas/hints/godbolt/igc_godbolt03.c
[G03] https://godbolt.org/z/5K3vE1Wsv
[diff03] https://github.com/xdp-project/xdp-project/commit/aa9298f68705
Summary, the only thing we can save is replacing some movzx
(zero-extend) with mov instructions.
>>
>> [1]: https://notabs.org/coding/smallIntsBigPenalty.htm
>
> Thanks,
> Olek
>
Powered by blists - more mailing lists