Open Source and information security mailing list archives
Message-ID: <20221017203620.GA18251@fastly.com>
Date: Mon, 17 Oct 2022 13:36:21 -0700
From: Joe Damato <jdamato@...tly.com>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Jakub Kicinski <kuba@...nel.org>, intel-wired-lan@...ts.osuosl.org,
    netdev@...r.kernel.org, davem@...emloft.net, anthony.l.nguyen@...el.com,
    jesse.brandeburg@...el.com
Subject: Re: [net-queue bugfix RFC] i40e: Clear IFF_RXFH_CONFIGURED when RSS
 is reset

On Mon, Oct 17, 2022 at 01:25:39PM -0700, Jacob Keller wrote:
>
>
> On 10/17/2022 12:45 PM, Jakub Kicinski wrote:
> > On Thu, 13 Oct 2022 15:54:31 -0700 Joe Damato wrote:
> >> Before this change, reconfiguring the queue count using ethtool doesn't
> >> always work, even for queue counts that were previously accepted because
> >> the IFF_RXFH_CONFIGURED bit was not cleared when the flow indirection hash
> >> is cleared by the driver.
> >
> > It's not cleared but when was it set? Could you describe the flow that
> > gets us to this set a bit more?
> >
> > Normally clearing the IFF_RXFH_CONFIGURED in the driver is _only_
> > acceptable on error recovery paths, and should come with a "this should
> > never happen" warning.
> >
>
> Correct. The whole point of IFF_RXFH_CONFIGURED is to be able for the
> driver to know whether or not the current config was the default or a
> user specified value. If this flag is set, we should not be changing the
> config except in exceptional circumstances.
>
> >> For example:
> >>
> >> $ sudo ethtool -x eth0
> >> RX flow hash indirection table for eth0 with 34 RX ring(s):
> >>     0:      0     1     2     3     4     5     6     7
> >>     8:      8     9    10    11    12    13    14    15
> >>    16:     16    17    18    19    20    21    22    23
> >>    24:     24    25    26    27    28    29    30    31
> >>    32:     32    33     0     1     2     3     4     5
> >> [...snip...]
> >>
> >> As you can see, the flow indirection hash distributes flows to 34 queues.
> >>
> >> Increasing the number of queues from 34 to 64 works, and the flow
> >> indirection hash is reset automatically:
> >>
> >> $ sudo ethtool -L eth0 combined 64
> >> $ sudo ethtool -x eth0
> >> RX flow hash indirection table for eth0 with 64 RX ring(s):
> >>     0:      0     1     2     3     4     5     6     7
> >>     8:      8     9    10    11    12    13    14    15
> >>    16:     16    17    18    19    20    21    22    23
> >>    24:     24    25    26    27    28    29    30    31
> >>    32:     32    33    34    35    36    37    38    39
> >>    40:     40    41    42    43    44    45    46    47
> >>    48:     48    49    50    51    52    53    54    55
> >>    56:     56    57    58    59    60    61    62    63
> >
> > This is odd, if IFF_RXFH_CONFIGURED is set driver should not
> > re-initialize the indirection table. Which I believe is what
> > you describe at the end of your message:
> >
>
> Right. It seems like the driver should actually be checking this flag
> somewhere else and preventing the flow where we clear the indirection
> table...
>
> We are at least in some places according to your report here, but
> perhaps there is a gap....

Thanks for the comments / information.

I noticed that one other driver (mlx5) tweaks this bit, which is what led
me down this rabbit hole.

I'll have to re-read the i40e code and re-run some experiments with the
queue count and flow hash to get a better understanding of the current
behavior and double-check the results.

I'll follow up with an email to intel-wired-lan about the current
(unpatched) behavior I'm seeing with i40e, to double-check whether there's
a bug or I've simply made a mistake somewhere in my testing.

I did run the experiments a few times, so it is possible I got the device
into some weird state. It is worth revisiting fresh after a reboot, with a
kernel built from net-next.