Message-ID: <20221017203620.GA18251@fastly.com>
Date: Mon, 17 Oct 2022 13:36:21 -0700
From: Joe Damato <jdamato@...tly.com>
To: Jacob Keller <jacob.e.keller@...el.com>
Cc: Jakub Kicinski <kuba@...nel.org>, intel-wired-lan@...ts.osuosl.org,
netdev@...r.kernel.org, davem@...emloft.net,
anthony.l.nguyen@...el.com, jesse.brandeburg@...el.com
Subject: Re: [net-queue bugfix RFC] i40e: Clear IFF_RXFH_CONFIGURED when RSS
is reset

On Mon, Oct 17, 2022 at 01:25:39PM -0700, Jacob Keller wrote:
>
>
> On 10/17/2022 12:45 PM, Jakub Kicinski wrote:
> > On Thu, 13 Oct 2022 15:54:31 -0700 Joe Damato wrote:
> >> Before this change, reconfiguring the queue count using ethtool doesn't
> >> always work, even for queue counts that were previously accepted because
> >> the IFF_RXFH_CONFIGURED bit was not cleared when the flow indirection hash
> >> is cleared by the driver.
> >
> > It's not cleared but when was it set? Could you describe the flow that
> > gets us to this set a bit more?
> >
> > Normally clearing the IFF_RXFH_CONFIGURED in the driver is _only_
> > acceptable on error recovery paths, and should come with a "this should
> > never happen" warning.
> >
>
> Correct. The whole point of IFF_RXFH_CONFIGURED is to be able for the
> driver to know whether or not the current config was the default or a
> user specified value. If this flag is set, we should not be changing the
> config except in exceptional circumstances.
>
> >> For example:
> >>
> >> $ sudo ethtool -x eth0
> >> RX flow hash indirection table for eth0 with 34 RX ring(s):
> >> 0: 0 1 2 3 4 5 6 7
> >> 8: 8 9 10 11 12 13 14 15
> >> 16: 16 17 18 19 20 21 22 23
> >> 24: 24 25 26 27 28 29 30 31
> >> 32: 32 33 0 1 2 3 4 5
> >> [...snip...]
> >>
> >> As you can see, the flow indirection hash distributes flows to 34 queues.
> >>
> >> Increasing the number of queues from 34 to 64 works, and the flow
> >> indirection hash is reset automatically:
> >>
> >> $ sudo ethtool -L eth0 combined 64
> >> $ sudo ethtool -x eth0
> >> RX flow hash indirection table for eth0 with 64 RX ring(s):
> >> 0: 0 1 2 3 4 5 6 7
> >> 8: 8 9 10 11 12 13 14 15
> >> 16: 16 17 18 19 20 21 22 23
> >> 24: 24 25 26 27 28 29 30 31
> >> 32: 32 33 34 35 36 37 38 39
> >> 40: 40 41 42 43 44 45 46 47
> >> 48: 48 49 50 51 52 53 54 55
> >> 56: 56 57 58 59 60 61 62 63
> >
> > This is odd, if IFF_RXFH_CONFIGURED is set driver should not
> > re-initialize the indirection table. Which I believe is what
> > you describe at the end of your message:
> >
>
> Right. It seems like the driver should actually be checking this flag
> somewhere else and preventing the flow where we clear the indirection
> table...
>
> We are at least in some places according to your report here, but
> perhaps there is a gap....

Thanks for the comments / information. I noticed that one other driver
(mlx5) tweaks this bit, which is what led me down this rabbit hole.

I'll have to re-read the i40e code and re-run some experiments with the
queue count and flow hash to get a better understanding of the current
behavior and double-check the results.

I'll follow up with an email to intel-wired-lan about the current
(unpatched) behavior I'm seeing with i40e, to check whether there's a bug
or whether I've simply made a mistake somewhere in my testing.

I did run the experiments a few times, so it is possible I got into some
weird state. It is worth revisiting fresh from a reboot with a kernel built
from net-next.
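
For reference while re-testing, here is a minimal userspace sketch of the
behavior discussed above, as I understand it: the default indirection table
maps entry i to ring (i % ring count), and a queue-count change should only
rebuild that table when the user has not configured it. This is a model, not
kernel or i40e code. The class, the method names, and the table size of 512
are assumptions made for illustration; the rxfh_configured attribute merely
stands in for IFF_RXFH_CONFIGURED.

```python
# Userspace model only. All names here are hypothetical, not driver code.
TABLE_SIZE = 512  # assumption: table length picked for illustration


def default_indir_table(n_rings, size=TABLE_SIZE):
    """Default spread: table entry i points at ring i % n_rings."""
    return [i % n_rings for i in range(size)]


class Dev:
    def __init__(self, n_rings):
        self.n_rings = n_rings
        self.rxfh_configured = False  # stands in for IFF_RXFH_CONFIGURED
        self.table = default_indir_table(n_rings)

    def set_indir(self, table):
        """Model of `ethtool -X` with an explicit, user-supplied table."""
        if any(q >= self.n_rings for q in table):
            raise ValueError("table references a ring that does not exist")
        self.table = list(table)
        self.rxfh_configured = True

    def set_channels(self, n_rings):
        """Model of `ethtool -L eth0 combined N`."""
        if self.rxfh_configured:
            # A user-configured table is left alone; a count that would
            # orphan one of its entries is rejected instead.
            if n_rings <= max(self.table):
                raise ValueError("new ring count does not cover the table")
        else:
            # Table is still the default, so it is safe to rebuild it.
            self.table = default_indir_table(n_rings, len(self.table))
        self.n_rings = n_rings
```

With 34 rings and no user configuration, the first entries of the model's
table match the `ethtool -x` output quoted above (entries 34 and 35 wrap
back to rings 0 and 1), and growing to 64 rings rebuilds the default table
without setting the flag.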