Message-ID: <c7a994ee-763c-1f94-2c1e-4348d7b4cc62@solarflare.com>
Date: Fri, 13 Apr 2018 15:52:25 +0100
From: Edward Cree <ecree@...arflare.com>
To: David Miller <davem@...emloft.net>
CC: <linux-net-drivers@...arflare.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel
On 13/04/18 13:36, Edward Cree wrote:
> It turns out this may all be moot anyway: I figured out why I was seeing
> ARFS storms and it wasn't the configuration issue I originally blamed.
Hmm, correction: while the fix I mentioned in my previous email is needed,
it doesn't prevent the ARFS storms (although it seems to lessen their
severity, given that the machine didn't actually fall over this time),
so we do also need some kind of limiting.
On 12/04/18 16:33, David Miller wrote:
> Then simply make the work process a queue, and add entries to the queue
> here if the work is already scheduled.
>
> Is there a reason why that wouldn't work?
That has the same problem as the existing code: the length of the queue
can grow without bound, potentially causing a very long lag between the
request and its execution. This can then quickly become exponential as,
while waiting for the filter to be inserted, further packets from the same
flow are received (still unsteered) and trigger duplicate ARFS requests
(though I suppose it would be possible to scan the queue for matching flow
IDs; but the concurrency / locking problems with that are 'interesting').
I'm not sure why you object to the dropping of requests - it seems reasonable
to me to treat ARFS as a 'best-effort' thing. The packets will still be
received (just not necessarily on the core nearest the application), and if
the flow continues it will generate more steering requests after the ones
currently in flight have been processed.
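As a rough illustration of the best-effort limiting described above, here is
a minimal userspace sketch, not the actual sfc patch. It assumes a fixed
per-channel cap on ARFS work items in flight; the names `struct channel`,
`ARFS_MAX_IN_FLIGHT`, `arfs_try_reserve()` and `arfs_complete()` are all
hypothetical, and C11 atomics stand in for whatever the kernel code would
really use. A request that finds the channel already at its cap is simply
dropped, so the backlog cannot grow without bound.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cap on ARFS work items in flight per channel. */
#define ARFS_MAX_IN_FLIGHT 8

/* Hypothetical stand-in for a driver channel; only the counter matters here. */
struct channel {
	atomic_int arfs_in_flight;
};

static void channel_init(struct channel *ch)
{
	atomic_init(&ch->arfs_in_flight, 0);
}

/*
 * Try to reserve a slot for a new steering request.
 * Returns true if the caller may schedule the work item, or false if the
 * channel is already at its cap and the request should be dropped.
 */
static bool arfs_try_reserve(struct channel *ch)
{
	int cur = atomic_load(&ch->arfs_in_flight);

	while (cur < ARFS_MAX_IN_FLIGHT) {
		/* On failure, cur is reloaded with the current value and we retry. */
		if (atomic_compare_exchange_weak(&ch->arfs_in_flight, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Called when a work item finishes (filter inserted or insertion failed). */
static void arfs_complete(struct channel *ch)
{
	atomic_fetch_sub(&ch->arfs_in_flight, 1);
}
```

In this sketch a dropped request costs nothing beyond the failed reservation:
if the flow continues, a later packet can retry once `arfs_complete()` has
freed a slot.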
And in practice we only get into this situation in the first place when we
have interrupt affinities configured in such a way as to make ARFS
practically useless anyway, so our failure to insert the filters is not of
great significance.
On 13/04/18 15:45, David Miller wrote:
> I understand the constraints you are working under, but do realize
> that the real root of the problems is that you are implementing what
> is defined clearly as a synchronous operation as asynchronous.
Yes, it is unfortunate that we are unable to perform synchronous filter
insertions, but you go to war with the hardware you have :(
-Ed