netdev - Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c7a994ee-763c-1f94-2c1e-4348d7b4cc62@solarflare.com>
Date:   Fri, 13 Apr 2018 15:52:25 +0100
From:   Edward Cree <ecree@...arflare.com>
To:     David Miller <davem@...emloft.net>
CC:     <linux-net-drivers@...arflare.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel

On 13/04/18 13:36, Edward Cree wrote:
> It turns out this may all be moot anyway: I figured out why I was seeing
>  ARFS storms and it wasn't the configuration issue I originally blamed.
Hmm, correction, while the fix I mentioned in my previous email is needed,
 it doesn't prevent the ARFS storms (although seems to lessen their
 severity, given that the machine didn't actually fall over this time),
 so we do also need some kind of limiting.

On 12/04/18 16:33, David Miller wrote:
> Then simply make the work process a queue, and add entries to the queue
> here if the work is already scheduled.
>
> Is there a reason why that wouldn't work?
That has the same problem as the existing code, that the length of the queue
 can grow without bound, potentially causing a very long lag between the
 request and its execution.  This then can quickly become exponential as,
 while waiting for the filter to be inserted, further packets from the same
 flow are received (still unsteered) and trigger duplicate ARFS requests
 (though I suppose it would be possible to scan the queue for matching flow
 IDs; but the concurrency / locking problems with that are 'interesting').

I'm not sure why you object to the dropping of requests - it seems reasonable
 to me to treat ARFS as a 'best-effort' thing.  The packets will still be
 received (just not necessarily on the core nearest the application), and if
 the flow continues it will generate more steering requests after the ones
 currently in flight have been processed.
And in practice we only get into this situation in the first place when we
 have interrupt affinities configured in such a way as to make ARFS
 practically useless anyway, so our failure to insert the filters is not of
 great significance.

On 13/04/18 15:45, David Miller wrote:
> I understand the constraints you are working under, but do realize
> that the real root of the problems is that you are implementing what
> is defined clearly as a synchronous operation as asynchronous.
Yes, it is unfortunate that we are unable to perform synchronous filter
 insertions, but you go to war with the hardware you have :(

-Ed