netdev - Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <d56e5a40-48cc-d984-c5cd-e18fbc411ed3@solarflare.com>
Date:   Fri, 13 Apr 2018 16:59:07 +0100
From:   Edward Cree <ecree@...arflare.com>
To:     David Miller <davem@...emloft.net>
CC:     <linux-net-drivers@...arflare.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH net 2/2] sfc: limit ARFS workitems in flight per channel

On 13/04/18 16:03, David Miller wrote:
> Whilst you may not be able to program the filter into the hardware
> synchronously, you should be able to allocate the ID and get all of
> the software state setup.
That's what we were doing before commit 3af0f34290f6 ("sfc: replace
 asynchronous filter operations"), and (as mentioned in that commit)
 that leads to (or at least the scheme we used had) race conditions
 which I could not see a way to fix.  If the hardware resets (and
 thus forgets all its filters) after we've done the software state
 setup but before we reach the point of finalising the software state
 after the hardware operation, we don't know what operations we need
 to do to re-apply the software state to the hardware, because we
 don't know whether the reset happened before or after the hardware
 operation.
Actually, it's not the insertion itself that this can happen to - if
 the reset happens first then the insert will fail because other
 hardware state we reference isn't set up yet.  However, inserting
 one filter can necessitate removing some others (lower-priority
 multicast filters for the same flow); the code before 3af0f34290f6
 was actually capable of getting so confused that it could double-
 free a pointer.
This all gets sufficiently complicated that even if I can find a
 locking scheme that in theory gets it right, there's pretty much a
 100% chance that the actual implementation will contain bugs,
 probably very subtle ones that can't reliably be reproduced etc.
 All my instincts say to get away from that if at all possible.

> People really aren't going to be happy if their performance goes into
> the tank because their filter update rate, for whatever reason, hits
> this magic backlog limit.
Well, the alternative, even if the software state setup part _could_
 be made synchronous, is to allow a potentially unbounded queue for
 the hardware update part (I think there are even still cases in
 which the exponential growth pathology is possible), causing the
 filter insertions to be delayed an arbitrarily long time.  Either
 the flow is still going by that time (in which case the backlog
 limit approach will get a new ndo_rx_flow_steer request and insert
 the filter too) or it isn't, in which case getting round to it
 eventually is no better than dropping it immediately.  In fact it's
 worse because now you waste time inserting a useless filter which
 delays new requests even more.
Besides, I'm fairly confident that the only cases in which you'll
 even come close to hitting the limit are ones where ARFS wouldn't
 do you much good anyway, such as:
* Misconfigured interrupt affinities where ARFS is entirely pointless
* Many short-lived flows (which greatly diminish the utility of ARFS)

So for multiple reasons, hitting the limit won't actually make
 performance worse, although it will often be a sign that performance
 will be bad for other reasons.

-Ed