Message-ID: <CALx6S37+Gpx7jeV-U-PE4wf593F-AUVEeOpU1R0-h6cDfNQnow@mail.gmail.com>
Date: Thu, 25 Aug 2016 14:02:05 -0700
From: Tom Herbert <tom@...bertland.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Rick Jones <rick.jones2@....com>, Netdev <netdev@...r.kernel.org>,
sathya.perla@...adcom.com, ajit.khaparde@...adcom.com,
sriharsha.basavapatna@...adcom.com, somnath.kotur@...adcom.com
Subject: Re: A second case of XPS considerably reducing single-stream performance
On Thu, Aug 25, 2016 at 12:19 PM, Alexander Duyck
<alexander.duyck@...il.com> wrote:
> On Wed, Aug 24, 2016 at 4:46 PM, Rick Jones <rick.jones2@....com> wrote:
>> Also, while it doesn't seem to have the same massive effect on throughput,
>> I can also see out-of-order behaviour happening when the sending VM is on a
>> node with a ConnectX-3 Pro NIC. Its driver is also enabling XPS, it would
>> seem. I'm not *certain*, but looking at the traces it appears that with the
>> ConnectX-3 Pro there is more interleaving of the out-of-order traffic than
>> there is with the Skyhawk. The ConnectX-3 Pro happens to be in a newer
>> generation server with a newer processor than the other systems where I've
>> seen this.
>>
>> I do not see the out-of-order behaviour when the NIC at the sending end is a
>> BCM57840. It does not appear that the bnx2x driver in the 4.4 kernel is
>> enabling XPS.
>>
>> So, it would seem that there are three cases of enabling XPS resulting in
>> out-of-order traffic, two of which result in a non-trivial loss of
>> performance.
>>
>> happy benchmarking,
>>
>> rick jones
>
> The problem is that there is no socket associated with the guest from
> the host's perspective. This results in the traffic bouncing between
> queues because there is no saved socket to lock the queue selection
> onto.
>
> I was looking into this recently as well and had considered a couple
> of options. The first is to fall back to just using skb_tx_hash()
> when skb->sk is NULL for a given buffer. I have a patch I have been
> toying around with but haven't submitted yet; if you would like I can
> submit it as an RFC to get your thoughts. The second option is to
> enforce the use of RPS for any interfaces that do not perform Rx in
> NAPI context. The correct solution is probably some combination of
> the two, since queueing has to stay in order at every stage of packet
> processing.
>
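
As a rough sketch of the first option (illustrative only, not the
actual patch Alexander mentions; the helper name below is made up),
queue selection could simply bypass XPS whenever the skb carries no
socket, so forwarded guest traffic keeps a stable hash-based queue
instead of following whichever CPU the vhost/tap thread happens to be
running on:

static u16 pick_tx_queue_sketch(struct net_device *dev, struct sk_buff *skb)
{
	/* No socket (e.g. guest traffic forwarded through tap/vhost):
	 * fall back to a pure flow-hash mapping so the chosen queue
	 * cannot bounce as the sending thread migrates between CPUs.
	 */
	if (!skb->sk)
		return skb_tx_hash(dev, skb);

	/* Otherwise keep the existing XPS / sk_tx_queue_get() logic. */
	return __netdev_pick_tx(dev, skb);
}

The RPS half of the fix would be the receive-side analogue, keeping
per-flow ordering for interfaces that don't complete Rx in NAPI
context.
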
I have thought several times about creating flow states for packets
coming from VMs. This could be done similarly to how we do RFS: call
the flow dissector to get a hash of the flow and then use that to
index into a table that contains the last queue, only changing the
queue when criteria are met to prevent OOO. This would mean running
the flow dissector on such packets, which seems a bit expensive; it
would be nice if the VM could just give us the hash in a TX
descriptor. There are other benefits to a more advanced mechanism,
for instance we might be able to cache routes or iptables results
(stuff we might keep if there were a transport socket).
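
A minimal sketch of the idea, assuming a hypothetical table
(vm_flow_table), table size, and idle-time criterion that are not in
the kernel today; locking and per-device allocation are omitted for
brevity:

#include <linux/jiffies.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct vm_flow_entry {
	u16		queue;		/* last TX queue used by this flow */
	unsigned long	last_used;	/* jiffies when the flow last sent */
};

#define VM_FLOW_TABLE_SIZE	4096			/* assumed */
#define VM_FLOW_IDLE		msecs_to_jiffies(10)	/* assumed OOO-safe idle time */

static struct vm_flow_entry vm_flow_table[VM_FLOW_TABLE_SIZE];

static u16 vm_flow_pick_queue(struct net_device *dev, struct sk_buff *skb)
{
	/* skb_get_hash() only runs the flow dissector when no hash was
	 * already supplied, which is where a hash handed over by the VM
	 * in the TX descriptor would make this cheap.
	 */
	u32 hash = skb_get_hash(skb);
	struct vm_flow_entry *e = &vm_flow_table[hash & (VM_FLOW_TABLE_SIZE - 1)];
	u16 desired = (u16)reciprocal_scale(hash, dev->real_num_tx_queues);

	/* Only move the flow to a new queue once it has been idle long
	 * enough that packets on the old queue should have drained --
	 * the "criteria are met to prevent OOO" check above.
	 */
	if (e->queue != desired &&
	    time_after(jiffies, e->last_used + VM_FLOW_IDLE))
		e->queue = desired;

	e->last_used = jiffies;
	return e->queue;
}
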
Tom