linux-kernel - Re: [PATCH net-next] hv_netvsc: don't make assumptions on struct flow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <568F544E.3010702@gmail.com>
Date:	Thu, 07 Jan 2016 22:16:46 -0800
From:	John Fastabend <john.fastabend@...il.com>
To:	KY Srinivasan <kys@...rosoft.com>,
	Vitaly Kuznetsov <vkuznets@...hat.com>,
	Simon Xiao <sixiao@...rosoft.com>,
	Eric Dumazet <eric.dumazet@...il.com>
CC:	Tom Herbert <tom@...bertland.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Haiyang Zhang <haiyangz@...rosoft.com>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	David Miller <davem@...emloft.net>
Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on struct
 flow_keys layout

On 16-01-07 07:49 PM, KY Srinivasan wrote:
> 
> 
>> -----Original Message-----
>> From: John Fastabend [mailto:john.fastabend@...il.com]
>> Sent: Thursday, January 7, 2016 5:02 PM
>> To: Vitaly Kuznetsov <vkuznets@...hat.com>; Simon Xiao
>> <sixiao@...rosoft.com>; Eric Dumazet <eric.dumazet@...il.com>
>> Cc: Tom Herbert <tom@...bertland.com>; netdev@...r.kernel.org; KY
>> Srinivasan <kys@...rosoft.com>; Haiyang Zhang <haiyangz@...rosoft.com>;
>> devel@...uxdriverproject.org; linux-kernel@...r.kernel.org; David Miller
>> <davem@...emloft.net>
>> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on struct
>> flow_keys layout
>>
>> On 16-01-07 05:28 AM, Vitaly Kuznetsov wrote:
>>> Eric Dumazet <eric.dumazet@...il.com> writes:
>>>
>>>> On Thu, 2016-01-07 at 10:33 +0100, Vitaly Kuznetsov wrote:
>>>>> Recent changes to 'struct flow_keys' (e.g commit d34af823ff40 ("net: Add
>>>>>  VLAN ID to flow_keys")) introduced a performance regression in netvsc
>>>>> driver. Is problem is, however, not the above mentioned commit but the
>>>>> fact that netvsc_set_hash() function did some assumptions on the struct
>>>>> flow_keys data layout and this is wrong. We need to extract the data we
>>>>> need (src/dst addresses and ports) after the dissect.
>>>>>
>>>>> The issue could also be solved in a completely different way: as suggested
>>>>> by Eric instead of our own homegrown netvsc_set_hash() we could use
>>>>> skb_get_hash() which does more or less the same. Unfortunately, the
>>>>> testing done by Simon showed that Hyper-V hosts are not happy with our
>>>>> Jenkins hash, selecting the output queue with the current algorithm based
>>>>> on Toeplitz hash works significantly better.
>>>>
>>
>> Also can I ask the maybe naive question. It looks like the hypervisor
>> is populating some table via a mailbox msg and this is used to select
>> the queues I guess with some sort of weighting function?
>>
>> What happens if you just remove select_queue altogether? Or maybe just
>> what is this 16 entry table doing? How does this work on my larger
>> systems with 64+ cores can I only use 16 cores? Sorry I really have
>> no experience with hyperV and this got me curious.
> 
> We will limit the number of VRSS channels to the number of CPUs in
> a NUMA node. If the number of CPUs in a NUMA node exceeds 8, we
> will only open up 8 VRSS channels. On the host side currently traffic
> spreading is done in software and we have found that limiting to 8 CPUs
> gives us the best throughput. In Windows Server 2016, we will be 
> distributing traffic on the host in hardware; the heuristics in the guest
> may change.
> 
> Regards,
> 
> K. Y

I think a better way to do this would be to query the numa node when
the interface comes online via dev_to_node() and then use cpu_to_node()
or create/find some better variant to get a list of cpus on the numa
node.

At this point you can use the xps mapping interface
netif_set_xps_queue() to get the right queue to cpu binding. If you want
to cap it to max 8 queues that works as well. I don't think there is
any value to have more tx queues than number of cpus in use.

If you do it this way all the normal mechanisms to setup queue mappings
will work for users who are doing some special configuration and the
default will still be what you want.

I guess I should go do this numa mapping for ixgbe and friends now that
I mention it. Last perf numbers I had showed cross numa affinitizing
was pretty bad.

Thanks,
John