Message-ID: <CAN+hb0VJKD7DwNTujL7k1E5_OQdaZbo5AugqW=_0c4MZoJ6=nA@mail.gmail.com>
Date:   Fri, 3 Feb 2017 10:22:42 -0800
From:   Benjamin Serebrin <serebrin@...gle.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     netdev@...r.kernel.org, jasowang@...hat.com,
        David Miller <davem@...emloft.net>,
        Willem de Bruijn <willemb@...gle.com>,
        Venkatesh Srinivas <venkateshs@...gle.com>,
        James Mattson <jmattson@...gle.com>
Subject: Re: [PATCH net-next] virtio: Fix affinity for >32 VCPUs

Thanks, Michael, I'll put this text in the commit log:

XPS settings aren't writable from userspace, so the only way I know
to fix XPS is in the driver.
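
For reference, here is a minimal sketch of what "fix XPS in the driver"
means. It is illustrative only (not the exact hunk below);
netif_set_xps_queue() and cpumask_of() are the kernel helpers the patch
uses, while the wrapper function name is made up:

/* Illustrative sketch: a driver installs its own per-queue XPS masks
 * by handing a cpumask to netif_set_xps_queue() for each TX queue.
 * (Needs <linux/netdevice.h> and <linux/cpumask.h> in a real driver.)
 */
static void example_set_xps(struct net_device *dev, int nqueues)
{
        int q;

        for (q = 0; q < nqueues; q++)
                /* steer TX queue q to CPU q only */
                netif_set_xps_queue(dev, cpumask_of(q), q);
}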

The interrupt affinity settings could be set by irqbalancer, yes, but
it's not always enabled (sometimes intentionally) in all guests.  I just
confirmed that on an unpatched 64-VCPU VM with no irqbalancer, all
virtio-net interrupts end up on core 0, while on a patched 64-VCPU VM
the interrupts end up on the core whose number matches the queue number.
This patch just extends the existing behavior of the interrupt affinity
assignment loop, and I think it stands on its own.

It actually sparked some internal discussion about further
intelligence in XPS and interrupt affinity, and
we'll do some experiments and come back with further patches.
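
For anyone following along, here is a small userspace illustration (not
part of the patch) of the striping the new XPS loop produces, assuming
CPUs 0-63 are all online, there are 32 queue pairs, and hyperthread
twins are numbered X and X + ncpus/2 as in the patch comment:

#include <stdio.h>

int main(void)
{
        const int ncpus = 64, nqueues = 32;  /* 64-VCPU guest, 32 queue pairs */
        int cpu;

        /* Striping: online CPU c lands in the XPS mask of TX queue c % nqueues. */
        for (cpu = 0; cpu < ncpus; cpu++)
                printf("cpu %2d -> tx queue %2d\n", cpu, cpu % nqueues);

        /* e.g. cpu 0 and cpu 32 (a hyperthread twin pair) both map to
         * tx queue 0, matching the 00000001,00000001 mask in the
         * 64-VCPU output below.
         */
        return 0;
}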

On Fri, Feb 3, 2017 at 7:07 AM, Michael S. Tsirkin <mst@...hat.com> wrote:
> On Thu, Feb 02, 2017 at 10:19:05PM -0800, Ben Serebrin wrote:
>> From: Benjamin Serebrin <serebrin@...gle.com>
>>
>> If the number of virtio queue pairs is not equal to the
>> number of VCPUs, the virtio guest driver doesn't assign
>> any CPU affinity for the queue interrupts, nor any XPS
>> affinity for the transmit queues.
>>
>> Google Compute Engine currently provides 1 queue pair for
>> every VCPU, but limits that to a maximum of 32 queue pairs.
>>
>> This code assigns interrupt affinity even when there are more than
>> 32 VCPUs.
>>
>> Tested:
>>
>> (on a 64-VCPU VM with debian 8, jessie-backports 4.9.2)
>>
>> Without the fix we see all queues affinitized to all CPUs:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>> 0-63
>> [...]
>> 0-63
>>
>> and we see all TX queues' xps_cpus affinitized to no cores:
>>
>> cd /sys/class/net/eth0/queues
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus; done
>> 00000000,00000000
>> [...]
>> 00000000,00000000
>>
>> With the fix, we see each queue's interrupts assigned to a single core,
>> and the xps affinity striped so that each TX queue gets its own unique
>> set of CPUs.
>>
>> 64 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>>
>> 0-63
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29
>> 29
>> 30
>> 30
>> 31
>> 31
>> 0-63
>> 0-63
>> 0-63
>> 0-63
>>
>> cd /sys/class/net/eth0/queues
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done
>>
>> 00000001,00000001
>> 00000002,00000002
>> 00000004,00000004
>> 00000008,00000008
>> 00000010,00000010
>> 00000020,00000020
>> 00000040,00000040
>> 00000080,00000080
>> 00000100,00000100
>> 00000200,00000200
>> 00000400,00000400
>> 00000800,00000800
>> 00001000,00001000
>> 00002000,00002000
>> 00004000,00004000
>> 00008000,00008000
>> 00010000,00010000
>> 00020000,00020000
>> 00040000,00040000
>> 00080000,00080000
>> 00100000,00100000
>> 00200000,00200000
>> 00400000,00400000
>> 00800000,00800000
>> 01000000,01000000
>> 02000000,02000000
>> 04000000,04000000
>> 08000000,08000000
>> 10000000,10000000
>> 20000000,20000000
>> 40000000,40000000
>> 80000000,80000000
>>
>> 48 VCPU:
>>
>> cd /proc/irq
>> for i in `seq 24 92` ; do sudo grep ".*" $i/smp_affinity_list;  done
>> 0-47
>> 0
>> 0
>> 1
>> 1
>> 2
>> 2
>> 3
>> 3
>> 4
>> 4
>> 5
>> 5
>> 6
>> 6
>> 7
>> 7
>> 8
>> 8
>> 9
>> 9
>> 10
>> 10
>> 11
>> 11
>> 12
>> 12
>> 13
>> 13
>> 14
>> 14
>> 15
>> 15
>> 16
>> 16
>> 17
>> 17
>> 18
>> 18
>> 19
>> 19
>> 20
>> 20
>> 21
>> 21
>> 22
>> 22
>> 23
>> 23
>> 24
>> 24
>> 25
>> 25
>> 26
>> 26
>> 27
>> 27
>> 28
>> 28
>> 29
>> 29
>> 30
>> 30
>> 31
>> 31
>> 0-47
>> 0-47
>> 0-47
>> 0-47
>>
>> cd /sys/class/net/eth0/queues
>> for i in `seq 0 31` ; do sudo grep ".*" tx-$i/xps_cpus;  done
>>
>> 0001,00000001
>> 0002,00000002
>> 0004,00000004
>> 0008,00000008
>> 0010,00000010
>> 0020,00000020
>> 0040,00000040
>> 0080,00000080
>> 0100,00000100
>> 0200,00000200
>> 0400,00000400
>> 0800,00000800
>> 1000,00001000
>> 2000,00002000
>> 4000,00004000
>> 8000,00008000
>> 0000,00010000
>> 0000,00020000
>> 0000,00040000
>> 0000,00080000
>> 0000,00100000
>> 0000,00200000
>> 0000,00400000
>> 0000,00800000
>> 0000,01000000
>> 0000,02000000
>> 0000,04000000
>> 0000,08000000
>> 0000,10000000
>> 0000,20000000
>> 0000,40000000
>> 0000,80000000
>>
>> Signed-off-by: Ben Serebrin <serebrin@...gle.com>
>> Acked-by: Willem de Bruijn <willemb@...gle.com>
>> Acked-by: Jim Mattson <jmattson@...gle.com>
>> Acked-by: Venkatesh Srinivas <venkateshs@...gle.com>
>
> I wonder why not just do it in userspace though.
> It would be nice to mention this in the commit log.
> Are we sure this distribution is best for all workloads?
> While irqbalancer is hardly a perfect oracle it does
> manage to balance the load somewhat, and IIUC kernel
> affinities would break that.
> Thoughts?
>
>
>> Effort: kvm
>> ---
>>  drivers/net/virtio_net.c | 30 +++++++++++++++++++++++++++---
>>  1 file changed, 27 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 765c2d6358da..0dc3a102bfc4 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1502,20 +1502,44 @@ static void virtnet_set_affinity(struct virtnet_info *vi)
>>        * queue pairs, we let the queue pairs to be private to one cpu by
>>        * setting the affinity hint to eliminate the contention.
>>        */
>> -     if (vi->curr_queue_pairs == 1 ||
>> -         vi->max_queue_pairs != num_online_cpus()) {
>> +     if (vi->curr_queue_pairs == 1) {
>>               virtnet_clean_affinity(vi, -1);
>>               return;
>>       }
>>
>> +     /* If there are more cpus than queues, then assign the queues'
>> +      * interrupts to the first cpus until we run out.
>> +      */
>>       i = 0;
>>       for_each_online_cpu(cpu) {
>> +             if (i == vi->max_queue_pairs)
>> +                     break;
>>               virtqueue_set_affinity(vi->rq[i].vq, cpu);
>>               virtqueue_set_affinity(vi->sq[i].vq, cpu);
>> -             netif_set_xps_queue(vi->dev, cpumask_of(cpu), i);
>>               i++;
>>       }
>>
>> +     /* Stripe the XPS affinities across the online CPUs.
>> +      * Hyperthread pairs are typically assigned such that Linux's
>> +      * CPU X and X + (numcpus / 2) are hyperthread twins, so we cause
>> +      * hyperthread twins to share TX queues, in the case where there are
>> +      * more cpus than queues.
>> +      */
>> +     for (i = 0; i < vi->max_queue_pairs; i++) {
>> +             struct cpumask mask;
>> +             int skip = i;
>> +
>> +             cpumask_clear(&mask);
>> +             for_each_online_cpu(cpu) {
>> +                     while (skip--)
>> +                             cpu = cpumask_next(cpu, cpu_online_mask);
>> +                     if (cpu < num_possible_cpus())
>> +                             cpumask_set_cpu(cpu, &mask);
>> +                     skip = vi->max_queue_pairs - 1;
>> +             }
>> +             netif_set_xps_queue(vi->dev, &mask, i);
>> +     }
>> +
>>       vi->affinity_hint_set = true;
>>  }
>>
