Message-ID: <65cdb06d.1809a.169b56dfa50.Coremail.luferry@163.com>
Date:   Mon, 25 Mar 2019 23:17:57 +0800 (CST)
From:   luferry <luferry@....com>
To:     "Dongli Zhang" <dongli.zhang@...cle.com>
Cc:     "Jens Axboe" <axboe@...nel.dk>, "Ming Lei" <ming.lei@...hat.com>,
        "Christoph Hellwig" <hch@....de>, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re:Re: [PATCH] block/mq: blk map queues by core id

Thanks for your kind suggestions; I will reformat this patch later!
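
In case it helps with reproducing: the imbalance only shows up when the driver
falls back to blk_mq_map_queues() (e.g. with only 2 MSI-X vectors) and the guest
exposes SMT siblings as adjacent CPU ids. Comparing the thread sibling lists
with the blk-mq CPU lists makes that easy to check (standard sysfs paths; vdb
is just the test disk on my side):

# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
# cat /sys/block/vdb/mq/*/cpu_list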

At 2019-03-25 21:53:00, "Dongli Zhang" <dongli.zhang@...cle.com> wrote:
>
>
>On 03/25/2019 05:53 PM, luferry wrote:
>> 
>> Sorry, I messed up some function names.
>> 
>> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues() will fall back to blk_mq_map_queues().
>> 
>> Since virtual machine users cannot change the number of vectors, I think blk_mq_map_queues() should be friendlier to different CPU topologies.
>
>
>Yes, we will fall back to blk_mq_map_queues() if there are not enough vectors.
>
>Here is the qemu cmdline:
>"-smp 8,sockets=1,cores=4,threads=2"
>"-device virtio-blk-pci,drive=drive0,id=device0,num-queues=4,vectors=2"
>
># cat /sys/block/vda/mq/0/cpu_list
>0, 4, 5
>root@vm:/home/zhang# cat /sys/block/vda/mq/1/cpu_list
>1
>root@vm:/home/zhang# cat /sys/block/vda/mq/2/cpu_list
>2, 6, 7
>root@vm:/home/zhang# cat /sys/block/vda/mq/3/cpu_list
>3
>
>How about describing how to hit the issue in the commit message, so people know
>the scenario?
>
>Dongli Zhang
>
>> 
>> At 2019-03-25 17:49:33, "luferry" <luferry@....com> wrote:
>>>
>>> After reading the code and comparing the PCI device info,
>>> I can reproduce this imbalance under KVM.
>>> When virtio-blk is enabled with multiple queues but only 2 MSI-X vectors,
>>> vp_dev->per_vq_vectors will be false, so blk_mq_virtio_map_queues will fall back to virtio_mq_map_queues.
>>>
>>> Since virtual machine users cannot change the number of vectors, I think virtio_mq_map_queues should be friendlier to different CPU topologies.
>>>
>>> const struct cpumask *vp_get_vq_affinity(struct virtio_device *vdev, int index)
>>> {
>>>     struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>
>>>     if (!vp_dev->per_vq_vectors ||
>>>         vp_dev->vqs[index]->msix_vector == VIRTIO_MSI_NO_VECTOR)
>>>         return NULL;
>>>
>>>     return pci_irq_get_affinity(vp_dev->pci_dev,
>>>                     vp_dev->vqs[index]->msix_vector);
>>> }
>>>
>>> int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
>>>         struct virtio_device *vdev, int first_vec)
>>> {
>>>     const struct cpumask *mask;
>>>     unsigned int queue, cpu;
>>>
>>>     if (!vdev->config->get_vq_affinity)
>>>         goto fallback;
>>>
>>>     for (queue = 0; queue < qmap->nr_queues; queue++) {
>>>         mask = vdev->config->get_vq_affinity(vdev, first_vec + queue); //vp_get_vq_affinity
>>>         if (!mask)
>>>             goto fallback;
>>>
>>>         for_each_cpu(cpu, mask)
>>>             qmap->mq_map[cpu] = qmap->queue_offset + queue;
>>>     }
>>>
>>>     return 0;
>>> fallback:
>>>     return blk_mq_map_queues(qmap);
>>> }
>>>
>>> At 2019-03-23 19:14:34, "Dongli Zhang" <dongli.zhang@...cle.com> wrote:
>>>>
>>>>
>>>> On 03/23/2019 02:34 PM, luferry wrote:
>>>>>
>>>>> I just bought one VM, so I cannot access the hypervisor. I will try to build the environment on my desktop.
>>>>> I am sure about a few things, though:
>>>>> the hypervisor is KVM and the disk driver is virtio-blk.
>>>>> [root@...-mq ~]# dmesg |grep KVM
>>>>> [    0.000000] Hypervisor detected: KVM
>>>>> [    0.186330] Booting paravirtualized kernel on KVM
>>>>> [    0.279106] KVM setup async PF for cpu 0
>>>>> [    0.420819] KVM setup async PF for cpu 1
>>>>> [    0.421682] KVM setup async PF for cpu 2
>>>>> [    0.422113] KVM setup async PF for cpu 3
>>>>> [    0.422434] KVM setup async PF for cpu 4
>>>>> [    0.422434] KVM setup async PF for cpu 5
>>>>> [    0.423312] KVM setup async PF for cpu 6
>>>>> [    0.423394] KVM setup async PF for cpu 7
>>>>> [    0.424125] KVM setup async PF for cpu 8
>>>>> [    0.424414] KVM setup async PF for cpu 9
>>>>> [    0.424415] KVM setup async PF for cpu 10
>>>>> [    0.425329] KVM setup async PF for cpu 11
>>>>> [    0.425420] KVM setup async PF for cpu 12
>>>>> [    0.426156] KVM setup async PF for cpu 13
>>>>> [    0.426431] KVM setup async PF for cpu 14
>>>>> [    0.426431] KVM setup async PF for cpu 15
>>>>> [root@...-mq ~]# lspci |grep block
>>>>> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
>>>>>
>>>>> [root@...-mq ~]# lscpu
>>>>> Architecture:          x86_64
>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>> Byte Order:            Little Endian
>>>>> CPU(s):                16
>>>>> On-line CPU(s) list:   0-15
>>>>> Thread(s) per core:    2
>>>>> Core(s) per socket:    8
>>>>>
>>>>> [root@...-mq ~]# ls /sys/block/vdb/mq/
>>>>> 0  1  2  3
>>>>>
>>>>> [root@...-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
>>>>> processor	: 0
>>>>> core id		: 0
>>>>> processor	: 1
>>>>> core id		: 0
>>>>> processor	: 2
>>>>> core id		: 1
>>>>> processor	: 3
>>>>> core id		: 1
>>>>> processor	: 4
>>>>> core id		: 2
>>>>> processor	: 5
>>>>> core id		: 2
>>>>> processor	: 6
>>>>> core id		: 3
>>>>> processor	: 7
>>>>> core id		: 3
>>>>> processor	: 8
>>>>> core id		: 4
>>>>> processor	: 9
>>>>> core id		: 4
>>>>> processor	: 10
>>>>> core id		: 5
>>>>> processor	: 11
>>>>> core id		: 5
>>>>> processor	: 12
>>>>> core id		: 6
>>>>> processor	: 13
>>>>> core id		: 6
>>>>> processor	: 14
>>>>> core id		: 7
>>>>> processor	: 15
>>>>> core id		: 7
>>>>>
>>>>> --before this patch--
>>>>> [root@...-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>>> 0, 4, 5, 8, 9, 12, 13
>>>>> 1
>>>>> 2, 6, 7, 10, 11, 14, 15
>>>>> 3
>>>>>
>>>>> --after this patch--
>>>>> [root@...-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
>>>>> 0, 4, 5, 12, 13
>>>>> 1, 6, 7, 14, 15
>>>>> 2, 8, 9
>>>>> 3, 10, 11
>>>>>
>>>>>
>>>>> I added a dump_stack() inside blk_mq_map_queues().
>>>>>
>>>>> [    1.378680] Call Trace:
>>>>> [    1.378690]  dump_stack+0x5a/0x73
>>>>> [    1.378695]  blk_mq_map_queues+0x29/0xb0
>>>>> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
>>>>> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
>>>>> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
>>>>> [    1.378713]  really_probe+0x215/0x3f0
>>>>> [    1.378716]  driver_probe_device+0x115/0x130
>>>>> [    1.378718]  device_driver_attach+0x50/0x60
>>>>> [    1.378720]  __driver_attach+0xbd/0x140
>>>>> [    1.378722]  ? device_driver_attach+0x60/0x60
>>>>> [    1.378724]  bus_for_each_dev+0x67/0xc0
>>>>> [    1.378727]  ? klist_add_tail+0x57/0x70
>>>>
>>>> I am not able to reproduce the above call stack when virtio-blk is assigned 4 queues
>>>> while my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".
>>>>
>>>> # cat /sys/block/vda/mq/0/cpu_list
>>>> 0, 1, 2, 3
>>>> # cat /sys/block/vda/mq/1/cpu_list
>>>> 4, 5, 6, 7
>>>> # cat /sys/block/vda/mq/2/cpu_list
>>>> 8, 9, 10, 11
>>>> # cat /sys/block/vda/mq/3/cpu_list
>>>> 12, 13, 14, 15
>>>>
>>>>
>>>> I do agree that in the above case we would have an issue if the mapping is
>>>> established by blk_mq_map_queues().
>>>>
>>>>
>>>> However, I am just curious how we finally reach blk_mq_map_queues() from
>>>> blk_mq_alloc_tag_set()?
>>>>
>>>> It should be something like:
>>>>
>>>> blk_mq_alloc_tag_set()
>>>> -> blk_mq_update_queue_map()
>>>>      -> if (set->ops->map_queues && !is_kdump_kernel())
>>>>             return set->ops->map_queues(set);
>>>>      -> else
>>>>             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
>>>>
>>>> Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
>>>>
>>>> Or does the execution reach:
>>>>
>>>> virtblk_map_queues()
>>>> -> blk_mq_virtio_map_queues()
>>>>     -> if (!vdev->config->get_vq_affinity)
>>>>            return blk_mq_map_queues(qmap);
>>>>     -> else
>>>>            establish the mapping via get_vq_affinity
>>>>
>>>> but vdev->config->get_vq_affinity == NULL?
>>>>
>>>> For virtio-pci, get_vq_affinity is always set. It seems virtio-mmio does not set
>>>> get_vq_affinity.
>>>>
>>>>
>>>> I used to play with Firecracker (by Amazon), and interestingly Firecracker
>>>> uses MMIO to set up virtio-blk.
>>>>
>>>>
>>>> Sorry for disturbing the review of this patch. I would just like to clarify in
>>>> which scenario we would hit this issue, e.g., when virtio-blk is based on MMIO.
>>>>
>>>> Dongli Zhang
>>>>
>>>>>
>>>>>
>>>>> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@...cle.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/22/2019 06:09 PM, luferry wrote:
>>>>>>> In a virtual machine environment, the CPU topology may differ from that of a
>>>>>>> normal physical server.
>>>>>>
>>>>>> Would you mind sharing the name of the virtual machine monitor, the command
>>>>>> line if available, and which device reproduces this?
>>>>>>
>>>>>> For instance, I am not able to reproduce this with qemu nvme or virtio-blk, as I
>>>>>> assume they use the pci- or virtio-specific mapper to establish the mapping.
>>>>>>
>>>>>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>>>>>
>>>>>> Indeed, I use three queues instead of two, as one is reserved for admin.
>>>>>>
>>>>>> # ls /sys/block/nvme0n1/mq/*
>>>>>> /sys/block/nvme0n1/mq/0:
>>>>>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>>>>>
>>>>>> /sys/block/nvme0n1/mq/1:
>>>>>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>>>>>
>>>>>>
>>>>>> Thank you very much!
>>>>>>
>>>>>> Dongli Zhang
>>>>>>
>>>>>>> For example (machine with 4 cores, 2 threads per core):
>>>>>>>
>>>>>>> normal physical server:
>>>>>>> core-id   thread-0-id  thread-1-id
>>>>>>> 0         0            4
>>>>>>> 1         1            5
>>>>>>> 2         2            6
>>>>>>> 3         3            7
>>>>>>>
>>>>>>> virtual machine:
>>>>>>> core-id   thread-0-id  thread-1-id
>>>>>>> 0         0            1
>>>>>>> 1         2            3
>>>>>>> 2         4            5
>>>>>>> 3         6            7
>>>>>>>
>>>>>>> When a disk is attached with two queues, all the even-numbered CPUs will be
>>>>>>> mapped to queue 0. In a virtual machine, each CPU is immediately followed by
>>>>>>> its sibling CPU, so before this patch most of the odd-numbered CPUs will also
>>>>>>> be mapped to queue 0, which can cause a serious imbalance and hurts
>>>>>>> system I/O performance.
>>>>>>>
>>>>>>> So I suggest allocating the CPU map by core id, which gives a more balanced mapping.
>>>>>>>
>>>>>>> Signed-off-by: luferry <luferry@....com>
>>>>>>> ---
>>>>>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>>>>>> index 03a534820271..4125e8e77679 100644
>>>>>>> --- a/block/blk-mq-cpumap.c
>>>>>>> +++ b/block/blk-mq-cpumap.c
>>>>>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>>  {
>>>>>>>  	unsigned int *map = qmap->mq_map;
>>>>>>>  	unsigned int nr_queues = qmap->nr_queues;
>>>>>>> -	unsigned int cpu, first_sibling;
>>>>>>> +	unsigned int cpu, first_sibling, core = 0;
>>>>>>>  
>>>>>>>  	for_each_possible_cpu(cpu) {
>>>>>>>  		/*
>>>>>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>>>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>>>  		} else {
>>>>>>>  			first_sibling = get_first_sibling(cpu);
>>>>>>> -			if (first_sibling == cpu)
>>>>>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>>>>> -			else
>>>>>>> +			if (first_sibling == cpu) {
>>>>>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>>>>>> +				core++;
>>>>>>> +			} else
>>>>>>>  				map[cpu] = map[first_sibling];
>>>>>>>  		}
>>>>>>>  	}
>>>>>>>
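
For completeness, below is a minimal stand-alone sketch of the mapping loop,
before and after this patch. It is plain user-space C rather than kernel code,
and it hard-codes the guest topology from the lscpu output above (16 CPUs,
2 threads per core, siblings adjacent, so CPU n's first sibling is n & ~1)
plus 4 queues and queue_offset == 0. It reproduces the before/after cpu_list
outputs quoted earlier.

#include <stdio.h>

#define NR_CPUS   16
#define NR_QUEUES 4

/* In this guest, siblings are (0,1), (2,3), ...: first sibling of n is n & ~1. */
static unsigned int get_first_sibling(unsigned int cpu)
{
	return cpu & ~1u;
}

/*
 * per_core_counter == 0: current blk_mq_map_queues() behaviour.
 * per_core_counter == 1: the behaviour proposed by this patch.
 */
static void map_queues(unsigned int *map, int per_core_counter)
{
	unsigned int cpu, first_sibling, core = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu < NR_QUEUES) {
			/* sequential mapping for the first nr_queues CPUs */
			map[cpu] = cpu % NR_QUEUES;
		} else {
			first_sibling = get_first_sibling(cpu);
			if (first_sibling == cpu)
				map[cpu] = (per_core_counter ? core++ : cpu)
						% NR_QUEUES;
			else
				map[cpu] = map[first_sibling];
		}
	}
}

static void print_map(const char *tag, const unsigned int *map)
{
	unsigned int q, cpu;

	for (q = 0; q < NR_QUEUES; q++) {
		printf("%s queue %u:", tag, q);
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			if (map[cpu] == q)
				printf(" %u", cpu);
		printf("\n");
	}
}

int main(void)
{
	unsigned int map[NR_CPUS];

	map_queues(map, 0);
	print_map("before patch,", map);
	map_queues(map, 1);
	print_map("after patch, ", map);
	return 0;
}

Compiled with "gcc -O2 -o mqmap mqmap.c" (file name arbitrary), the "before"
groups come out as 0,4,5,8,9,12,13 / 1 / 2,6,7,10,11,14,15 / 3 and the "after"
groups as 0,4,5,12,13 / 1,6,7,14,15 / 2,8,9 / 3,10,11, matching the sysfs
output above.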
