Date:   Sat, 23 Mar 2019 19:14:34 +0800
From:   Dongli Zhang <dongli.zhang@...cle.com>
To:     luferry <luferry@....com>
Cc:     Jens Axboe <axboe@...nel.dk>, Ming Lei <ming.lei@...hat.com>,
        Christoph Hellwig <hch@....de>, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] block/mq: blk map queues by core id



On 03/23/2019 02:34 PM, luferry wrote:
> 
> I just bought one VM, so I cannot access the hypervisor. I will try to build the environment on my desktop.
> I am sure of the following:
> The hypervisor is KVM and the disk driver is virtio-blk.
> [root@...-mq ~]# dmesg |grep KVM
> [    0.000000] Hypervisor detected: KVM
> [    0.186330] Booting paravirtualized kernel on KVM
> [    0.279106] KVM setup async PF for cpu 0
> [    0.420819] KVM setup async PF for cpu 1
> [    0.421682] KVM setup async PF for cpu 2
> [    0.422113] KVM setup async PF for cpu 3
> [    0.422434] KVM setup async PF for cpu 4
> [    0.422434] KVM setup async PF for cpu 5
> [    0.423312] KVM setup async PF for cpu 6
> [    0.423394] KVM setup async PF for cpu 7
> [    0.424125] KVM setup async PF for cpu 8
> [    0.424414] KVM setup async PF for cpu 9
> [    0.424415] KVM setup async PF for cpu 10
> [    0.425329] KVM setup async PF for cpu 11
> [    0.425420] KVM setup async PF for cpu 12
> [    0.426156] KVM setup async PF for cpu 13
> [    0.426431] KVM setup async PF for cpu 14
> [    0.426431] KVM setup async PF for cpu 15
> [root@...-mq ~]# lspci |grep block
> 00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
> 00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
> 
> [root@...-mq ~]# lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    2
> Core(s) per socket:    8
> 
> [root@...-mq ~]# ls /sys/block/vdb/mq/
> 0  1  2  3
> 
> [root@...-mq ~]# cat /proc/cpuinfo |egrep 'processor|core id'
> processor	: 0
> core id		: 0
> processor	: 1
> core id		: 0
> processor	: 2
> core id		: 1
> processor	: 3
> core id		: 1
> processor	: 4
> core id		: 2
> processor	: 5
> core id		: 2
> processor	: 6
> core id		: 3
> processor	: 7
> core id		: 3
> processor	: 8
> core id		: 4
> processor	: 9
> core id		: 4
> processor	: 10
> core id		: 5
> processor	: 11
> core id		: 5
> processor	: 12
> core id		: 6
> processor	: 13
> core id		: 6
> processor	: 14
> core id		: 7
> processor	: 15
> core id		: 7
> 
> --before this patch--
> [root@...-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
> 0, 4, 5, 8, 9, 12, 13
> 1
> 2, 6, 7, 10, 11, 14, 15
> 3
> 
> --after this patch--
> [root@...-mq ~]# cat /sys/block/vdb/mq/*/cpu_list
> 0, 4, 5, 12, 13
> 1, 6, 7, 14, 15
> 2, 8, 9
> 3, 10, 11
> 
> 
> I added a dump_stack() call inside blk_mq_map_queues().
> 
> [    1.378680] Call Trace:
> [    1.378690]  dump_stack+0x5a/0x73
> [    1.378695]  blk_mq_map_queues+0x29/0xb0
> [    1.378700]  blk_mq_alloc_tag_set+0x1bd/0x2d0
> [    1.378705]  virtblk_probe+0x1ae/0x8e4 [virtio_blk]
> [    1.378709]  virtio_dev_probe+0x18a/0x230 [virtio]
> [    1.378713]  really_probe+0x215/0x3f0
> [    1.378716]  driver_probe_device+0x115/0x130
> [    1.378718]  device_driver_attach+0x50/0x60
> [    1.378720]  __driver_attach+0xbd/0x140
> [    1.378722]  ? device_driver_attach+0x60/0x60
> [    1.378724]  bus_for_each_dev+0x67/0xc0
> [    1.378727]  ? klist_add_tail+0x57/0x70

I am not able to reproduce the above call stack when virtio-blk is assigned 4
queues and my qemu cmdline is "-smp 16,sockets=1,cores=8,threads=2".

# cat /sys/block/vda/mq/0/cpu_list
0, 1, 2, 3
# cat /sys/block/vda/mq/1/cpu_list
4, 5, 6, 7
# cat /sys/block/vda/mq/2/cpu_list
8, 9, 10, 11
# cat /sys/block/vda/mq/3/cpu_list
12, 13, 14, 15


I do agree that in the above case we would have an issue if the mapping is
established by blk_mq_map_queues().
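
For readers following along, here is a rough sketch of the pre-patch mapping
logic, paraphrased from block/blk-mq-cpumap.c in a v5.0-era tree (it matches
the context lines of the diff quoted below):

static int cpu_to_queue_index(struct blk_mq_queue_map *qmap,
			      unsigned int nr_queues, const int cpu)
{
	/* The queue is picked purely by cpu number modulo queue count. */
	return qmap->queue_offset + (cpu % nr_queues);
}

int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
{
	unsigned int *map = qmap->mq_map;
	unsigned int nr_queues = qmap->nr_queues;
	unsigned int cpu, first_sibling;

	for_each_possible_cpu(cpu) {
		/* First do sequential mapping between CPUs and queues. */
		if (cpu < nr_queues) {
			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
		} else {
			/* A first sibling keys off its own cpu number;
			 * later siblings reuse the first sibling's queue. */
			first_sibling = get_first_sibling(cpu);
			if (first_sibling == cpu)
				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
			else
				map[cpu] = map[first_sibling];
		}
	}

	return 0;
}

With luferry's topology every first sibling is an even cpu, so for cpu >= 4
the expression cpu % 4 only ever yields 0 or 2: that reproduces the skewed
"before" output exactly (queues 0 and 2 absorb all the higher cpus, while
queues 1 and 3 keep only cpus 1 and 3). Tracing the patched version the same
way, with its per-core counter that increments once per first sibling, yields
the far more even "after" output.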


However, I am just curious how we finally reach blk_mq_map_queues() from
blk_mq_alloc_tag_set().

It should be something like:

blk_mq_alloc_tag_set()
 -> blk_mq_update_queue_map()
      -> if (set->ops->map_queues && !is_kdump_kernel())
             return set->ops->map_queues(set);
      -> else
             return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);

Wouldn't we always have set->ops->map_queues = virtblk_map_queues()?
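
As far as I can tell from a v5.0-era drivers/block/virtio_blk.c, yes; a
paraphrased sketch (treat the exact names as my reading of that tree):

static int virtblk_map_queues(struct blk_mq_tag_set *set)
{
	struct virtio_blk *vblk = set->driver_data;

	/* Delegate to the common virtio helper. */
	return blk_mq_virtio_map_queues(&set->map[HCTX_TYPE_DEFAULT],
					vblk->vdev, 0);
}

static const struct blk_mq_ops virtio_mq_ops = {
	.queue_rq	= virtio_queue_rq,
	.complete	= virtblk_request_done,
	.init_request	= virtblk_init_request,
	.map_queues	= virtblk_map_queues,	/* always wired up */
};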

Or does execution reach:

virtblk_map_queues()
 -> blk_mq_virtio_map_queues()
     -> if (!vdev->config->get_vq_affinity)
            return blk_mq_map_queues(qmap);
     -> else
            establish the mapping via get_vq_affinity

but vdev->config->get_vq_affinity == NULL?
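
For reference, a paraphrase of that helper from a v5.0-era
block/blk-mq-virtio.c:

int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap,
			     struct virtio_device *vdev, int first_vec)
{
	const struct cpumask *mask;
	unsigned int queue, cpu;

	/* No transport affinity hook: fall back to the generic mapper. */
	if (!vdev->config->get_vq_affinity)
		goto fallback;

	for (queue = 0; queue < qmap->nr_queues; queue++) {
		mask = vdev->config->get_vq_affinity(vdev, first_vec + queue);
		if (!mask)
			goto fallback;

		for_each_cpu(cpu, mask)
			qmap->mq_map[cpu] = qmap->queue_offset + queue;
	}

	return 0;

fallback:
	return blk_mq_map_queues(qmap);
}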

For virtio-pci, get_vq_affinity is always set. It seems virtio-mmio does not
set get_vq_affinity.
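
To double-check, excerpts of the two transports' virtio_config_ops as I
recall a v5.0-era tree (abbreviated, not the complete tables):

/* virtio-pci ops tables set the hook (vp_get_vq_affinity is defined in
 * drivers/virtio/virtio_pci_common.c): */
static const struct virtio_config_ops virtio_pci_config_ops = {
	/* ... */
	.get_vq_affinity	= vp_get_vq_affinity,
};

/* drivers/virtio/virtio_mmio.c: no .get_vq_affinity initializer, so the
 * member stays NULL and blk_mq_virtio_map_queues() takes the fallback. */
static const struct virtio_config_ops virtio_mmio_config_ops = {
	.get		= vm_get,
	.set		= vm_set,
	/* ... */
};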


I used to play with Firecracker (by Amazon), and interestingly, Firecracker
uses mmio to set up virtio-blk.


Sorry for disturbing the review of this patch. I just would like to clarify in
which scenario we would hit this issue, e.g., when virtio-blk is based on mmio.

Dongli Zhang

> 
> 
> At 2019-03-22 19:58:08, "Dongli Zhang" <dongli.zhang@...cle.com> wrote:
>>
>>
>> On 03/22/2019 06:09 PM, luferry wrote:
>>> Under a virtual machine environment, the cpu topology may differ from that
>>> of a normal physical server.
>>
>> Would you mind sharing the name of the virtual machine monitor, the command
>> line if available, and which device reproduces this?
>>
>> For instance, I am not able to reproduce with qemu nvme or virtio-blk, as I
>> assume they use pci- or virtio-specific mappers to establish the mapping.
>>
>> E.g., with qemu and nvme: -smp 8,sockets=1,cores=4,threads=2
>>
>> Indeed I use three queues instead of two, as one is reserved for admin.
>>
>> # ls /sys/block/nvme0n1/mq/*
>> /sys/block/nvme0n1/mq/0:
>> cpu0  cpu1  cpu2  cpu3  cpu_list  nr_reserved_tags  nr_tags
>>
>> /sys/block/nvme0n1/mq/1:
>> cpu4  cpu5  cpu6  cpu7  cpu_list  nr_reserved_tags  nr_tags
>>
>>
>> Thank you very much!
>>
>> Dongli Zhang
>>
>>> for example (machine with 4 cores, 2 threads per core):
>>>
>>> normal physical server:
>>> core-id   thread-0-id  thread-1-id
>>> 0         0            4
>>> 1         1            5
>>> 2         2            6
>>> 3         3            7
>>>
>>> virtual machine:
>>> core-id   thread-0-id  thread-1-id
>>> 0         0            1
>>> 1         2            3
>>> 2         4            5
>>> 3         6            7
>>>
>>> When attaching a disk with two queues, all the even-numbered cpus will be
>>> mapped to queue 0. Under a virtual machine, each cpu is followed by its
>>> sibling cpu. Before this patch, all the odd-numbered cpus would also be
>>> mapped to queue 0, which can cause serious imbalance and lead to a
>>> performance impact on system IO.
>>>
>>> So I suggest allocating the cpu map by core id; this can give better
>>> concurrency.
>>>
>>> Signed-off-by: luferry <luferry@....com>
>>> ---
>>>  block/blk-mq-cpumap.c | 9 +++++----
>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
>>> index 03a534820271..4125e8e77679 100644
>>> --- a/block/blk-mq-cpumap.c
>>> +++ b/block/blk-mq-cpumap.c
>>> @@ -35,7 +35,7 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>  {
>>>  	unsigned int *map = qmap->mq_map;
>>>  	unsigned int nr_queues = qmap->nr_queues;
>>> -	unsigned int cpu, first_sibling;
>>> +	unsigned int cpu, first_sibling, core = 0;
>>>  
>>>  	for_each_possible_cpu(cpu) {
>>>  		/*
>>> @@ -48,9 +48,10 @@ int blk_mq_map_queues(struct blk_mq_queue_map *qmap)
>>>  			map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>>  		} else {
>>>  			first_sibling = get_first_sibling(cpu);
>>> -			if (first_sibling == cpu)
>>> -				map[cpu] = cpu_to_queue_index(qmap, nr_queues, cpu);
>>> -			else
>>> +			if (first_sibling == cpu) {
>>> +				map[cpu] = cpu_to_queue_index(qmap, nr_queues, core);
>>> +				core++;
>>> +			} else
>>>  				map[cpu] = map[first_sibling];
>>>  		}
>>>  	}
>>>
