Message-ID: <a699a394-d36a-4f42-bd49-9a5a573fd58f@redhat.com>
Date: Sat, 29 Jun 2024 14:29:41 -0400
From: Waiman Long <longman@...hat.com>
To: Mikulas Patocka <mpatocka@...hat.com>, Tejun Heo <tj@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>
Cc: Mike Snitzer <snitzer@...nel.org>, Laurence Oberman
<loberman@...hat.com>, Jonathan Brassow <jbrassow@...hat.com>,
Ming Lei <minlei@...hat.com>, Ondrej Kozina <okozina@...hat.com>,
Milan Broz <gmazyland@...il.com>, linux-kernel@...r.kernel.org,
dm-devel@...ts.linux.dev
Subject: Re: dm-crypt performance regression due to workqueue changes
On 6/29/24 14:15, Mikulas Patocka wrote:
> Hi
>
> I would like to report that commit 63c5484e74952f60f5810256bd69814d167b8d22
> ("workqueue: Add multiple affinity scopes and interface to select them")
> is causing a massive dm-crypt slowdown in virtual machines.
>
> Steps to reproduce:
> * Install a system in a virtual machine with 16 virtual CPUs
> * Create a scratch file with "dd if=/dev/zero of=Scratch.img bs=1M
> count=2048 oflag=direct" - the file should be on a fast NVMe drive
> * Attach the scratch file to the virtual machine as /dev/vdb; cache mode
> should be 'none'
> * cryptsetup --force-password luksFormat /dev/vdb
> * cryptsetup luksOpen /dev/vdb cr
> * fio --direct=1 --bsrange=128k-128k --runtime=40 --numjobs=1
> --ioengine=libaio --iodepth=8 --group_reporting=1
> --filename=/dev/mapper/cr --name=job --rw=read
>
> With 6.5, we get 3600MiB/s; with 6.6 we get 1400MiB/s.
>
> The reason is that virt-manager by default sets up a topology with 16
> sockets, 1 core per socket and 1 thread per core. The workqueue patch
> avoids moving work items across sockets, so all the encryption work
> ends up being processed on a single virtual CPU.
>
> The performance degradation may be fixed with "echo 'system'
>> /sys/module/workqueue/parameters/default_affinity_scope" - but it is
> a regression anyway, as many users don't know about this option.
>
> How should we fix it? There are several options:
> 1. revert to the 'numa' affinity scope
> 2. revert to 'numa' affinity only if we are in a virtual machine
> 3. hack dm-crypt to set the 'numa' affinity for the affected workqueues
> 4. any other solution?
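For option 3, I suppose it could look roughly like the untested sketch
below. The crypt_force_numa_scope() helper name is made up, and I don't
think apply_workqueue_attrs() and friends are exported to modules these
days, so dm-crypt would also need a re-export or a small wrapper in
kernel/workqueue.c:

/*
 * Untested sketch of option 3: widen the affinity scope of dm-crypt's
 * unbound kcryptd workqueue back to NUMA by applying new
 * workqueue_attrs.  Illustration only.
 */
#include <linux/cpu.h>
#include <linux/workqueue.h>

static int crypt_force_numa_scope(struct workqueue_struct *wq)
{
        struct workqueue_attrs *attrs;
        int ret;

        attrs = alloc_workqueue_attrs();
        if (!attrs)
                return -ENOMEM;

        /* keep the default nice level and cpumask, only widen the scope */
        attrs->affn_scope = WQ_AFFN_NUMA;

        cpus_read_lock();       /* apply_workqueue_attrs() wants the hotplug read lock */
        ret = apply_workqueue_attrs(wq, attrs);
        cpus_read_unlock();

        free_workqueue_attrs(attrs);
        return ret;
}

Presumably that would be called right after the crypt workqueue is
allocated in crypt_ctr(), and only for the unbound (non same_cpu_crypt)
case.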
Another alternative is to go back to the old "numa" default if the
kernel is running under a hypervisor, since the CPU configuration
information is likely to be incorrect anyway. The current default of
"cache" would remain when not running under a hypervisor.
Cheers,
Longman