Message-ID: <32fd8274-d5f-3eca-f5d2-1a9117fd8edb@redhat.com>
Date: Sat, 29 Jun 2024 20:15:56 +0200 (CEST)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>
cc: Waiman Long <longman@...hat.com>, Mike Snitzer <snitzer@...nel.org>, 
    Laurence Oberman <loberman@...hat.com>, 
    Jonathan Brassow <jbrassow@...hat.com>, Ming Lei <minlei@...hat.com>, 
    Ondrej Kozina <okozina@...hat.com>, Milan Broz <gmazyland@...il.com>, 
    linux-kernel@...r.kernel.org, dm-devel@...ts.linux.dev
Subject: dm-crypt performance regression due to workqueue changes

Hi

I am reporting that commit 63c5484e74952f60f5810256bd69814d167b8d22
("workqueue: Add multiple affinity scopes and interface to select them")
is causing a massive dm-crypt slowdown in virtual machines.

Steps to reproduce:
* Install a system in a virtual machine with 16 virtual CPUs
* Create a scratch file with "dd if=/dev/zero of=Scratch.img bs=1M
  count=2048 oflag=direct" - the file should be on a fast NVMe drive
* Attach the scratch file to the virtual machine as /dev/vdb; cache mode
  should be 'none'
* cryptsetup --force-password luksFormat /dev/vdb
* cryptsetup luksOpen /dev/vdb cr
* fio --direct=1 --bsrange=128k-128k --runtime=40 --numjobs=1
  --ioengine=libaio --iodepth=8 --group_reporting=1
  --filename=/dev/mapper/cr --name=job --rw=read

With 6.5, we get 3600MiB/s; with 6.6 we get 1400MiB/s.
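
To put the two figures in relative terms, here is a small sketch of the arithmetic. The function name throughput_drop is made up for illustration; the numbers are the ones quoted above.

```shell
# Hypothetical helper: given a baseline and a regressed throughput
# (same unit, e.g. MiB/s), print the remaining share and the drop.
throughput_drop() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        if (before <= 0) { exit 1 }   # avoid dividing by zero
        printf "%.0f%% of baseline, %.0f%% drop\n",
               100 * after / before, 100 * (before - after) / before
    }'
}

throughput_drop 3600 1400   # prints "39% of baseline, 61% drop"
```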

The reason is that virt-manager by default sets up a topology with
16 sockets, 1 core per socket, and 1 thread per core. The workqueue
patch avoids moving work items across sockets, so all encryption work
is processed on only one virtual CPU.
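
One way to check whether a guest has this kind of topology is to look at the lscpu summary. The following is a minimal sketch, assuming the usual English lscpu field labels ("Socket(s):", "Core(s) per socket:", "Thread(s) per core:"); the function name one_cpu_per_socket is hypothetical.

```shell
# Hypothetical helper: reads `lscpu` output on stdin and prints "yes" if
# there are several sockets with 1 core per socket and 1 thread per core,
# otherwise "no". Labels are matched unanchored because newer lscpu
# versions indent these fields.
one_cpu_per_socket() {
    awk -F: '
        /Socket\(s\):/          { sockets = $2 + 0 }
        /Core\(s\) per socket:/ { cores   = $2 + 0 }
        /Thread\(s\) per core:/ { threads = $2 + 0 }
        END {
            if (sockets > 1 && cores == 1 && threads == 1)
                print "yes"
            else
                print "no"
        }'
}

# Usage inside the guest (LC_ALL=C keeps the labels untranslated):
#   LC_ALL=C lscpu | one_cpu_per_socket
```

A "yes" here only says that the guest topology matches the one described above; it does not by itself prove that dm-crypt is affected.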

The performance degradation may be fixed with "echo 'system'
>/sys/module/workqueue/parameters/default_affinity_scope" - but it is
a regression anyway, as many users don't know about this option.
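
For A/B testing the workaround, it can help to remember the previous scope so that it can be restored afterwards. A minimal sketch follows, assuming the sysfs file quoted above and a root shell; the helper name set_wq_scope and its optional second argument are hypothetical.

```shell
# Sysfs path quoted in the message above.
WQ_SCOPE_FILE=/sys/module/workqueue/parameters/default_affinity_scope

# Hypothetical helper: write a new scope to the file and print the
# previous value so it can be restored later. The optional second
# argument overrides the path and defaults to the real sysfs file.
set_wq_scope() {
    new=$1
    file=${2:-$WQ_SCOPE_FILE}
    [ -w "$file" ] || { echo "cannot write $file" >&2; return 1; }
    old=$(cat "$file")
    echo "$new" > "$file" || return 1
    echo "$old"
}

# Usage (as root):
#   old=$(set_wq_scope system)    # switch to 'system', keep old value
#   ... run the fio job ...
#   set_wq_scope "$old"           # restore the previous scope
```

A value written this way does not survive a reboot. The kernel documentation also describes a workqueue.default_affinity_scope= boot parameter for setting it at boot time.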

How should we fix it? There are several options:
1. revert back to 'numa' affinity
2. revert to 'numa' affinity only if we are in a virtual machine
3. hack dm-crypt to set the 'numa' affinity for the affected workqueues
4. any other solution?

Mikulas