linux-kernel - Re: dm-crypt performance regression due to workqueue changes


Message-ID: <e52c5a40-8ca9-38ae-1595-3785c6ac435@redhat.com>
Date: Mon, 1 Jul 2024 15:42:29 +0200 (CEST)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Daniel P. Berrangé <berrange@...hat.com>
cc: Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>, 
    Waiman Long <longman@...hat.com>, Mike Snitzer <snitzer@...nel.org>, 
    Laurence Oberman <loberman@...hat.com>, 
    Jonathan Brassow <jbrassow@...hat.com>, Ming Lei <minlei@...hat.com>, 
    Ondrej Kozina <okozina@...hat.com>, Milan Broz <gmazyland@...il.com>, 
    linux-kernel@...r.kernel.org, dm-devel@...ts.linux.dev, 
    users@...ts.libvirt.org
Subject: Re: dm-crypt performance regression due to workqueue changes



On Mon, 1 Jul 2024, Daniel P. Berrangé wrote:

> On Sun, Jun 30, 2024 at 08:49:48PM +0200, Mikulas Patocka wrote:
> > 
> > 
> > On Sun, 30 Jun 2024, Tejun Heo wrote:
> > 
> > > Do you happen to know why libvirt is doing that? There are many other
> > > implications to configuring the system that way and I don't think we want to
> > > design kernel behaviors to suit topology information fed to VMs which can be
> > > arbitrary.
> > > 
> > > Thanks.
> > 
> > I don't know why. I added users@...ts.libvirt.org to the CC.
> > 
> > How should libvirt properly advertise "we have 16 threads that are 
> > dynamically scheduled by the host kernel, so the latencies between them 
> > are changing and unpredictable"?
> 
> NB, libvirt is just control plane, the actual virtual hardware exposed
> is implemented across QEMU and the KVM kernel mod. Guest CPU topology
> and/or NUMA cost information is the responsibility of QEMU.
> 
> When QEMU's virtual CPUs are floating freely across host CPUs there's
> no perfect answer. The host admin needs to make a tradeoff in their
> configuration.
> 
> They can optimize for density, by allowing guest CPUs to float freely
> and allow CPU overcommit against host CPUs, and the guest CPU topology
> is essentially a lie.
> 
> They can optimize for predictable performance, by strictly pinning
> guest CPUs 1:1 to host CPUs, and minimize CPU overcommit, and have
> the guest CPU topology 1:1 match the host CPU topology.
> 
> With regards,
> Daniel

The problem here is that commit 
63c5484e74952f60f5810256bd69814d167b8d22 ("workqueue: Add multiple 
affinity scopes and interface to select them") changes the behavior of 
unbound workqueues, so that work items are only executed on CPUs that 
share a last-level cache with the task that submitted them.
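
For reference, kernels that include this commit should expose the 
system-wide default scope as the workqueue module parameter 
default_affinity_scope. The following is a minimal Python sketch for 
reading it, assuming sysfs is mounted at /sys; the helper name is 
hypothetical and chosen only for illustration.

```python
from pathlib import Path


def read_default_affinity_scope(sysfs_root="/sys"):
    """Return the default workqueue affinity scope, or None if unavailable.

    Assumption: the kernel exposes the workqueue.default_affinity_scope
    parameter under <sysfs_root>/module/workqueue/parameters/.
    """
    path = Path(sysfs_root) / "module/workqueue/parameters/default_affinity_scope"
    try:
        return path.read_text().strip()
    except OSError:
        # Older kernel, no sysfs, or insufficient permissions.
        return None


print(read_default_affinity_scope())
```

A value of "cache" would correspond to the last-level-cache grouping 
described above; None means the parameter could not be read.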

If there are 16 virtual CPUs floating freely across physical CPUs, 
virt-manager by default selects a topology that advertises 16 sockets, 
1 core per socket, 1 thread per core. As a result, the unbound 
workqueues are no longer unbound: they can't move work across sockets, 
so they are bound to just one virtual CPU. This degrades dm-crypt 
performance, because the crypto operations are no longer parallelized.
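
The effect can be illustrated with a toy model. This Python sketch is 
not the kernel's implementation: it assumes one last-level cache per 
advertised socket, numbers CPUs socket by socket, and ignores every 
other detail of the workqueue code. The function names are 
hypothetical.

```python
from collections import defaultdict
from itertools import product


def cache_pods(sockets, cores, threads):
    """Group CPU ids by socket, assuming one last-level cache per socket."""
    pods = defaultdict(list)
    topology = product(range(sockets), range(cores), range(threads))
    for cpu, (socket, _core, _thread) in enumerate(topology):
        pods[socket].append(cpu)
    return list(pods.values())


def allowed_cpus(pods, submitting_cpu):
    """Return the CPUs that may run work submitted from submitting_cpu."""
    for pod in pods:
        if submitting_cpu in pod:
            return pod
    raise ValueError(f"CPU {submitting_cpu} is not in any pod")


# Topology from this report: 16 sockets, 1 core each, 1 thread each.
floating = cache_pods(sockets=16, cores=1, threads=1)
print(allowed_cpus(floating, 5))

# For comparison: 1 socket with 16 cores sharing one cache.
single = cache_pods(sockets=1, cores=16, threads=1)
print(allowed_cpus(single, 5))
```

In this model the 16-socket topology yields 16 groups of one CPU each, 
so work submitted from a CPU can only run on that same CPU, while the 
single-socket topology leaves all 16 CPUs available.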

Whose bug is this? Is it a bug in virt-manager, because it advertises 
an invalid topology? Or is it a bug in commit 63c5484e7495, because it 
avoids moving work items across sockets?

Mikulas