Message-ID: <c69938cffd4002a93a95a396affaa945e0f69206.camel@infradead.org>
Date: Wed, 23 Jul 2025 12:04:44 +0200
From: David Woodhouse <dwmw2@...radead.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>, Pavel Machek
<pavel@...nel.org>, linux-pm <linux-pm@...r.kernel.org>, Marc Zyngier
<maz@...nel.org>, linux-arm-kernel@...ts.infradead.org, "Saidi, Ali"
<alisaidi@...zon.com>, "oliver.upton" <oliver.upton@...ux.dev>, Joey Gouly
<joey.gouly@....com>, Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu
<yuzenghui@...wei.com>, Catalin Marinas <catalin.marinas@....com>, Will
Deacon <will@...nel.org>, linux-kernel <linux-kernel@...r.kernel.org>,
"Heyne, Maximilian" <mheyne@...zon.de>, Alexander Graf <graf@...zon.com>,
"Stamatis, Ilias" <ilstam@...zon.com>
Subject: Memory corruption after resume from hibernate with Arm GICv3 ITS
We have seen guests crashing when, after they resume from hibernate,
the hypervisor serializes their state for live update (LU) or live
migration (LM).

The Arm Generic Interrupt Controller is a complicated beast, and it
does scattershot DMA to little tables all across the guest's address
space, without even living behind an IOMMU.

Rather than simply turning it off overall, the guest has to explicitly
tear down *every* one of the individual tables which were previously
configured, in order to ensure that the memory is no longer used.

KVM's implementation of the virtual GIC only uses this guest memory
when asked to serialize its state. Instead of passing the information
up to userspace as most KVM devices do for serialization, KVM *only*
supports scribbling it to guest memory.

So when the transition from the boot kernel to the resumed kernel
leaves the vGIC pointing at the *wrong* addresses, a subsequent LU/LM
of that guest triggers the memory corruption: KVM writes its state to
a guest address that the now-running kernel did *not* expect.

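
As a rough illustration of the failure mode described above, here is a
toy userspace model in C. It is not KVM or irqchip code: every name in
it (toy_vgic, toy_mapd, toy_save_state, toy_corrupts) and the tiny
"guest memory" array are assumptions made up for this sketch. It only
shows why a stale ITT address turns state serialization into a stray
write, and why an unmap/remap by the resumed kernel would avoid it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_MEM_SIZE 64	/* toy guest "RAM", in bytes */
#define ITT_SIZE 8		/* toy ITT size, in bytes */

/* Toy stand-in for the vGIC's view of one device's ITT mapping. */
struct toy_vgic {
	bool valid;
	size_t itt_addr;	/* offset into the toy guest memory */
};

/* Model of MAPD: valid=true points the vGIC at itt_addr, false unmaps. */
static void toy_mapd(struct toy_vgic *vgic, bool valid, size_t itt_addr)
{
	assert(itt_addr + ITT_SIZE <= GUEST_MEM_SIZE);
	vgic->valid = valid;
	vgic->itt_addr = itt_addr;
}

/* Model of serialization: state is written wherever the vGIC points. */
static void toy_save_state(const struct toy_vgic *vgic, uint8_t *mem)
{
	if (vgic->valid)
		memset(mem + vgic->itt_addr, 0xAA, ITT_SIZE);
}

/*
 * Returns true if serialization wrote outside the ITT that the resumed
 * kernel believes it owns (i.e. it scribbled on unrelated memory).
 */
static bool toy_corrupts(size_t boot_itt, size_t resumed_itt,
			 bool remap_on_resume)
{
	uint8_t mem[GUEST_MEM_SIZE] = { 0 };
	struct toy_vgic vgic = { 0 };
	size_t i;

	toy_mapd(&vgic, true, boot_itt);	/* boot kernel's MAPD */
	if (remap_on_resume) {			/* resumed kernel re-issues MAPD */
		toy_mapd(&vgic, false, resumed_itt);
		toy_mapd(&vgic, true, resumed_itt);
	}
	toy_save_state(&vgic, mem);		/* later LU/LM serialization */

	for (i = 0; i < GUEST_MEM_SIZE; i++) {
		bool in_itt = i >= resumed_itt && i < resumed_itt + ITT_SIZE;

		if (!in_itt && mem[i] != 0)
			return true;
	}
	return false;
}
```

In this model, toy_corrupts(0x10, 0x20, false) reports a stray write
because the vGIC still holds the boot kernel's address, while
toy_corrupts(0x10, 0x20, true) does not.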
I tried this, just to get some more information:
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -720,7 +720,7 @@ static struct its_collection *its_build_mapd_cmd(struct its_node *its,
 	its_encode_valid(cmd, desc->its_mapd_cmd.valid);
 	its_fixup_cmd(cmd);
-
+	printk("%s dev 0x%x valid %d addr 0x%lx\n", __func__, desc->its_mapd_cmd.dev->device_id, desc->its_mapd_cmd.valid, itt_addr);
 	return NULL;
 }
@@ -4996,10 +4996,15 @@ static int its_save_disable(void)
 	struct its_node *its;
 	int err = 0;
+	printk("%s\n", __func__);
 	raw_spin_lock(&its_lock);
 	list_for_each_entry(its, &its_nodes, entry) {
+		struct its_device *its_dev;
 		void __iomem *base;
+		list_for_each_entry(its_dev, &its->its_device_list, entry) {
+			its_send_mapd(its_dev, 0);
+		}
 		base = its->base;
 		its->ctlr_save = readl_relaxed(base + GITS_CTLR);
 		err = its_force_quiescent(base);
@@ -5032,8 +5037,10 @@ static void its_restore_enable(void)
 	struct its_node *its;
 	int ret;
+	printk("%s\n", __func__);
 	raw_spin_lock(&its_lock);
 	list_for_each_entry(its, &its_nodes, entry) {
+		struct its_device *its_dev;
 		void __iomem *base;
 		int i;
@@ -5083,6 +5090,10 @@ static void its_restore_enable(void)
 		if (its->collections[smp_processor_id()].col_id <
 		    GITS_TYPER_HCC(gic_read_typer(base + GITS_TYPER)))
 			its_cpu_init_collection(its);
+
+		list_for_each_entry(its_dev, &its->its_device_list, entry) {
+			its_send_mapd(its_dev, 1);
+		}
 	}
 	raw_spin_unlock(&its_lock);
 }
Running under qemu on a suitable host, I can reproduce this with:
# echo reboot > /sys/power/disk
# echo disk > /sys/power/state
Example qemu command line:
qemu-system-aarch64 -serial mon:stdio -M virt,gic-version=host -cpu max -enable-kvm -drive file=~/Fedora-Cloud-Base-Generic-42-1.1.aarch64.qcow2,id=nvm,if=none,snapshot=off,format=qcow2 -device nvme,drive=nvm,serial=1 -m 8g -nographic -nic user,model=virtio -kernel vmlinuz-6.16.0-rc7-dirty -initrd initramfs-6.16.0-rc7-dirty.img -append 'root=UUID=6c7b9058-d040-4047-a892-d2f1c7dee687 ro rootflags=subvol=root no_timer_check console=tty1 console=ttyAMA0,115200n8 systemd.firstboot=off rootflags=subvol=root no_console_suspend=1 resume_offset=366703 resume=/dev/nvme0n1p3' -trace gicv3_its\*
As the kernel boots up for the first time, it sends a normal MAPD command:
[ 1.292956] its_build_mapd_cmd dev 0x10 valid 1 addr 0x10f010000
On hibernation, my newly added code unmaps and then *remaps* the same ITT address:
[root@...alhost ~]# echo disk > /sys/power/state
[ 42.118573] PM: hibernation: hibernation entry
[ 42.134574] Filesystems sync: 0.015 seconds
[ 42.134899] Freezing user space processes
[ 42.135566] Freezing user space processes completed (elapsed 0.000 seconds)
[ 42.136040] OOM killer disabled.
[ 42.136307] PM: hibernation: Preallocating image memory
[ 42.371141] PM: hibernation: Allocated 297401 pages for snapshot
[ 42.371163] PM: hibernation: Allocated 1189604 kbytes in 0.23 seconds (5172.19 MB/s)
[ 42.371170] Freezing remaining freezable tasks
[ 42.373465] Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
[ 42.378350] Disabling non-boot CPUs ...
[ 42.378363] its_save_disable
[ 42.378363] its_build_mapd_cmd dev 0x10 valid 0 addr 0x10f010000
[ 42.378363] PM: hibernation: Creating image:
[ 42.378363] PM: hibernation: Need to copy 153098 pages
[ 42.378363] PM: hibernation: Image created (115354 pages copied, 37744 zero pages)
[ 42.378363] its_restore_enable
[ 42.378363] its_build_mapd_cmd dev 0x10 valid 1 addr 0x10f010000
[ 42.383601] nvme nvme0: 1/0/0 default/read/poll queues
[ 42.384411] nvme nvme0: Ignoring bogus Namespace Identifiers
[ 42.384924] hibernate: Hibernating on CPU 0 [mpidr:0x0]
[ 42.387742] PM: Using 1 thread(s) for lzo compression
[ 42.387748] PM: Compressing and saving image data (115654 pages)...
[ 42.387757] PM: Image saving progress: 0%
[ 43.485794] PM: Image saving progress: 10%
[ 44.739662] PM: Image saving progress: 20%
[ 46.617453] PM: Image saving progress: 30%
[ 48.437644] PM: Image saving progress: 40%
[ 49.857855] PM: Image saving progress: 50%
[ 52.156928] PM: Image saving progress: 60%
[ 53.344810] PM: Image saving progress: 70%
[ 54.472998] PM: Image saving progress: 80%
[ 55.083950] PM: Image saving progress: 90%
[ 56.406480] PM: Image saving progress: 100%
[ 56.407088] PM: Image saving done
[ 56.407100] PM: hibernation: Wrote 462616 kbytes in 14.01 seconds (33.02 MB/s)
[ 56.407106] PM: Image size after compression: 148041 kbytes
[ 56.408210] PM: S|
[ 56.642393] Flash device refused suspend due to active operation (state 20)
[ 56.642871] Flash device refused suspend due to active operation (state 20)
[ 56.643432] reboot: Restarting system
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd4f1]
Then the *boot* kernel comes up, does its own MAPD using a slightly different address:
[ 1.270652] its_build_mapd_cmd dev 0x10 valid 1 addr 0x10f009000
... and then transfers control to the hibernated kernel, which again
tries to unmap and remap the ITT at its original address due to my
suspend/resume hack (which is clearly hooking the wrong thing, but is
at least giving us useful information):
Starting systemd-hibernate-resume.service - Resume from hibernation...
[ 1.391340] PM: hibernation: resume from hibernation
[ 1.391861] random: crng reseeded on system resumption
[ 1.391927] Freezing user space processes
[ 1.392984] Freezing user space processes completed (elapsed 0.001 seconds)
[ 1.393473] OOM killer disabled.
[ 1.393486] Freezing remaining freezable tasks
[ 1.395012] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 1.400817] PM: Using 1 thread(s) for lzo decompression
[ 1.400832] PM: Loading and decompressing image data (115654 pages)...
[ 1.400836] hibernate: Hibernated on CPU 0 [mpidr:0x0]
[ 1.438621] PM: Image loading progress: 0%
[ 1.554623] PM: Image loading progress: 10%
[ 1.594714] PM: Image loading progress: 20%
[ 1.639317] PM: Image loading progress: 30%
[ 1.683055] PM: Image loading progress: 40%
[ 1.720726] PM: Image loading progress: 50%
[ 1.768878] PM: Image loading progress: 60%
[ 1.800203] PM: Image loading progress: 70%
[ 1.822833] PM: Image loading progress: 80%
[ 1.840985] PM: Image loading progress: 90%
[ 1.871253] PM: Image loading progress: 100%
[ 1.871611] PM: Image loading done
[ 1.871617] PM: hibernation: Read 462616 kbytes in 0.47 seconds (984.28 MB/s)
[ 42.378350] Disabling non-boot CPUs ...
[ 42.378363] its_save_disable
[ 42.378363] its_build_mapd_cmd dev 0x10 valid 0 addr 0x10f010000
[ 42.378363] PM: hibernation: Creating image:
[ 42.378363] PM: hibernation: Need to copy 153098 pages
[ 42.378363] hibernate: Restored 0 MTE pages
[ 42.378363] its_restore_enable
[ 42.378363] its_build_mapd_cmd dev 0x10 valid 1 addr 0x10f010000
[ 42.417445] OOM killer enabled.
[ 42.417455] Restarting tasks: Starting
[ 42.419915] nvme nvme0: 1/0/0 default/read/poll queues
[ 42.420407] Restarting tasks: Done
[ 42.420781] PM: hibernation: hibernation exit
[ 42.421149] nvme nvme0: Ignoring bogus Namespace Identifiers
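
The address comparison in the traces above (0x10f010000 from the
hibernated kernel versus 0x10f009000 from the boot kernel, both for
dev 0x10) can also be checked mechanically. The following is a minimal
sketch in userspace C, assuming the exact line format emitted by the
debug printk in the patch above; the helper names parse_mapd_line and
itt_addr_mismatch are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed "its_build_mapd_cmd" debug line (format of the printk hack). */
struct mapd_event {
	unsigned int dev;
	int valid;
	unsigned long long itt_addr;
};

/* Hypothetical helper: returns 1 and fills *ev if line is a MAPD trace. */
static int parse_mapd_line(const char *line, struct mapd_event *ev)
{
	const char *p;

	assert(line != NULL && ev != NULL);

	/* Skip the dmesg timestamp by searching for the function name. */
	p = strstr(line, "its_build_mapd_cmd dev ");
	if (!p)
		return 0;

	return sscanf(p, "its_build_mapd_cmd dev 0x%x valid %d addr 0x%llx",
		      &ev->dev, &ev->valid, &ev->itt_addr) == 3;
}

/*
 * Returns 1 if both lines are valid=1 MAPDs for the same device ID but
 * carry different ITT addresses, 0 otherwise (including parse failures).
 */
static int itt_addr_mismatch(const char *a, const char *b)
{
	struct mapd_event ea, eb;

	if (!parse_mapd_line(a, &ea) || !parse_mapd_line(b, &eb))
		return 0;

	return ea.valid && eb.valid && ea.dev == eb.dev &&
	       ea.itt_addr != eb.itt_addr;
}
```

Feeding it the first-boot MAPD line and the boot kernel's MAPD line
from the logs above flags the mismatch for dev 0x10.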