Message-ID: <20230608120021.3273400-1-jgowans@amazon.com>
Date:   Thu, 8 Jun 2023 14:00:18 +0200
From:   James Gowans <jgowans@...zon.com>
To:     Thomas Gleixner <tglx@...utronix.de>, <liaochang1@...wei.com>,
        Marc Zyngier <maz@...nel.org>
CC:     <linux-kernel@...r.kernel.org>, James Gowans <jgowans@...zon.com>
Subject: [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke

If interrupts do not have global active states, the next interrupt can
arrive on a new CPU if an affinity change happens while the original CPU
is still running the handler. This specifically impacts GIC-v3 LPIs.

In this series, generic functionality is added to handle_fasteoi_irq() to
support resending the interrupt when this race happens, and that generic
functionality is enabled specifically for GIC-v3 LPIs, which are impacted
by this issue. GIC-v3 uses the handle_fasteoi_irq() generic handler, hence
that is the handler getting the functionality.

The series also adds a bit more detail to the IRQD flags docs to help
future readers know when/why flags should be used and what they mean.

== Testing: ==

TL;DR: Run a QEMU virt VM on an EC2 R6g.metal host with an ENA device
passed through using VFIO, and bounce IRQ affinity between two CPUs.
Before this change an interrupt can get lost and the device stalls; after
this change the interrupt is not lost.

=== Details: ===

Intentionally slow down the IRQ injection a bit, to turn this from a
rare race condition into something which can easily be flushed out in
testing:

@@ -763,6 +764,7 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi)
        raw_spin_lock_irqsave(&irq->irq_lock, flags);
        irq->pending_latch = true;
        vgic_queue_irq_unlock(kvm, irq, flags);
+       udelay(10);

        return 0;
 }

Also sprinkle a print to make it clear when the race described here is
hit:

@@ -698,6 +698,7 @@ void handle_fasteoi_irq(struct irq_desc *desc)
         * handling the previous one - it may need to be resent.
         */
        if (!irq_may_run(desc)) {
+               printk("!irq_may_run %i\n", desc->irq_data.irq);
                if (irqd_needs_resend_when_in_progress(&desc->irq_data))
                        desc->istate |= IRQS_PENDING;
                goto out;

Launch QEMU in your favourite way, with an ENA device passed through via
VFIO (VFIO driver re-binding needs to be done before this):

qemu-system-aarch64 -enable-kvm  -machine virt,gic_version=3 -device vfio-pci,host=04:00.0 ...
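
The VFIO re-binding step is not spelled out above. One common way to do it
is through the PCI sysfs files driver_override, unbind and drivers_probe,
sketched below. This is only an illustration, not necessarily how the host
was prepared for this test: the function name rebind_to_vfio and the
SYSFS_PCI dry-run override are hypothetical, the PCI domain 0000 in front
of 04:00.0 is an assumption, and the vfio-pci module is assumed to be
loaded already.

```shell
#!/bin/sh
# Sketch only: hand a PCI device over to vfio-pci via sysfs.
# SYSFS_PCI is a hypothetical override for dry runs; it defaults to the
# real /sys/bus/pci tree.

# rebind_to_vfio BDF   (e.g. 0000:04:00.0; domain 0000 is an assumption)
rebind_to_vfio() {
	pci="${SYSFS_PCI:-/sys/bus/pci}"
	dev="$pci/devices/$1"

	# Detach the current host driver, if one is bound.
	if [ -e "$dev/driver/unbind" ]; then
		echo "$1" > "$dev/driver/unbind"
	fi

	# Restrict the device to vfio-pci, then ask the PCI core to re-probe it.
	echo vfio-pci > "$dev/driver_override"
	echo "$1" > "$pci/drivers_probe"
}
```

Run as root on the host before launching QEMU, for example:
rebind_to_vfio 0000:04:00.0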

In the VM, generate network traffic to get interrupts flowing:

ping -f -i 0.001 10.0.3.1 > /dev/null

On the host, bounce the affinity of the interrupt between two CPUs to flush out the race:

while true; do
	echo 1 > /proc/irq/71/smp_affinity ; sleep 0.01;
	echo 2 > /proc/irq/71/smp_affinity ; sleep 0.01;
done
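
The values written above are hex CPU bitmasks: 1 selects CPU 0 and 2
selects CPU 1. A parameterised variant of the same loop might look like
the sketch below. The function names and the PROC_IRQ_DIR dry-run override
are hypothetical, and the mask helper only covers CPUs 0-31, since larger
masks need the comma-separated form.

```shell
#!/bin/sh
# Sketch only: bounce an IRQ's affinity between two CPUs.
# PROC_IRQ_DIR is a hypothetical override for dry runs; it defaults to
# the real /proc/irq tree.

# Print the hex smp_affinity mask selecting a single CPU (0-31 only).
cpu_to_mask() {
	if [ "$1" -lt 0 ] || [ "$1" -gt 31 ]; then
		echo "cpu_to_mask: CPU $1 out of range 0-31" >&2
		return 1
	fi
	printf '%x\n' $((1 << $1))
}

# bounce_affinity IRQ CPU_A CPU_B COUNT
# Alternates the IRQ between CPU_A and CPU_B, COUNT times, 10ms apart.
bounce_affinity() {
	irq_file="${PROC_IRQ_DIR:-/proc/irq}/$1/smp_affinity"
	mask_a=$(cpu_to_mask "$2") || return 1
	mask_b=$(cpu_to_mask "$3") || return 1
	i=0
	while [ "$i" -lt "$4" ]; do
		echo "$mask_a" > "$irq_file"; sleep 0.01
		echo "$mask_b" > "$irq_file"; sleep 0.01
		i=$((i + 1))
	done
}
```

For the setup described here this would be run as root with, for example:
bounce_affinity 71 0 1 10000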

In host dmesg the printk indicates that the race is hit:

[  102.215801] !irq_may_run 71
[  105.426413] !irq_may_run 71
[  105.586462] !irq_may_run 71
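
To get a rough idea of how often the race is hit during a run, the debug
lines can be counted per IRQ. The helper below is a sketch; the name
count_race_hits is hypothetical, and it relies on the exact message format
of the debug printk added above.

```shell
#!/bin/sh
# Sketch only: count "!irq_may_run <IRQ>" debug lines for one IRQ.
# Reads a kernel log on stdin and prints the number of matching lines.

# count_race_hits IRQ
count_race_hits() {
	# Anchor on "] " and end of line so that IRQ 71 does not match 710.
	# grep -c exits non-zero when the count is 0, hence the "|| true".
	grep -c -E '\] !irq_may_run '"$1"'$' || true
}
```

Typical use on the host would be: dmesg | count_race_hits 71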

Before this change, an interrupt is lost and this manifests as a driver
watchdog timeout in the guest device driver:

[   35.124441] ena 0000:00:02.0 enp0s2: Found a Tx that wasn't completed on time,...
...
[   37.124459] ------------[ cut here ]------------
[   37.124791] NETDEV WATCHDOG: enp0s2 (ena): transmit queue 0 timed out

After this change, even though the !irq_may_run print is still shown
(indicating that the race is still hit) the driver no longer times out
because the interrupt now gets resent when the race occurs.

James Gowans (3):
  genirq: Expand doc for PENDING and REPLAY flags
  genirq: fasteoi supports resend on concurrent invoke
  irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs

 drivers/irqchip/irq-gic-v3-its.c |  2 ++
 include/linux/irq.h              | 13 +++++++++++++
 kernel/irq/chip.c                | 16 +++++++++++++++-
 kernel/irq/debugfs.c             |  2 ++
 kernel/irq/internals.h           |  7 +++++--
 5 files changed, 37 insertions(+), 3 deletions(-)


base-commit: 5f63595ebd82f56a2dd36ca013dd7f5ff2e2416a
-- 
2.25.1
