linux-kernel - Re: [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke

Message-ID: <d08bc249fcf25ab88ded1578e79997a25ab6ba93.camel@amazon.com>
Date:   Fri, 16 Jun 2023 08:32:30 +0000
From:   "Gowans, James" <jgowans@...zon.com>
To:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "maz@...nel.org" <maz@...nel.org>,
        "liaochang1@...wei.com" <liaochang1@...wei.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/3] Resend GIC-v3 LPIs on concurrent invoke

Hi Marc and Thomas,
Just a ping on this series; it would be great to get any more feedback, or
to get this merged.

Thanks!
James

On Thu, 2023-06-08 at 14:00 +0200, James Gowans wrote:
> If interrupts do not have global active states it is possible for
> the next interrupt to arrive on a new CPU if an affinity change happens
> while the original CPU is still running the handler. This specifically
> impacts GIC-v3.
> 
> In this series, generic functionality is added to handle_fasteoi_irq() to
> support resending the interrupt when this race happens, and that generic
> functionality is enabled specifically for GIC-v3, which is impacted
> by this issue. GIC-v3 uses the handle_fasteoi_irq() generic handler, hence
> that is the handler getting the functionality.
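
To make the race concrete, the resend logic described above can be modelled
in a few lines of userspace C. This is only a toy model, not kernel code:
struct toy_irq and the toy_* functions are hypothetical names invented for
illustration, and the resend is modelled as an immediate second handler run
rather than a real hardware or software retrigger.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one interrupt line; NOT the kernel's struct irq_desc. */
struct toy_irq {
	bool in_progress;	/* some CPU is currently running the handler */
	bool pending;		/* loosely models IRQS_PENDING */
	bool resend_enabled;	/* loosely models the resend-when-in-progress flag */
	int handled;		/* number of completed handler runs */
};

/* An interrupt arrives on some CPU. Returns true if the handler starts. */
static bool toy_irq_arrive(struct toy_irq *irq)
{
	if (irq->in_progress) {
		/* Race: another CPU is still handling the previous interrupt. */
		if (irq->resend_enabled)
			irq->pending = true;	/* remember it for later */
		return false;			/* otherwise it is simply dropped */
	}
	irq->in_progress = true;
	return true;
}

/* The CPU that started the handler finishes it. */
static void toy_irq_finish(struct toy_irq *irq)
{
	assert(irq->in_progress);
	irq->handled++;
	irq->in_progress = false;

	if (irq->pending) {
		/* Replay the interrupt that arrived during the handler. */
		irq->pending = false;
		irq->handled++;
	}
}

/* Replay the scenario from the cover letter; returns handler runs. */
static int toy_run_race(bool resend_enabled)
{
	struct toy_irq irq = { .resend_enabled = resend_enabled };

	toy_irq_arrive(&irq);	/* first interrupt, original CPU starts handler */
	toy_irq_arrive(&irq);	/* next interrupt lands on the new CPU */
	toy_irq_finish(&irq);	/* original CPU finishes */
	return irq.handled;
}
```

In this model toy_run_race(false) returns 1 (the second interrupt is lost)
and toy_run_race(true) returns 2 (it is replayed once the first handler
finishes).
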
> 
> Also add a bit more detail to the IRQD flags docs to help future
> readers know when/why flags should be used and what they mean.
> 
> == Testing: ==
> 
> TL;DR: Run a virt guest using QEMU on an EC2 R6g.metal host with an ENA
> device passed through using VFIO - bounce IRQ affinity between two CPUs.
> Before this change an interrupt can get lost and the device stalls; after
> this change the interrupt is not lost.
> 
> === Details: ===
> 
> Intentionally slow down the IRQ injection a bit, to turn this from a
> rare race condition into something which can easily be flushed out
> in testing:
> 
> @@ -763,6 +764,7 @@ int vgic_its_inject_cached_translation(struct kvm *kvm, struct kvm_msi *msi)
>         raw_spin_lock_irqsave(&irq->irq_lock, flags);
>         irq->pending_latch = true;
>         vgic_queue_irq_unlock(kvm, irq, flags);
> +       udelay(10);
> 
>         return 0;
>  }
> 
> Also sprinkle a print to make it clear when the race described here is
> hit:
> 
> @@ -698,6 +698,7 @@ void handle_fasteoi_irq(struct irq_desc *desc)
>          * handling the previous one - it may need to be resent.
>          */
>         if (!irq_may_run(desc)) {
> +               printk("!irq_may_run %i\n", desc->irq_data.irq);
>                 if (irqd_needs_resend_when_in_progress(&desc->irq_data))
>                         desc->istate |= IRQS_PENDING;
>                 goto out;
> 
> Launch QEMU in your favourite way, with an ENA device passed through via
> VFIO (VFIO driver re-binding needs to be done before this):
> 
> qemu-system-aarch64 -enable-kvm  -machine virt,gic_version=3 -device vfio-pci,host=04:00.0 ...
> 
> In the VM, generate network traffic to get interrupts flowing:
> 
> ping -f -i 0.001 10.0.3.1 > /dev/null
> 
> On the host, bounce the affinity of the interrupt between two CPUs to
> flush out the race:
> 
> while true; do
> 	echo 1 > /proc/irq/71/smp_affinity ; sleep 0.01;
> 	echo 2 > /proc/irq/71/smp_affinity ; sleep 0.01;
> done
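
The values 1 and 2 written above are hexadecimal CPU bitmasks, selecting
CPU 0 and CPU 1 respectively. A minimal C sketch of the same write follows,
assuming a single target CPU below 32 so that one mask word is enough. The
helper names affinity_mask_for_cpu() and write_affinity() are hypothetical,
and the IRQ number 71 in the comment is just the one from this test setup.

```c
#include <stdio.h>

/*
 * Format the smp_affinity bitmask that selects only `cpu` (0..31).
 * Returns 0 on success, -1 if the CPU is out of range or buf is too small.
 */
static int affinity_mask_for_cpu(unsigned int cpu, char *buf, size_t len)
{
	int n;

	if (cpu >= 32)
		return -1;
	n = snprintf(buf, len, "%x", 1u << cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write the mask for `cpu` to an affinity file such as
 * /proc/irq/71/smp_affinity (needs root on a real host).
 * Returns 0 on success, -1 on any error.
 */
static int write_affinity(const char *path, unsigned int cpu)
{
	char mask[16];
	FILE *f;
	int ok;

	if (affinity_mask_for_cpu(cpu, mask, sizeof(mask)) != 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", mask) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Alternating write_affinity(path, 0) and write_affinity(path, 1) with a short
sleep in between would mirror the shell loop.
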
> 
> In host dmesg the printk indicates that the race is hit:
> 
> [  102.215801] !irq_may_run 71
> [  105.426413] !irq_may_run 71
> [  105.586462] !irq_may_run 71
> 
> Before this change, an interrupt is lost and this manifests as a driver
> watchdog timeout in the guest device driver:
> 
> [   35.124441] ena 0000:00:02.0 enp0s2: Found a Tx that wasn't completed on time,...
> ...
> [   37.124459] ------------[ cut here ]------------
> [   37.124791] NETDEV WATCHDOG: enp0s2 (ena): transmit queue 0 timed out
> 
> After this change, even though the !irq_may_run print is still shown
> (indicating that the race is still hit) the driver no longer times out
> because the interrupt now gets resent when the race occurs.
> 
> James Gowans (3):
>   genirq: Expand doc for PENDING and REPLAY flags
>   genirq: fasteoi supports resend on concurrent invoke
>   irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
> 
>  drivers/irqchip/irq-gic-v3-its.c |  2 ++
>  include/linux/irq.h              | 13 +++++++++++++
>  kernel/irq/chip.c                | 16 +++++++++++++++-
>  kernel/irq/debugfs.c             |  2 ++
>  kernel/irq/internals.h           |  7 +++++--
>  5 files changed, 37 insertions(+), 3 deletions(-)
> 
> 
> base-commit: 5f63595ebd82f56a2dd36ca013dd7f5ff2e2416a