linux-kernel - [RFC PATCH] irqchip/gic: Implement irq_chip->irq

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200710145642.28978-1-valentin.schneider@arm.com>
Date:   Fri, 10 Jul 2020 15:56:42 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Jason Cooper <jason@...edaemon.net>,
        Marc Zyngier <maz@...nel.org>,
        Lorenzo Pieralisi <Lorenzo.Pieralisi@....com>
Subject: [RFC PATCH] irqchip/gic: Implement irq_chip->irq_retrigger()

While digging around IRQCHIP_EOI_IF_HANDLED and irq/resend.c, it has come
to my attention that the IRQ resend situation seems a bit precarious for
the GIC(s). Below is a (somewhat structured) dump of my notes/thoughts
about that.

IRQCHIP_EOI_IF_HANDLED
======================

If the irqchip doesn't have this flag set, we may issue an irq_eoi()
despite not having handled the IRQ in the following cases:

o !irq_may_run()
  - IRQ may be in progress (handle_irq_event() ongoing)
  - IRQ is an armed wakeup source (also sets it pending)

o !action or IRQ disabled; in this case it is set as pending.

irq_resend()
============

For the above cases where the IRQ is marked as pending, it means that when
we'll go through check_irq_resend() (e.g. when we re-enable the IRQ) we
will end up in resend_irqs() which goes through the flow handler again, IOW
will issue *another* EOI on the same IRQ.

Now, this is all done via a tasklet, so AFAICT cannot happen in irq
context (as defined by in_interrupt() - it can happen at the tail of
handling an IRQ if it wasn't nesting).

GIC woes
========

The TL;DR for IRQ handling on the GIC is that we have 3 steps:
o Acknowledgement (Ack)
o priority drop (PD)
o deactivation (D)

The GIC also has an "eoimode" knob that says whether PD and D are split, IOW:
o eoimode=0: irq_eoi() does PD + D
o eoimode=1: irq_eoi() does D

Regardless of the mode, it is paramount that any PD is
o issued on the same CPU that Ack'd the IRQ
o issued in reverse order of the Acks

IHI0069D, 4.1.1 Physical CPU interface, Priority drop
"""
A priority drop must be performed by the same PE that activated the
interrupt.

[...]

For each CPU interface, the GIC architecture requires the order of the
valid writes to ICC_EOIR0_EL1 and ICC_EOIR1_EL1 to be the exact reverse of
the order of the reads from ICC_IAR0_EL1 and ICC_IAR1_EL1
"""

IHI0069D, 8.2.9 ICC_EOIR0_EL1, Interrupt Controller End Of Interrupt Register 0
"""
A write to this register must correspond to the most recent valid read by
this PE from an Interrupt Acknowledge Register, and must correspond to the
INTID that was read from ICC_IAR0_EL1, otherwise the system behavior is
UNPREDICTABLE.
"""

For eoimode=1, the PD is hidden from genirq and is contained within the GIC
driver. This means that issuing another irq_eoi() will only be re-issuing a
D, which I think the GIC can live with (even if issued from a different CPU).

For eoimode=0, it is more troubling, as we break the aforementioned
restrictions. That said, IIUC this cannot cause e.g. a bogus running
priority reduction due to the tasklet context mentioned above (running
priority ought to be idle priority).

Linux hosts will almost always use eoimode=1, so that leaves us with
guests running with eoimode=0, and I don't know how common it is (if at all
possible) for those to use pm / wakeup IRQs. I suppose that is a reason
why this hasn't cropped up before, that or I miserably misunderstood the
whole thing.

In any case, the virtual interface follows the same restrictions wrt
PD ordering:

IHI0069D 8.3.7 ICV_EOIR0_EL1, Interrupt Controller Virtual End Of Interrupt Register 0
"""
A write to this register must correspond to the most recent valid read by
this vPE from a Virtual Interrupt Acknowledge Register, and must correspond
to the INTID that was read from ICV_IAR0_EL1, otherwise the system behavior
is UNPREDICTABLE.
"""

Change
======

One way to ensure we only get one PD per interrupt activation and maintain
the expected ordering is to add the IRQCHIP_EOI_IF_HANDLED flag to all
irq-gic chips, but that only really works for eoimode=1; for eoimode=0 that
would mean leaving the flow handler without having issued a PD at all.

At the same time, this whole IRQS_PENDING & resend thing feels like
something we can handle in hardware: we can leverage
irq_chip.irq_retrigger() and use this to mark the interrupt as pending in
the GIC itself, in which case we *don't* want to have
IRQCHIP_EOI_IF_HANDLED (as the retrigger will lead to another ack+eoi
pair).

Implement irq_chip.irq_retrigger() for both GICs.

Signed-off-by: Valentin Schneider <valentin.schneider@....com>
---
 drivers/irqchip/irq-gic-v3.c | 7 +++++++
 drivers/irqchip/irq-gic.c    | 6 ++++++
 2 files changed, 13 insertions(+)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index cc46bc2d634b..c025e8b51464 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1207,6 +1207,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 #define gic_smp_init()		do { } while(0)
 #endif
 
+static int gic_retrigger(struct irq_data *data)
+{
+	return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);
+}
+
 #ifdef CONFIG_CPU_PM
 static int gic_cpu_pm_notifier(struct notifier_block *self,
 			       unsigned long cmd, void *v)
@@ -1242,6 +1247,7 @@ static struct irq_chip gic_chip = {
 	.irq_eoi		= gic_eoi_irq,
 	.irq_set_type		= gic_set_type,
 	.irq_set_affinity	= gic_set_affinity,
+	.irq_retrigger          = gic_retrigger,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
 	.irq_nmi_setup		= gic_irq_nmi_setup,
@@ -1258,6 +1264,7 @@ static struct irq_chip gic_eoimode1_chip = {
 	.irq_eoi		= gic_eoimode1_eoi_irq,
 	.irq_set_type		= gic_set_type,
 	.irq_set_affinity	= gic_set_affinity,
+	.irq_retrigger          = gic_retrigger,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
 	.irq_set_vcpu_affinity	= gic_irq_set_vcpu_affinity,
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 00de05abd3c3..33ce025868d0 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -355,6 +355,11 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 }
 #endif
 
+static int gic_retrigger(struct irq_data *data)
+{
+	return gic_irq_set_irqchip_state(data, IRQCHIP_STATE_PENDING, true);
+}
+
 static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
 {
 	u32 irqstat, irqnr;
@@ -425,6 +430,7 @@ static const struct irq_chip gic_chip = {
 	.irq_unmask		= gic_unmask_irq,
 	.irq_eoi		= gic_eoi_irq,
 	.irq_set_type		= gic_set_type,
+	.irq_retrigger          = gic_retrigger,
 	.irq_get_irqchip_state	= gic_irq_get_irqchip_state,
 	.irq_set_irqchip_state	= gic_irq_set_irqchip_state,
 	.flags			= IRQCHIP_SET_TYPE_MASKED |
-- 
2.27.0