linux-kernel - Re: [PATCH 15/18] irqchip/apple-aic: Add support for the Apple Interrupt Controller

Message-ID: <1a8d801d-3b02-8909-52e2-28ca5d67f48e@marcan.st>
Date:   Tue, 9 Feb 2021 15:20:29 +0900
From:   Hector Martin <marcan@...can.st>
To:     Marc Zyngier <maz@...nel.org>
Cc:     soc@...nel.org, linux-arm-kernel@...ts.infradead.org,
        robh+dt@...nel.org, Arnd Bergmann <arnd@...nel.org>,
        linux-kernel@...r.kernel.org, devicetree@...r.kernel.org,
        Olof Johansson <olof@...om.net>
Subject: Re: [PATCH 15/18] irqchip/apple-aic: Add support for the Apple
 Interrupt Controller

On 08/02/2021 18.25, Marc Zyngier wrote:
> I really do not want to expose IPIs in the DT. The OS defines what
> IPIs are used for, not the firmware/HW. No other platform requires
> this either, so is there any reason to do so?

This is used internally by the chained IPI driver (patch #16), but it
never needs to be used in the DT. I guess it would be appropriate to
just not document it in the bindings, and also to use a type higher
than 2 (e.g. 0xff), so that if we ever have to add another type to the
binding (e.g. the timer on older SoCs), it doesn't have to skip the
number 2 to avoid breaking compat between newer DTs and older drivers.

See irq-bcm2836.c for the same approach: a chained IPI controller using 
an otherwise undocumented IRQ binding.

Another approach is to do what irq-armada-370-xp.c does: just ditch the 
chained irqchip and call handle_domain_irq into the IPI domain directly 
from the main IRQ handler function.

>> +#include <linux/irqchip/chained_irq.h>
> 
> There isn't any chained interrupt controller here, AFAICT.

This goes with patch #16, I'll move it there.

> If these functions have no impact on the per-CPU interrupts, then
> maybe these interrupts should be given a different irqchip.

Same IRQ domain, different irqchip? That sounds reasonable and gets rid
of the bounds check on the mask/unmask calls, I'll do it for v2. This
chip would apply to both IPIs (where a different register set in AIC
applies for masking/unmasking, but that is handled at the chained
irqchip level in #16) and FIQs (which have no masking).

>> +static void aic_irq_eoi(struct irq_data *d)
>> +{
>> +	/*
>> +	 * Reading the interrupt reason automatically acknowledges and masks
>> +	 * the IRQ, so we just unmask it here if needed.
>> +	 */
>> +	if (!irqd_irq_disabled(d) && !irqd_irq_masked(d))
>> +		aic_irq_unmask(d);
> 
> This doesn't apply to per-CPU interrupts, right? Or does it?

The auto-masking does apply to IPIs, but this code doesn't do the
unmasking. That is handled in the chained IPI domain in #16.

*Strictly speaking* if we separate the responsibility at AIC for the 
root handler and say the chained handler should purely be a multiplexer 
for IPIs that doesn't touch the hardware at all, then the 
masking/unmasking should move here (into another irqchip) and the IPI 
domain code should just call into the root domain to mask/unmask the 
sole used hardware IPI; the current approach is a minor layering 
violation but... I'm not sure how useful it is to keep the layering 
pristine when both drivers live in the same file and are instantiated 
together anyway.

If I switch to the irq-armada-370-xp.c model where there is no logical 
chaining, then this would be fine as-is as both domains would logically 
represent driving different parts of AIC in parallel, with no nesting 
relationship.

>> +		u32 type = event >> 16, irq = event & 0xffff;
> 
> Nit: please consider introducing masks and using the bitfield macros
> to extract the various fields.

Ack, will do.
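
For illustration, a minimal userspace sketch of the mask-based extraction
suggested above. In the kernel this would use GENMASK() and FIELD_GET() from
<linux/bits.h> and <linux/bitfield.h>; the stand-in macro, the helper
field_get() and the names AIC_EVENT_TYPE / AIC_EVENT_NUM are assumptions made
for this example, and only the 16/16 bit split comes from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(), limited to 32 bits. */
#define GENMASK_U32(h, l) \
	((uint32_t)((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l))))

/* Hypothetical field names; the layout mirrors "event >> 16" / "event & 0xffff". */
#define AIC_EVENT_TYPE	GENMASK_U32(31, 16)
#define AIC_EVENT_NUM	GENMASK_U32(15, 0)

/* Stand-in for FIELD_GET(): extract the masked field, shifted down to bit 0. */
static inline uint32_t field_get(uint32_t mask, uint32_t reg)
{
	assert(mask != 0);
	/* mask & -mask isolates the lowest set bit of the mask, i.e. 1 << shift. */
	return (reg & mask) / (mask & -mask);
}
```

With these definitions, field_get(AIC_EVENT_TYPE, event) replaces the open-coded
shift and field_get(AIC_EVENT_NUM, event) replaces the open-coded mask.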

>> +		/* AIC_EVENT is read-sensitive, ensure it happens before we proceed */
>> +		isb();
> 
> You seem to have a data dependency after this, so I can't see how the
> ISB influences the read from AIC_EVENT. However you need to order it
> with the read from the timer registers, and I believe it'd be better
> to move the barrier there.

(Keeping the barrier story in the other thread)

>> +		if (type == AIC_EVENT_TYPE_HW) {
>> +			handle_domain_irq(aic_irqc->hw_domain, irq, regs);
>> +		} else if (type == AIC_EVENT_TYPE_IPI) {
>> +			handle_domain_irq(aic_irqc->hw_domain,
>> +					  ic->nr_hw + AIC_NR_FIQ + irq - 1, regs);
> 
> nit: it would be slightly less cumbersome to compute the hwirq in a
> switch, and have a single call to handle_domain_irq().
> 
> I also wonder whether using two top-level domains would be better. Not
> a big deal though.

Exactly :) I can certainly switch to that if you have no objection. It 
should have lower overhead for IPIs anyway, and removes the fwspec step 
to glue it all together.

>> +		} else {
>> +			pr_err("spurious IRQ event %d, %d\n", type, irq);
> 
> Spurious interrupts aren't an error, in general. If you really want to
> keep this, at the very least make it rate-limited.

In this case it's more like "unknown IRQ event", which had better be an
error because it means the chip is doing something we don't know about -
*except* the zero case, that's just a spurious IRQ which indeed isn't an 
error (peripheral asserted and deasserted IRQ before we could handle it; 
I need to check but I believe AIC would just withdraw the event in that 
case).

So the zero case should be ignored and the unknown case should be fine 
without rate limiting, because it really shouldn't happen, ever. I'll 
fix the zero case for v2.
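
A rough sketch of the classification described above, written as plain
userspace C so it can be tried in isolation. It reads "the zero case" as the
whole event word being zero, which is an assumption, and the type encodings
(EVENT_TYPE_HW, EVENT_TYPE_IPI) are placeholder values since the real ones are
not given in this thread. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVENT_TYPE_MASK	UINT32_C(0xffff0000)
#define EVENT_NUM_MASK	UINT32_C(0x0000ffff)

/* Placeholder type encodings: the real values are not given in this thread. */
#define EVENT_TYPE_HW	1
#define EVENT_TYPE_IPI	4

enum event_class { EVENT_SPURIOUS, EVENT_HW, EVENT_IPI, EVENT_UNKNOWN };

/* Classify a raw event word and store its number field in *num. */
static enum event_class classify_event(uint32_t event, uint32_t *num)
{
	uint32_t type = (event & EVENT_TYPE_MASK) >> 16;

	assert(num != NULL);
	*num = event & EVENT_NUM_MASK;

	/* Zero: the IRQ was withdrawn before we got to it, not an error. */
	if (event == 0)
		return EVENT_SPURIOUS;

	switch (type) {
	case EVENT_TYPE_HW:
		return EVENT_HW;
	case EVENT_TYPE_IPI:
		return EVENT_IPI;
	default:
		/* The chip reported something we do not understand. */
		return EVENT_UNKNOWN;
	}
}
```

A caller would silently skip EVENT_SPURIOUS and log only EVENT_UNKNOWN, which
keeps the error message reserved for events that should never occur.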

> Consider turning the whole thing into a do{}while() so that there is
> only a single read of AIC_EVENT in the function.

Ack.
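
As a sketch of that loop shape: the register is read in exactly one place and
the loop ends when the read returns zero. The termination condition is an
assumption (the original loop is not quoted here), and struct mock_aic,
mock_read_event() and drain_events() are hypothetical stand-ins for the real
MMIO read and handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of a read-sensitive event register: each read pops one pending event. */
struct mock_aic {
	const uint32_t *pending;
	size_t count;
	size_t next;
	size_t reads;	/* number of register reads performed so far */
};

static uint32_t mock_read_event(struct mock_aic *aic)
{
	aic->reads++;
	return aic->next < aic->count ? aic->pending[aic->next++] : 0;
}

/* Drain pending events; returns how many non-zero events were seen. */
static size_t drain_events(struct mock_aic *aic)
{
	size_t handled = 0;
	uint32_t event;

	assert(aic != NULL);
	do {
		event = mock_read_event(aic);	/* the only read site */
		if (event)
			handled++;	/* dispatch would happen here */
	} while (event);

	return handled;
}
```

With two pending events the mock register is read three times: twice for the
events and once for the terminating zero.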

>> +	/*
>> +	 * It would be really nice to find a system register that lets us get the FIQ source
>> +	 * state without having to peek down into clients...
>> +	 */
> 
> nit: please try to keep comments within the 80 cols limit. I don't
> mind code being wider, but comments benefit from being more rigorously
> structured.

Ack, I'll keep it in mind. For this one I just used clang-format as a 
first-pass and made some minor changes, so please do point out any other 
style nits I missed so I can keep it in check. I know it doesn't enforce 
nor fully represent kernel style.

> And yes, having to poll each end-point IP is really a drag. How does
> the PMU work on this system? Is there any other per-CPU source?

PMU also ends up in FIQ, and it's nonstandard (not the ARM Performance 
Monitors extension). That means one more FIQ source is going to have to 
end up here, and one more downstream register to poll (a proprietary one 
in this case: SYS_PMCR0 is s3_1_c15_c0_0, not to be confused with the 
standard SYS_PMCR_EL0 which is completely different).

So the FIQ sources to be polled are the following:

1. HV timers
2. Guest timers (auto-masked, mask register TBD, I still have no idea 
how XNU routes these to the hypervisor framework... haven't found the 
code yet, or the mask register)
3. Fast IPIs (not currently implemented)
4. Apple PMU

We have #1, I'll look for the mask bits to properly implement #2, #3 can 
wait until we actually implement those, and #4 can wait until we 
implement that PMU. Does that sound OK?

If you think it's worth it, I could at least check the status registers 
for #3 and #4 and yell loudly, so that if we somehow end up with a FIQ 
storm because those paths went off unchecked, at least we have logs. Or 
just make sure to mask them in the AIC init code, or both.

Since these are per-CPU, setting the masks is a per-cpu call, so that 
needs to go via something like cpuhp_setup_state_nocalls to run on CPU 
bring-up.

That said: PMC seems to have IRQ settings to go via AIC instead of FIQ, 
but I have no idea if that works on these CPUs; XNU only uses it on 
older ones. That should probably be investigated later.

> This system runs VHE, so there is also CNT{P,V}_CTL_EL02 to consider.
> But I really wonder how the whole thing works once these two timers
> are assigned to a guest. Somehow, something must control the masking,
> otherwise you wouldn't be able to enter a guest with a timer firing.

Yeah, as I mentioned on IRC, there is auto-masking for the guest timers, 
somehow. I'll find that mask bit.

> It also means that there is no way to have threaded per-CPU
> interrupts, which means no Preempt-RT. You could wire the mask/unmask
> callbacks to mess with the IMASK bit in individual timers, but that
> doesn't solve the problem for guests.

For HV timers, we'll probably have to mess with IMASK here if there is
no other way... I need to read through the timer code and make sure that 
wouldn't confuse it, as that creates a bit of an implicit contract here.

It'd be great if I can find a true status/mask register for FIQs, but 
I'm not holding my breath that it exists...

> Are all interrupts level? How are MSIs implemented?

Seems a fixed set of MSIs are routed into AIC, presumably transformed 
into level lines? I need to look into this part in more detail.

>> +	irqc->hw_domain =
>> +		irq_domain_add_linear(node,
>> +				      irqc->nr_hw + AIC_NR_FIQ + AIC_NR_IPI,
>> +				      &aic_irq_domain_ops, irqc);
> 
> Please keep assignments on a single line.

I think this one was one of those clang-format things :-).

I'll fix it and watch out for similar things.

>> +	for (i = 0; i < BITS_TO_LONGS(irqc->nr_hw); i++)
> 
> long is 64bit on arm64, so this loop is unlikely to do what you
> want. Consider using BITS_TO_U32.

Ha, nice catch. Thanks!
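
To make the difference concrete, here is a small userspace sketch. It assumes
one mask bit per IRQ in a bank of 32-bit registers, and it models arm64's
64-bit long explicitly so the result does not depend on the build host.
BITS_TO_LONGS_ARM64 and mask_irqs() are hypothetical names for this example,
and the IRQ count used below is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* What BITS_TO_LONGS() evaluates to on arm64, where long is 64 bits. */
#define BITS_TO_LONGS_ARM64(nr)	DIV_ROUND_UP(nr, 64)
/* Same definition as the kernel's BITS_TO_U32(). */
#define BITS_TO_U32(nr)		DIV_ROUND_UP(nr, 32)

/*
 * Write an all-ones pattern to the first nr_words 32-bit registers of a
 * bank and return how many of the nr_irqs IRQ lines that covers.
 */
static size_t mask_irqs(uint32_t *bank, size_t nr_irqs, size_t nr_words)
{
	size_t i, covered;

	assert(nr_words <= BITS_TO_U32(nr_irqs));
	for (i = 0; i < nr_words; i++)
		bank[i] = UINT32_MAX;

	covered = nr_words * 32;
	return covered < nr_irqs ? covered : nr_irqs;
}
```

For 96 IRQs, BITS_TO_LONGS_ARM64(96) is 2, so a loop bounded by it touches only
two 32-bit registers and covers 64 lines, while BITS_TO_U32(96) is 3 and covers
all 96.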

-- 
Hector Martin (marcan@...can.st)
Public Key: https://mrcn.st/pub