linux-kernel - Re: [PATCH v1 1/3] irqchip/gic: enable irq target all

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <87r1o8ea6s.fsf@nanos.tec.linutronix.de>
Date:   Wed, 02 Dec 2020 12:09:31 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Hanks Chen <hanks.chen@...iatek.com>, Marc Zyngier <maz@...nel.org>
Cc:     Mark Rutland <mark.rutland@....com>,
        CC Hwang <cc.hwang@...iatek.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Kuohong Wang <kuohong.wang@...iatek.com>,
        Russell King <linux@...linux.org.uk>,
        linux-kernel@...r.kernel.org, linux-mediatek@...ts.infradead.org,
        Loda Chou <loda.chou@...iatek.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        Will Deacon <will@...nel.org>,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v1 1/3] irqchip/gic: enable irq target all

Hanks,

On Tue, Dec 01 2020 at 21:54, Hanks Chen wrote:
> On Fri, 2020-11-27 at 18:11 +0000, Marc Zyngier wrote:
>> On 2020-11-27 14:15, Hanks Chen wrote:
>> > +	/*
>> > +	 * No move required, if interrupt is 1 of N IRQ.
>> > +	 * write current cpu_online_mask into affinity mask.
>> > +	 */
>> > +	if (cpumask_weight(desc->irq_common_data.affinity) > 1) {
>> > +		cpumask_copy(desc->irq_common_data.affinity, cpu_online_mask);
>> 
>> Again, this is totally bogus.
>> 
>
> If I add the target all flag into irq_common_data stucture to
> distinguish it. Would this be better?
>
> remove #ifdef CONFIG_ARM_IRQ_TARGET_ALL
>
> and
>
> if (desc->irq_common_data.irq_target_all)
> 	cpumask_copy(desc->irq_common_data.affinity, cpu_online_mask);

No. We are not adding random members to irq_common_data just to support
this. If at all this is a flag of the interrupt chip.

Also this copy is wrong to begin with. The affinity mask is only updated
on cpu hot-unplug when the outgoing CPU is the last online CPU in the
mask. Then we break the user/kernel supplied affinity mask and we only
do that when the interrupt is not affinity managed.

We do not remove offline CPUs from the affinity mask. The affinity code
of the irq chip has to ensure that only online CPUs can be
targeted. That's what ends up in effective_affinity.

>> > +#ifdef CONFIG_ARM_IRQ_TARGET_ALL
>> > +	/*
>> > +	 * No restore required, if interrupt is 1 of N IRQ.
>> > +	 */
>> > +	if (cpumask_weight(affinity) > 1) {
>> > +		cpumask_set_cpu(cpu, irq_data_get_affinity_mask(data));
>> > +		return;
>> > +	}

Heck no. This breaks managed interrupts and some more. Fiddling with the
irq affinity mask is wrong to begin with. Don't touch it ever.

>> > --- a/kernel/irq/manage.c
>> > +++ b/kernel/irq/manage.c
>> > @@ -270,7 +270,14 @@ int irq_do_set_affinity(struct irq_data *data,
>> > const struct cpumask *mask,
>> >  	switch (ret) {
>> >  	case IRQ_SET_MASK_OK:
>> >  	case IRQ_SET_MASK_OK_DONE:
>> > +#ifndef CONFIG_ARM_IRQ_TARGET_ALL
>> >  		cpumask_copy(desc->irq_common_data.affinity, mask);
>> > +#else
>> > +		if (cpumask_weight(mask) > 1)
>> > +			cpumask_copy(desc->irq_common_data.affinity, cpu_online_mask);
>> > +		else
>> > +			cpumask_copy(desc->irq_common_data.affinity, mask);

Again, no. Touching the affinity mask is a NONO. The affinity mask is
handed in from either the kernel or from user space. This code has no
business to override that.

The only case where the kernel touches it is on CPU hot-unplug when the
last cpu in the affinity mask goes offline and if the interrupt is not
affinity managed.

>> - You pollute the core code with hacks that should never be there. If 
>> the
>>    current behaviour is a problem, please state what the problem is.
>> 
> We found the RCU warn when IRQs target to a offline CPU without I bit
> masked in CPU hot-plug flow
> I'll reproduce the issue again and show the log analysis for it.

That has absolutely nothing to do with any of the functions where you
fiddle with the affinity mask.

Thanks,

        tglx