linux-kernel - Re: [RFC PATCH v3 2/3] genirq/cpuhotplug: Adjust managed irqs according to change of housekeeping CPU

Message-ID: <87h69uyfx9.ffs@tglx>
Date: Wed, 02 Oct 2024 12:09:38 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Costa Shulyupin <costa.shul@...hat.com>, longman@...hat.com,
 ming.lei@...hat.com, pauld@...hat.com, juri.lelli@...hat.com,
 vschneid@...hat.com, Michael Ellerman <mpe@...erman.id.au>, Nicholas
 Piggin <npiggin@...il.com>, Christophe Leroy
 <christophe.leroy@...roup.eu>, Naveen N Rao <naveen@...nel.org>, Zefan Li
 <lizefan.x@...edance.com>, Tejun Heo <tj@...nel.org>, Johannes Weiner
 <hannes@...xchg.org>, Michal Koutný <mkoutny@...e.com>,
 Ingo Molnar
 <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Vincent Guittot
 <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>, Costa Shulyupin <costa.shul@...hat.com>, Bjorn
 Helgaas <bhelgaas@...gle.com>, linuxppc-dev@...ts.ozlabs.org,
 linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [RFC PATCH v3 2/3] genirq/cpuhotplug: Adjust managed irqs
 according to change of housekeeping CPU

On Mon, Sep 16 2024 at 15:20, Costa Shulyupin wrote:

> Interrupts disturb real-time tasks on affined cpus.
> To ensure CPU isolation for real-time tasks, interrupt handling must
> be adjusted accordingly.
> Non-managed interrupts can be configured from userspace,
> while managed interrupts require adjustments in kernelspace.
>
> Adjust status of managed interrupts according to change
> of housekeeping CPUs to support dynamic CPU isolation.

What does 'adjust status' mean?

> +
> +/*
> + * managed_irq_isolate() - Deactivate managed interrupts if necessary
> + */
> +// derived from migrate_one_irq, irq_needs_fixup, irq_fixup_move_pending

If at all, then this needs to be integrated with migrate_one_irq().

> +static int managed_irq_isolate(struct irq_desc *desc)
> +{
> +	struct irq_data *d = irq_desc_get_irq_data(desc);
> +	struct irq_chip *chip = irq_data_get_irq_chip(d);
> +	const struct cpumask *a;
> +	bool maskchip;
> +	int err;
> +
> +	/*
> +	 * Deactivate if:
> +	 * - Interrupt is managed
> +	 * - Interrupt is not per cpu
> +	 * - Interrupt is started
> +	 * - Effective affinity mask includes isolated CPUs
> +	 */
> +	if (!irqd_affinity_is_managed(d) || irqd_is_per_cpu(d) || !irqd_is_started(d)
> +	    || cpumask_subset(irq_data_get_effective_affinity_mask(d),
> +			      housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
> +		return 0;
> +	// TBD: is it required?
> +	/*
> +	 * Complete an eventually pending irq move cleanup. If this
> +	 * interrupt was moved in hard irq context, then the vectors need
> +	 * to be cleaned up. It can't wait until this interrupt actually
> +	 * happens and this CPU was involved.
> +	 */
> +	irq_force_complete_move(desc);
> +
> +	if (irqd_is_setaffinity_pending(d)) {
> +		irqd_clr_move_pending(d);
> +		if (cpumask_intersects(desc->pending_mask,
> +		    housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)))
> +			a = irq_desc_get_pending_mask(desc);
> +	} else {
> +		a = irq_data_get_affinity_mask(d);
> +	}
> +
> +	maskchip = chip->irq_mask && !irq_can_move_pcntxt(d) && !irqd_irq_masked(d);
> +	if (maskchip)
> +		chip->irq_mask(d);
> +
> +	if (!cpumask_intersects(a, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ))) {
> +		/*
> +		 * Shut managed interrupt down and leave the affinity untouched.
> +		 * The effective affinity is reset to the first online CPU.
> +		 */
> +		irqd_set_managed_shutdown(d);
> +		irq_shutdown_and_deactivate(desc);
> +		return 0;

Seriously? The interrupt is active and the queue might have outstanding
requests which will never complete because the interrupt is taken away.

On CPU hotplug the related subsystem has shut down the device queue and
drained all outstanding requests. But none of this happens here.
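
As a toy illustration of the ordering problem described above, the following userspace sketch models a queue whose completions are only delivered while its interrupt is active. It is not kernel code: `struct toy_queue` and the `toy_*` helpers are hypothetical names invented for this example, and the model assumes every completion needs the interrupt.

```c
#include <stdbool.h>

/* Hypothetical toy model of a device queue; not kernel code. */
struct toy_queue {
	int  in_flight;   /* requests submitted but not yet completed */
	bool irq_active;  /* completions are only delivered while true */
};

/* Deliver one completion; fails once the interrupt has been shut down. */
static bool toy_complete_one(struct toy_queue *q)
{
	if (!q->irq_active || q->in_flight == 0)
		return false;
	q->in_flight--;
	return true;
}

/* Hotplug-like order: drain the queue first, then shut the interrupt down. */
static int toy_drain_then_shutdown(struct toy_queue *q)
{
	while (toy_complete_one(q))
		;
	q->irq_active = false;
	return q->in_flight;   /* number of requests left stuck */
}

/* Order criticized above: shut down while requests are still in flight. */
static int toy_shutdown_without_drain(struct toy_queue *q)
{
	q->irq_active = false;
	while (toy_complete_one(q))
		;
	return q->in_flight;   /* number of requests left stuck */
}
```

Starting from three in-flight requests, `toy_drain_then_shutdown()` leaves none stuck, while `toy_shutdown_without_drain()` leaves all three stuck, which is the "never complete" case in this simplified model.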

> +	}
> +
> +	/*
> +	 * Do not set the force argument of irq_do_set_affinity() as this
> +	 * disables the masking of offline CPUs from the supplied affinity
> +	 * mask and therefore might keep/reassign the irq to the outgoing
> +	 * CPU.

Which outgoing CPU?

> +	 */
> +	err = irq_do_set_affinity(d, a, false);
> +	if (err)
> +		pr_warn_ratelimited("IRQ%u: set affinity failed(%d).\n",
> +				    d->irq, err);
> +
> +	if (maskchip)
> +		chip->irq_unmask(d);
> +
> +	return err;
> +}
> +
> +/** managed_irq_affinity_adjust() - Deactivate or restore managed interrupts
> + * according to change of housekeeping cpumask.
> + *
> + * @enable_mask:	CPUs for which interrupts should be restored
> + */
> +int managed_irq_affinity_adjust(struct cpumask *enable_mask)
> +{
> +	unsigned int irq;
> +
> +	for_each_active_irq(irq) {

What ensures that this iteration is safe?

> +		struct irq_desc *desc = irq_to_desc(irq);

And that the descriptor is valid?

> +		unsigned int cpu;
> +
> +		for_each_cpu(cpu, enable_mask)
> +			irq_restore_affinity_of_irq(desc, cpu);

And what protects irq_restore_affinity_of_irq() against other operations
on @desc?

> +		raw_spin_lock(&desc->lock);

What disables interrupts here in the runtime case?

> +		managed_irq_isolate(desc);
> +		raw_spin_unlock(&desc->lock);
> +	}
> +
> +	return 0;

What purpose does that return value serve?

None of this can work at runtime.

Thanks,

        tglx
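
For readers following the quoted early-return condition in managed_irq_isolate(), the sketch below restates it in userspace with 64-bit masks (bit n stands for CPU n). It is only a model under stated assumptions: `struct toy_irq`, `toy_mask_subset()` and `toy_needs_isolation()` are hypothetical names, and the single `uint64_t` stands in for a kernel cpumask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the irq_data state the patch tests. */
struct toy_irq {
	bool     managed;         /* models irqd_affinity_is_managed() */
	bool     per_cpu;         /* models irqd_is_per_cpu() */
	bool     started;         /* models irqd_is_started() */
	uint64_t effective_mask;  /* bit n set: CPU n is in the effective affinity */
};

/* Same meaning as cpumask_subset(a, b): every CPU in a is also in b. */
static bool toy_mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/* True when the early "return 0" in the quoted function would NOT be taken. */
static bool toy_needs_isolation(const struct toy_irq *irq, uint64_t housekeeping)
{
	if (!irq->managed || irq->per_cpu || !irq->started)
		return false;
	return !toy_mask_subset(irq->effective_mask, housekeeping);
}
```

With housekeeping CPUs 0 and 1 (mask 0x3), a started managed interrupt whose effective mask is 0x4 (CPU 2) passes the check and would be acted on, while one with effective mask 0x2 (CPU 1) takes the early return.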