linux-kernel - Re: [PATCH v1 4/7] sched/isolation: Adjust affinity of managed irqs according to change of housekeeping cpumask

Message-ID: <87wmnrj4uz.ffs@tglx>
Date: Sat, 18 May 2024 03:17:24 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Costa Shulyupin <costa.shul@...hat.com>, longman@...hat.com,
 pauld@...hat.com, juri.lelli@...hat.com, prarit@...hat.com,
 vschneid@...hat.com, Anna-Maria Behnsen <anna-maria@...utronix.de>,
 Frederic Weisbecker <frederic@...nel.org>, Zefan Li
 <lizefan.x@...edance.com>, Tejun Heo <tj@...nel.org>, Johannes Weiner
 <hannes@...xchg.org>, Ingo Molnar <mingo@...hat.com>, Peter Zijlstra
 <peterz@...radead.org>, Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
 <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
 <mgorman@...e.de>, Daniel Bristot de Oliveira <bristot@...hat.com>, Petr
 Mladek <pmladek@...e.com>, Andrew Morton <akpm@...ux-foundation.org>,
 Masahiro Yamada <masahiroy@...nel.org>, Randy Dunlap
 <rdunlap@...radead.org>, Yoann Congal <yoann.congal@...le.fr>, "Gustavo A.
 R. Silva" <gustavoars@...nel.org>, Nhat Pham <nphamcs@...il.com>, Costa
 Shulyupin <costa.shul@...hat.com>, linux-kernel@...r.kernel.org,
 cgroups@...r.kernel.org
Subject: Re: [PATCH v1 4/7] sched/isolation: Adjust affinity of managed irqs
 according to change of housekeeping cpumask

On Thu, May 16 2024 at 22:04, Costa Shulyupin wrote:
> irq_affinity_adjust() is prototyped from irq_affinity_online_cpu()
> and irq_restore_affinity_of_irq().

I'm used to this prototyped phrase by now. It still does not justify
exposing me to this POC hackery.

My previous comments about change logs still apply.

> +static int irq_affinity_adjust(cpumask_var_t disable_mask)
> +{
> +	unsigned int irq;
> +	cpumask_var_t mask;
> +
> +	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	irq_lock_sparse();
> +	for_each_active_irq(irq) {
> +		struct irq_desc *desc = irq_to_desc(irq);
> +
> +		raw_spin_lock_irq(&desc->lock);

That's simply broken. This is not CPU hotplug on an outgoing CPU. Why
are you assuming that your isolation change code can rely on the
implicit guarantees of CPU hot(un)plug?

Also there is a reason why interrupt related code is in kernel/irq/* and
not in some random other location. Even if C allows you to fiddle with
everything, that does not mean that hiding random hacks in other places
is correct in any way.

> +		struct irq_data *data = irq_desc_get_irq_data(desc);
> +
> +		if (irqd_affinity_is_managed(data) && cpumask_weight_and(disable_mask,
> +			irq_data_get_affinity_mask(data))) {

Interrupt target isolation is only relevant for managed interrupts and
non-managed interrupts clearly are going to migrate themselves away
magically, right?
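
The objection is that the selection only considers managed interrupts, so any non-managed interrupt whose affinity includes a newly isolated CPU stays there. A hypothetical userspace model (`struct model_irq` and 64-bit CPU bitmaps are made up for illustration) counts how many interrupts the patch's selection rule would leave on isolated CPUs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an interrupt: affinity as a 64-bit CPU bitmap. */
struct model_irq {
	unsigned int irq;
	bool managed;
	uint64_t affinity;
};

/*
 * Count interrupts that still target a CPU in 'isolated' after applying
 * the patch's selection rule, which only touches managed interrupts.
 */
size_t count_left_on_isolated(const struct model_irq *irqs, size_t n,
			      uint64_t isolated)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		bool overlaps = (irqs[i].affinity & isolated) != 0;

		/* Non-managed interrupts are skipped and keep their affinity. */
		if (overlaps && !irqs[i].managed)
			left++;
	}
	return left;
}
```

With one managed and one non-managed interrupt both targeting CPU 2, isolating CPU 2 leaves the non-managed one behind.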

> +
> +			cpumask_and(mask, cpu_online_mask, irq_default_affinity);
> +			cpumask_and(mask, mask, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ));

There are clearly a lot of comments explaining what this is doing and
why it is correct as there is a guarantee that these masks overlap by
definition.
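
The sarcasm is pointed: nothing in the quoted hunk establishes that cpu_online_mask, irq_default_affinity and the HK_TYPE_MANAGED_IRQ housekeeping mask intersect, and irq_default_affinity can be changed from userspace via /proc/irq/default_smp_affinity. A minimal sketch, with a hypothetical helper name and 64-bit bitmaps as cpumasks, of computing that intersection and refusing to use an empty result:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the target mask the patch wants and report
 * whether it is usable. An empty result must not be handed to the
 * affinity setter; what to do instead is left to the caller.
 */
bool model_pick_target(uint64_t online, uint64_t default_affinity,
		       uint64_t housekeeping, uint64_t *target)
{
	*target = online & default_affinity & housekeeping;
	return *target != 0;
}
```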

> +			irq_set_affinity_locked(data, mask, true);

Plus the extensive explanation why using 'force=true' is even remotely
correct here.

I concede that the documentation of that function and its arguments is
close to non-existent, but if you follow the call chain of that function
there are enough hints down the road, no?
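
Following irq_set_affinity_locked() down to irq_do_set_affinity() in kernel/irq/manage.c, the non-forced path restricts the mask handed to the irqchip to online CPUs, while force=true passes the requested mask through. The kerneldoc of irq_force_affinity() describes this as skipping the check against online CPUs, intended for low-level CPU hotplug code that must target a CPU before it is online. A simplified, hypothetical model of that difference, ignoring the managed-interrupt housekeeping handling and the irqchip's own choice of effective affinity:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of what reaches the irqchip: without force the
 * requested mask is trimmed to online CPUs; with force it is passed
 * through exactly as requested.
 */
uint64_t model_mask_for_chip(uint64_t requested, uint64_t online, bool force)
{
	return force ? requested : (requested & online);
}
```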

> +			WARN_ON(cpumask_weight_and(irq_data_get_effective_affinity_mask(data),
> +						disable_mask));
> +			WARN_ON(!cpumask_subset(irq_data_get_effective_affinity_mask(data),
> +						cpu_online_mask));
> +			WARN_ON(!cpumask_subset(irq_data_get_effective_affinity_mask(data),
> +						housekeeping_cpumask(HK_TYPE_MANAGED_IRQ)));

These warnings are required and useful within the spinlock held and
interrupt disabled section because of what?

 - Because the resulting stack trace provides a well known call chain

 - Because the resulting warnings do not tell anything about the
   affected interrupt number

 - Because the resulting warnings do not tell anything about the CPU
   masks which cause the problem

 - Because the aggregate information of the above is utterly useless

Impressive...
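
One way to make such a check informative, sketched as a hypothetical userspace model: record the interrupt number and the offending masks while the descriptor lock is held, and emit the message only after the lock is dropped. In kernel code the masks would more likely be printed with the %*pbl format and cpumask_pr_args(); here 64-bit hex words stand in for cpumasks, and all names are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one interrupt whose effective affinity is wrong. */
struct model_bad_irq {
	unsigned int irq;
	uint64_t effective;
	uint64_t disallowed;
};

/*
 * Check one interrupt; on failure store enough context to report later.
 * Meant to run with the (modelled) lock held, so it records, never prints.
 * Returns 1 if a record was written, 0 otherwise.
 */
int model_check_irq(unsigned int irq, uint64_t effective, uint64_t allowed,
		    struct model_bad_irq *out)
{
	uint64_t disallowed = effective & ~allowed;

	if (!disallowed)
		return 0;
	out->irq = irq;
	out->effective = effective;
	out->disallowed = disallowed;
	return 1;
}

/* Run after the lock is dropped: one line naming the irq and its masks. */
int model_format_report(const struct model_bad_irq *b, char *buf, size_t len)
{
	return snprintf(buf, len, "irq %u: effective affinity %#" PRIx64
			" includes disallowed CPUs %#" PRIx64,
			b->irq, b->effective, b->disallowed);
}
```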

Thanks,

       tglx