Message-ID: <13ba4125-8972-4ddb-b630-96e89bdd7248@samsung.com>
Date: Thu, 20 Nov 2025 17:34:43 +0100
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Thomas Gleixner <tglx@...utronix.de>, Frederic Weisbecker
	<frederic@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Marco Crivellari
	<marco.crivellari@...e.com>, Waiman Long <llong@...hat.com>,
	cgroups@...r.kernel.org
Subject: Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated
 partitions

On 20.11.2025 16:00, Thomas Gleixner wrote:
> On Thu, Nov 20 2025 at 14:20, Frederic Weisbecker wrote:
>> Le Thu, Nov 20, 2025 at 12:51:31PM +0100, Marek Szyprowski a écrit :
>>> On 18.11.2025 15:30, Frederic Weisbecker wrote:
>>>> When a cpuset isolated partition is created, updated or destroyed,
>>>> the IRQ threads are blindly affined to all the non-isolated CPUs,
>>>> without taking into account the IRQ thread's initial affinity,
>>>> which is then ignored.
>>>>
>>>> For example in a system with 8 CPUs, if an IRQ and its kthread are
>>>> initially affine to CPU 5, creating an isolated partition with only
>>>> CPU 2 inside will eventually end up affining the IRQ kthread to all
>>>> CPUs but CPU 2 (that is, CPUs 0,1,3-7), losing the kthread's preference
>>>> for CPU 5.
>>>>
>>>> Besides the blind re-affining, this doesn't take care of the actual
>>>> low level interrupt, which isn't migrated. As of today the only way
>>>> to isolate non-managed interrupts, along with their kthreads, is to
>>>> override their affinity separately, for example through /proc/irq/
>>>>
>>>> To avoid doing that manually, future development should focus on
>>>> updating the IRQs' affinity whenever cpuset isolated partitions are
>>>> updated.
>>>>
>>>> In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
>>>> To prevent that, set the PF_NO_SETAFFINITY flag on them.
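>>>>
>>>> A minimal sketch of what the flag buys (the helper below is
>>>> hypothetical and not part of this patch; mainline sets the flag via
>>>> the kthread_bind*() family while the thread is inactive): once
>>>> PF_NO_SETAFFINITY is set, the scheduler's affinity setters bail out,
>>>> so cpuset rebinding leaves the thread alone.
>>>>
>>>> 	/* Hypothetical illustration only, not the actual change. */
>>>> 	static void mark_irq_thread_unmovable(struct task_struct *t)
>>>> 	{
>>>> 		/* set_cpus_allowed_ptr() etc. now return -EINVAL */
>>>> 		t->flags |= PF_NO_SETAFFINITY;
>>>> 	}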
>>>>
>>>> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
>>> This patch landed in today's linux-next as commit 844dcacab287 ("genirq:
>>> Fix interrupt threads affinity vs. cpuset isolated partitions"). In my
>>> tests I found that it triggers warnings on some of my test systems.
>>> Here is an example of such a warning:
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 1 at kernel/kthread.c:599 kthread_bind_mask+0x2c/0x84
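>>>
>>> For reference, that warning fires in __kthread_bind_mask() when the
>>> target thread is not sitting inactive in the expected sleep state.
>>> Paraphrased from kernel/kthread.c (from memory, not the exact
>>> mainline lines):
>>>
>>> 	static void __kthread_bind_mask(struct task_struct *p,
>>> 					const struct cpumask *mask,
>>> 					unsigned int state)
>>> 	{
>>> 		/*
>>> 		 * Requires the thread to be inactive in @state (here
>>> 		 * TASK_UNINTERRUPTIBLE); a premature wakeup breaks that.
>>> 		 */
>>> 		if (!wait_task_inactive(p, state)) {
>>> 			WARN_ON(1);	/* <-- the warning above */
>>> 			return;
>>> 		}
>>> 		/* ... then p->cpus_mask is updated under the pi lock ... */
>>> 	}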
>> Erm, does this mean that the IRQ thread got woken before the first official
>> wakeup in wake_up_and_wait_for_irq_thread_ready()? This looks wrong...
>>
>> irq_startup() may be called on a few occasions before. So perhaps
>> the IRQ already fired and woke up the kthread once before the "official"
>> first wake up?
>>
>> There seems to be some initialization ordering issue here...
> That's unavoidable because of locking and chicken-and-egg ordering problems.
>
> When the thread is created the interrupt is not yet started up and
> therefore effective affinity is not known. So doing a bind there is
> pointless.
>
> The action has to be made visible _before_ starting it up, as once the
> interrupt is unmasked it might fire. That's even more true for shared
> interrupts.
>
> The wakeup/bind/... muck cannot be invoked with the descriptor lock
> held, so it has to be done _after_ the thing went live.
>
> So yes, an interrupt which fires before kthread_bind() is invoked can
> have exactly that effect. It wakes the thread from the kthread() wait
> and it goes straight through to the thread function.
>
> That's why the original code just set the affinity bit and let the
> thread itself handle it.
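>
> I.e. roughly (paraphrased; in mainline the test_and_clear lives inside
> irq_thread_check_affinity() itself):
>
> 	/* Affinity-change side: just flag the thread, no cross-task bind */
> 	set_bit(IRQTF_AFFINITY, &action->thread_flags);
>
> 	/* Thread side, running in irq_thread() context: */
> 	if (test_and_clear_bit(IRQTF_AFFINITY, &action->thread_flags))
> 		set_cpus_allowed_ptr(current, mask);	/* self-affine */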
>
> There are three options to solve that:
>
>        1) Bind the thread right after kthread_create() to an arbitrary
>           housekeeping CPU and let it move itself over to the real place
>           once it comes up (at that point the affinity is established)
>
>        2) Serialize the setup so that the thread (even when woken up
>           early) gets stuck in UNINTERRUPTIBLE state _before_ it can
>           reach wait_for_interrupt(), which waits with INTERRUPTIBLE
>
>        3) Teach kthread_create() that the thread is subject to being
>           bound to something later on and implement that serialization
>           there.
>
> #1 is pretty straightforward. See the untested patch below.

This fixes the observed issue. The only problem with that patch is the
incorrect set of arguments passed to kthread_create_on_cpu() (it doesn't
support printf-style args); see the sketch below.
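
For reference, kthread_create_on_cpu()'s namefmt gets only the CPU
number as its single format argument, so the caller cannot pass extra
args. An untested sketch of one way around it (an assumption, not
necessarily the final fix) is to keep kthread_create() and bind
afterwards:

	t = kthread_create(irq_thread, new,
			   secondary ? "irq/%d-s-%s" : "irq/%d-%s",
			   irq, new->name);
	if (IS_ERR(t))
		return PTR_ERR(t);
	/*
	 * The thread is freshly created and not yet woken, so binding is
	 * safe here; kthread_bind() also sets PF_NO_SETAFFINITY, just as
	 * kthread_create_on_cpu() would.
	 */
	kthread_bind(t, cpu);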

If that matters:

Tested-by: Marek Szyprowski <m.szyprowski@...sung.com>


> #2 is ugly (I tried)
>
> #3 could be useful in general, but that needs more thought
>
> Thanks
>
>          tglx
> ---
> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -133,7 +133,15 @@ void __irq_wake_thread(struct irq_desc *
>   	 */
>   	atomic_inc(&desc->threads_active);
>   
> -	wake_up_process(action->thread);
> +	/*
> +	 * This might be a premature wakeup before the thread reached the
> +	 * thread function and set the IRQTF_READY bit. It's waiting in
> +	 * kthread code with state UNINTERRUPTIBLE. Once it reaches the
> +	 * thread function it waits with INTERRUPTIBLE. The wakeup is not
> +	 * lost in that case because the thread is guaranteed to observe
> +	 * the RUN flag before it goes to sleep in wait_for_interrupt().
> +	 */
> +	wake_up_state(action->thread, TASK_INTERRUPTIBLE);
>   }
>   
>   static DEFINE_STATIC_KEY_FALSE(irqhandler_duration_check_enabled);
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1026,16 +1026,8 @@ static void irq_thread_check_affinity(st
>   	set_cpus_allowed_ptr(current, mask);
>   	free_cpumask_var(mask);
>   }
> -
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc)
> -{
> -	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
> -}
>   #else
>   static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc) { }
>   #endif
>   
>   static int irq_wait_for_interrupt(struct irq_desc *desc,
> @@ -1220,7 +1212,6 @@ static void wake_up_and_wait_for_irq_thr
>   	if (!action || !action->thread)
>   		return;
>   
> -	irq_thread_set_affinity(action->thread, desc);
>   	wake_up_process(action->thread);
>   	wait_event(desc->wait_for_threads,
>   		   test_bit(IRQTF_READY, &action->thread_flags));
> @@ -1389,26 +1380,35 @@ static void irq_nmi_teardown(struct irq_
>   static int
>   setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>   {
> +	/*
> +	 * At this point the interrupt affinity is not known. Just assume
> +	 * that the current CPU is not isolated and valid for bringing the
> +	 * thread up. The thread will move itself over to the right place
> +	 * once the whole setup is complete.
> +	 */
> +	unsigned int cpu = raw_smp_processor_id();
>   	struct task_struct *t;
>   
> -	if (!secondary) {
> -		t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
> -				   new->name);
> -	} else {
> -		t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq,
> -				   new->name);
> -	}
> +	if (!secondary)
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-%s", irq, new->name);
> +	else
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-s-%s", irq, new->name);
>   
>   	if (IS_ERR(t))
>   		return PTR_ERR(t);
>   
>   	/*
> -	 * We keep the reference to the task struct even if
> -	 * the thread dies to avoid that the interrupt code
> -	 * references an already freed task_struct.
> +	 * We keep the reference to the task struct even if the thread dies
> +	 * to avoid that the interrupt code references an already freed
> +	 * task_struct.
>   	 */
>   	new->thread = get_task_struct(t);
>   
> +	/*
> +	 * Ensure the thread adjusts the affinity once it reaches the
> +	 * thread function.
> +	 */
> +	new->thread_flags = BIT(IRQTF_AFFINITY);
>   	return 0;
>   }
>   
>
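As an aside on the __irq_wake_thread() hunk above: wake_up_process() is
wake_up_state(p, TASK_NORMAL), so it also kicks UNINTERRUPTIBLE sleepers
such as a thread still inside kthread()'s initial wait, while the
TASK_INTERRUPTIBLE variant only reaches the thread once it sleeps in
irq_wait_for_interrupt(). Paraphrased semantics, not new code:

	wake_up_process(t);			/* wakes INTERRUPTIBLE and
						 * UNINTERRUPTIBLE sleepers */
	wake_up_state(t, TASK_INTERRUPTIBLE);	/* wakes INTERRUPTIBLE only;
						 * the early UNINTERRUPTIBLE
						 * wait is left alone */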
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

