lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aR84qyZp3PyH5xFg@localhost.localdomain>
Date: Thu, 20 Nov 2025 16:50:03 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Marek Szyprowski <m.szyprowski@...sung.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Marco Crivellari <marco.crivellari@...e.com>,
	Waiman Long <llong@...hat.com>, cgroups@...r.kernel.org
Subject: Re: [PATCH 1/2] genirq: Fix IRQ threads affinity VS cpuset isolated
 partitions

Le Thu, Nov 20, 2025 at 04:00:39PM +0100, Thomas Gleixner a écrit :
> On Thu, Nov 20 2025 at 14:20, Frederic Weisbecker wrote:
> > Erm, does this means that the IRQ thread got awaken before the first official
> > wakeup in wake_up_and_wait_for_irq_thread_ready()? This looks wrong...
> >
> > irq_startup() may be called on a few occcasions before. So perhaps
> > the IRQ already fired and woke up the kthread once before the "official"
> > first wake up?
> >
> > There seem to be some initialization ordering issue here...
> 
> That's unavoidable because of locking and hen and egg ordering problems.
> 
> When the thread is created the interrupt is not yet started up and
> therefore effective affinity is not known. So doing a bind there is
> pointless.
> 
> The action has to be made visible _before_ starting it up as once the
> interrupt is unmasked it might fire. That's even more true for shared
> interrupts.
> 
> The wakeup/bind/... muck cannot be invoked with the descriptor lock
> held, so it has to be done _after_ the thing went live.
> 
> So yes an interrupt which fires before kthread_bind() is invoked might
> exactly have that effect. It wakes it from the kthread() wait and it
> goes straight through to the thread function.
> 
> That's why the original code just set the affinity bit and let the
> thread itself handle it.
> 
> There are three options to solve that:
> 
>       1) Bind the thread right after kthread_create() to a random
>          housekeeping task and let move itself over to the real place
>          once it came up (at this point the affinity is established)
> 
>       2) Serialize the setup so that the thread (even, when woken up
>          early) get's stuck in UNINTERRUPTIBLE state _before_ it can
>          reach wait_for_interrupt() which waits with INTERRUPTIBLE
> 
>       3) Teach kthread_create() that the thread is subject to being
>          bound to something later on and implement that serialization
>          there.
> 
> #1 is pretty straight forward. See untested below.

Right.

> 
> #2 is ugly (I tried)

Ok...

> 
> #3 could be useful in general, but that needs more thoughts

A lot of thoughts yes given the many things to consider (kthread
automatic affinity handling, etc...).

And if we do that we must always affine the IRQ threads remotely
to avoid conflict with them. But that can be a problem with locking,
etc...

> --- a/kernel/irq/handle.c
> +++ b/kernel/irq/handle.c
> @@ -133,7 +133,15 @@ void __irq_wake_thread(struct irq_desc *
>  	 */
>  	atomic_inc(&desc->threads_active);
>  
> -	wake_up_process(action->thread);
> +	/*
> +	 * This might be a premature wakeup before the thread reached the
> +	 * thread function and set the IRQTF_READY bit. It's waiting in
> +	 * kthread code with state UNINTERRUPTIBLE. Once it reaches the
> +	 * thread function it waits with INTERRUPTIBLE. The wakeup is not
> +	 * lost in that case because the thread is guaranteed to observe
> +	 * the RUN flag before it goes to sleep in wait_for_interrupt().
> +	 */
> +	wake_up_state(action->thread, TASK_INTERRUPTIBLE);
>  }
>  
>  static DEFINE_STATIC_KEY_FALSE(irqhandler_duration_check_enabled);
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1026,16 +1026,8 @@ static void irq_thread_check_affinity(st
>  	set_cpus_allowed_ptr(current, mask);
>  	free_cpumask_var(mask);
>  }
> -
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc)
> -{
> -	kthread_bind_mask(t, irq_data_get_effective_affinity_mask(&desc->irq_data));
> -}
>  #else
>  static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
> -static inline void irq_thread_set_affinity(struct task_struct *t,
> -					   struct irq_desc *desc) { }
>  #endif
>  
>  static int irq_wait_for_interrupt(struct irq_desc *desc,
> @@ -1220,7 +1212,6 @@ static void wake_up_and_wait_for_irq_thr
>  	if (!action || !action->thread)
>  		return;
>  
> -	irq_thread_set_affinity(action->thread, desc);
>  	wake_up_process(action->thread);
>  	wait_event(desc->wait_for_threads,
>  		   test_bit(IRQTF_READY, &action->thread_flags));
> @@ -1389,26 +1380,35 @@ static void irq_nmi_teardown(struct irq_
>  static int
>  setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
>  {
> +	/*
> +	 * At this point interrupt affinity is not known. just assume that
> +	 * the current CPU is not isolated and valid to bring the thread
> +	 * up. The thread will move itself over to the right place once the
> +	 * whole setup is complete.
> +	 */
> +	unsigned int cpu = raw_smp_processor_id();
>  	struct task_struct *t;
>  
> -	if (!secondary) {
> -		t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
> -				   new->name);
> -	} else {
> -		t = kthread_create(irq_thread, new, "irq/%d-s-%s", irq,
> -				   new->name);
> -	}
> +	if (!secondary)
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-%s", irq, new->name);
> +	else
> +		t = kthread_create_on_cpu(irq_thread, new, cpu, "irq/%d-s-%s",
> -		irq, new->name);

Right I though about something like that, it involved:

kthread_bind_mask(t, cpu_possible_mask);

Which do you prefer? Also do you prefer such a fixup or should I refactor my
patches you merged?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ