Message-ID: <5d3d80dd-00ca-464d-bebf-c0fd4836b947@redhat.com>
Date: Mon, 10 Nov 2025 16:28:49 -0500
From: Waiman Long <llong@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Marco Crivellari <marco.crivellari@...e.com>, cgroups@...r.kernel.org
Subject: Re: [PATCH] genirq: Fix IRQ threads affinity VS cpuset isolated
partitions
On 11/5/25 8:17 AM, Frederic Weisbecker wrote:
> When a cpuset isolated partition is created, updated or destroyed,
> the IRQ threads are blindly affined to all the non-isolated CPUs.
> This happens without taking the IRQ threads' initial affinity into
> account, which is thus ignored.
>
> For example in a system with 8 CPUs, if an IRQ and its kthread are
> initially affine to CPU 5, creating an isolated partition with only
> CPU 2 inside will eventually end up affining the IRQ kthread to all
> CPUs but CPU 2 (that is CPUs 0,1,3-7), losing the kthread preference for
> CPU 5.
>
> Besides the blind re-affinity, this doesn't take care of the actual
> low-level interrupt, which isn't migrated. As of today the only way
> to isolate non-managed interrupts, along with their kthreads, is to
> overwrite their affinity separately, for example through /proc/irq/
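
(For reference, that manual overwrite is the usual procfs knob, e.g.
"echo 5 > /proc/irq/<N>/smp_affinity_list" to pin a hypothetical IRQ
<N>, and with it the IRQ kthread, to CPU 5.)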
>
> To avoid doing that manually, future development should focus on
> updating the IRQs' affinity whenever cpuset isolated partitions are
> updated.
>
> In the meantime, cpuset shouldn't fiddle with IRQ threads directly.
> To prevent that, set the PF_NO_SETAFFINITY flag on them.
>
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> ---
> kernel/irq/manage.c | 33 ++++++++++++++++++++-------------
> 1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 400856abf672..5ca000c9f4a7 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -176,7 +176,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
> }
>
> /**
> - * irq_set_thread_affinity - Notify irq threads to adjust affinity
> + * irq_thread_update_affinity - Notify irq threads to adjust affinity
> * @desc: irq descriptor which has affinity changed
> *
> * Just set IRQTF_AFFINITY and delegate the affinity setting to the
> @@ -184,7 +184,7 @@ bool irq_can_set_affinity_usr(unsigned int irq)
> * we hold desc->lock and this code can be called from hard interrupt
> * context.
> */
> -static void irq_set_thread_affinity(struct irq_desc *desc)
> +static void irq_thread_update_affinity(struct irq_desc *desc)
> {
> struct irqaction *action;
>
> @@ -283,7 +283,7 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
> fallthrough;
> case IRQ_SET_MASK_OK_NOCOPY:
> irq_validate_effective_affinity(data);
> - irq_set_thread_affinity(desc);
> + irq_thread_update_affinity(desc);
> ret = 0;
> }
>
> @@ -1035,8 +1035,23 @@ static void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *a
> set_cpus_allowed_ptr(current, mask);
> free_cpumask_var(mask);
> }
> +
> +static inline void irq_thread_set_affinity(struct task_struct *t,
> + struct irq_desc *desc)
> +{
> + const struct cpumask *mask;
> +
> + if (cpumask_available(desc->irq_common_data.affinity))
> + mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
> + else
> + mask = cpu_possible_mask;
> +
> + kthread_bind_mask(t, mask);
> +}
This function seems to mirror what is done in
irq_thread_check_affinity() when the affinity cpumask is available.
But if no affinity is defined, it will make this IRQ kthread immune
to changes in the set of isolated CPUs. Should we use the
IRQD_AFFINITY_SET flag to check whether an affinity has actually been
set, and then set PF_NO_SETAFFINITY only in that case? Something like
the sketch below.
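
Completely untested, just to illustrate the idea
(irqd_affinity_was_set() tests IRQD_AFFINITY_SET):

	static inline void irq_thread_set_affinity(struct task_struct *t,
						   struct irq_desc *desc)
	{
		const struct cpumask *mask;

		/*
		 * Bind the kthread (which also sets PF_NO_SETAFFINITY)
		 * only when an affinity was explicitly set on the
		 * interrupt. Otherwise leave the thread movable so that
		 * cpuset isolated partition updates can still re-affine
		 * it.
		 */
		if (!irqd_affinity_was_set(&desc->irq_data))
			return;

		mask = irq_data_get_effective_affinity_mask(&desc->irq_data);
		kthread_bind_mask(t, mask);
	}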
Cheers,
Longman
> #else
> static inline void irq_thread_check_affinity(struct irq_desc *desc, struct irqaction *action) { }
> +static inline void irq_thread_set_affinity(struct task_struct *t,
> + struct irq_desc *desc) { }
> #endif
>
> static int irq_wait_for_interrupt(struct irq_desc *desc,
> @@ -1221,6 +1236,7 @@ static void wake_up_and_wait_for_irq_thread_ready(struct irq_desc *desc,
> if (!action || !action->thread)
> return;
>
> + irq_thread_set_affinity(action->thread, desc);
> wake_up_process(action->thread);
> wait_event(desc->wait_for_threads,
> test_bit(IRQTF_READY, &action->thread_flags));
> @@ -1405,16 +1421,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
> * references an already freed task_struct.
> */
> new->thread = get_task_struct(t);
> - /*
> - * Tell the thread to set its affinity. This is
> - * important for shared interrupt handlers as we do
> - * not invoke setup_affinity() for the secondary
> - * handlers as everything is already set up. Even for
> - * interrupts marked with IRQF_NO_BALANCE this is
> - * correct as we want the thread to move to the cpu(s)
> - * on which the requesting code placed the interrupt.
> - */
> - set_bit(IRQTF_AFFINITY, &new->thread_flags);
> +
> return 0;
> }
>