Message-ID: <CAKfTPtBTc3Z_oK_Gg=79g4eUfA1iUat7gsZ2wqKkj=QXULYzng@mail.gmail.com>
Date: Wed, 26 Jun 2024 17:17:48 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Julia Lawall <julia.lawall@...ia.fr>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Mel Gorman <mgorman@...e.de>,
K Prateek Nayak <kprateek.nayak@....com>, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: softirq
On Wed, 26 Jun 2024 at 07:37, Julia Lawall <julia.lawall@...ia.fr> wrote:
>
> Hello,
>
> I'm not sure I understand how soft irqs work. I see the code:
>
> open_softirq(SCHED_SOFTIRQ, sched_balance_softirq);
>
> Intuitively, I would expect that sched_balance_softirq would be run by
> ksoftirqd. That is, I would expect ksoftirqd to be scheduled
By default, the SCHED softirq and the others run in interrupt context.
ksoftirqd is woken up only in some cases, e.g. when we have spent too
much time processing softirqs in interrupt context or when the softirq
is raised outside of interrupt context.
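To be a bit more precise, the logic is roughly the following (a
simplified sketch of kernel/softirq.c; the exact conditions vary
between kernel versions):

	/* raise_softirq() path: when a softirq is raised outside of
	 * interrupt context, nothing will handle it on irq exit, so
	 * ksoftirqd has to be woken up. */
	void raise_softirq_irqoff(unsigned int nr)
	{
		__raise_softirq_irqoff(nr);	/* set the pending bit */

		if (!in_interrupt())
			wakeup_softirqd();
	}

	/* __do_softirq() tail: softirqs raised from interrupt context
	 * are handled on irq exit, but we give up and defer to
	 * ksoftirqd when this takes too long or keeps restarting. */
	if (pending) {
		if (time_before(jiffies, end) && !need_resched() &&
		    --max_restart)
			goto restart;

		wakeup_softirqd();
	}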
> (sched_switch event), then the various actions of sched_balance_softirq to
> be executed, and the ksoftirqd to be unscheduled (another sched_switch
> event).
>
> But in practice, I see the code of sched_balance_softirq being executed
> by the idle task, before the ksoftirqd is scheduled (see core 40):
What wakes up ksoftirqd? And which softirq finally runs in ksoftirqd?
>
> <idle>-0 [040] 3611.432554: softirq_entry: vec=7 [action=SCHED]
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: starting nohz
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: starting _nohz_idle_balance
> bt.B.x-12022 [047] 3611.432554: softirq_entry: vec=1 [action=TIMER]
> <idle>-0 [040] 3611.432554: bputs: _nohz_idle_balance.isra.0: searching for a cpu
> bt.B.x-12033 [003] 3611.432554: softirq_entry: vec=7 [action=SCHED]
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: ending _nohz_idle_balance
> bt.B.x-12052 [011] 3611.432554: softirq_entry: vec=7 [action=SCHED]
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: nohz returns true ending soft irq
> <idle>-0 [040] 3611.432554: softirq_exit: vec=7 [action=SCHED]
>
> For example, idle seems to be running the code in _nohz_idle_balance.
>
> I updated the code of _nohz_idle_balance as follows:
>
> trace_printk("searching for a cpu\n");
> for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) {
> if (!idle_cpu(balance_cpu))
> continue;
> trace_printk("found an idle cpu\n");
>
> It prints "searching for a cpu", but not "found an idle cpu", because the
> ksoftirqd on the core's runqueue makes the core not idle. This makes the
> whole softirq seem fairly useless when the only idle core is the one
> raising the soft irq.
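For reference, idle_cpu() is roughly the following (a simplified sketch
of kernel/sched/core.c); any task woken on that runqueue, including
ksoftirqd, is enough to make the check fail:

	int idle_cpu(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);

		if (rq->curr != rq->idle)
			return 0;
		if (rq->nr_running)
			return 0;
		if (rq->ttwu_pending)
			return 0;
		return 1;
	}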
The typical behavior is:

CPUA                                      CPUB
                                          do_idle
                                            while (!need_resched()) {
                                            ...
kick_ilb
  smp_call_function_single_async(CPUB)
    send_call_function_single_ipi
      raise_ipi ------------------------> cpuidle exit event
                                          irq_handler_entry
                                            ipi_handler
                                              raise sched_softirq
                                          irq_handler_exit
                                          softirq_entry
                                            sched_balance_softirq
                                              _nohz_idle_balance
                                          softirq_exit
                                          cpuidle_enter event
The softirq is handled in interrupt context right after the irq handler,
and CPUB never leaves the while (!need_resched()) loop.
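The IPI handler side is roughly this (a simplified sketch of
nohz_csd_func() in kernel/sched/core.c):

	static void nohz_csd_func(void *info)
	{
		struct rq *rq = info;
		...
		rq->idle_balance = idle_cpu(cpu_of(rq));
		if (rq->idle_balance) {
			rq->nohz_idle_balance = flags;
			/* raised from hardirq context, so it is handled
			 * on irq exit and ksoftirqd is not involved */
			raise_softirq_irqoff(SCHED_SOFTIRQ);
		}
	}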
In your case, I suspect that there is a race between the polling mode
and the fact that you leave the while (!need_resched()) loop and then
call flush_smp_call_function_queue().
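The path I have in mind looks roughly like this (a simplified sketch of
do_idle() in kernel/sched/idle.c, to illustrate what I mean):

	static void do_idle(void)
	{
		...
		while (!need_resched()) {
			...
			cpuidle_idle_call();	/* or polling on this arch */
		}
		...
		/*
		 * CSDs queued while we were polling (no IPI was sent) are
		 * run here, in task context: raise_softirq_irqoff() then
		 * sees !in_interrupt() and wakes ksoftirqd instead of the
		 * softirq being handled on irq exit.
		 */
		flush_smp_call_function_queue();
		schedule_idle();
	}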
We don't use polling on arm64, so I can't even try to reproduce your case.
>
> This is all for the same scenario that I have discussed previously, where
> there are two sockets and an overload of one thread on one and an underload
> of one thread on the other, and all the threads have been marked by NUMA
> balancing as preferring to be where they are. Now I am trying Prateek's
> patch series.
>
> thanks,
> julia