Message-ID: <CAKfTPtBTc3Z_oK_Gg=79g4eUfA1iUat7gsZ2wqKkj=QXULYzng@mail.gmail.com>
Date: Wed, 26 Jun 2024 17:17:48 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Julia Lawall <julia.lawall@...ia.fr>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Mel Gorman <mgorman@...e.de>,
K Prateek Nayak <kprateek.nayak@....com>, linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: softirq
On Wed, 26 Jun 2024 at 07:37, Julia Lawall <julia.lawall@...ia.fr> wrote:
>
> Hello,
>
> I'm not sure I understand how soft irqs work. I see the code:
>
> open_softirq(SCHED_SOFTIRQ, sched_balance_softirq);
>
> Intuitively, I would expect that sched_balance_softirq would be run by
> ksoftirqd. That is, I would expect ksoftirqd to be scheduled
By default, the SCHED softirq and the others run in interrupt context.
ksoftirqd is woken up only in some cases, e.g. when we have spent too
much time processing softirqs in interrupt context or when the softirq
is raised outside of interrupt context.
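To be a bit more precise, the logic is roughly the following (a
simplified sketch of kernel/softirq.c; the exact conditions vary
between kernel versions):

	/* raise_softirq() path: when a softirq is raised outside of
	 * interrupt context, nothing will handle it on irq exit, so
	 * ksoftirqd has to be woken up. */
	void raise_softirq_irqoff(unsigned int nr)
	{
		__raise_softirq_irqoff(nr);	/* set the pending bit */

		if (!in_interrupt())
			wakeup_softirqd();
	}

	/* __do_softirq() tail: softirqs raised from interrupt context
	 * are handled on irq exit, but we give up and defer to
	 * ksoftirqd when this takes too long or keeps restarting. */
	if (pending) {
		if (time_before(jiffies, end) && !need_resched() &&
		    --max_restart)
			goto restart;

		wakeup_softirqd();
	}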
> (sched_switch event), then the various actions of sched_balance_softirq to
> be executed, and the ksoftirqd to be unscheduled (another sched_switch
> event).
>
> But in practice, I see the code of sched_balance_softirq being executed
> by the idle task, before the ksoftirqd is scheduled (see core 40):
What wakes up ksoftirqd? And which softirq finally runs in ksoftirqd?
>
> <idle>-0 [040] 3611.432554: softirq_entry: vec=7 [action=SCHED]
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: starting nohz
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: starting _nohz_idle_balance
> bt.B.x-12022 [047] 3611.432554: softirq_entry: vec=1 [action=TIMER]
> <idle>-0 [040] 3611.432554: bputs: _nohz_idle_balance.isra.0: searching for a cpu
> bt.B.x-12033 [003] 3611.432554: softirq_entry: vec=7 [action=SCHED]
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: ending _nohz_idle_balance
> bt.B.x-12052 [011] 3611.432554: softirq_entry: vec=7 [action=SCHED]
> <idle>-0 [040] 3611.432554: bputs: sched_balance_softirq: nohz returns true ending soft irq
> <idle>-0 [040] 3611.432554: softirq_exit: vec=7 [action=SCHED]
>
> For example, idle seems to be running the code in _nohz_idle_balance.
>
> I updated the code of _nohz_idle_balance as follows:
>
> trace_printk("searching for a cpu\n");
> for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) {
> if (!idle_cpu(balance_cpu))
> continue;
> trace_printk("found an idle cpu\n");
>
> It prints "searching for a cpu", but not "found an idle cpu", because the
> ksoftirqd on the core's runqueue makes the core not idle. This makes the
> whole softirq seem fairly useless when the only idle core is the one
> raising the soft irq.
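For reference, idle_cpu() is roughly the following (a simplified sketch
of kernel/sched/core.c); any task woken on that runqueue, including
ksoftirqd, is enough to make the check fail:

	int idle_cpu(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);

		if (rq->curr != rq->idle)
			return 0;
		if (rq->nr_running)
			return 0;
		if (rq->ttwu_pending)
			return 0;
		return 1;
	}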
The typical behavior is:

CPUA                                      CPUB
                                          do_idle
                                            while (!need_resched()) {
                                            ...
kick_ilb
  smp_call_function_single_async(CPUB)
    send_call_function_single_ipi
      raise_ipi ------------------------> cpuidle exit event
                                          irq_handler_entry
                                            ipi_handler
                                              raise sched_softirq
                                          irq_handler_exit
                                          softirq_entry
                                            sched_balance_softirq
                                              _nohz_idle_balance
                                          softirq_exit
                                          cpuidle_enter event
The softirq is handled in interrupt context right after the irq handler,
and CPUB never leaves the while (!need_resched()) loop.
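The IPI handler side is roughly this (a simplified sketch of
nohz_csd_func() in kernel/sched/core.c):

	static void nohz_csd_func(void *info)
	{
		struct rq *rq = info;
		...
		rq->idle_balance = idle_cpu(cpu_of(rq));
		if (rq->idle_balance) {
			rq->nohz_idle_balance = flags;
			/* raised from hardirq context, so it is handled
			 * on irq exit and ksoftirqd is not involved */
			raise_softirq_irqoff(SCHED_SOFTIRQ);
		}
	}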
In your case, I suspect that there is a race between the polling mode
and the fact that you leave the while (!need_resched()) loop and then
call flush_smp_call_function_queue().
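The path I have in mind looks roughly like this (a simplified sketch of
do_idle() in kernel/sched/idle.c, to illustrate what I mean):

	static void do_idle(void)
	{
		...
		while (!need_resched()) {
			...
			cpuidle_idle_call();	/* or polling on this arch */
		}
		...
		/*
		 * CSDs queued while we were polling (no IPI was sent) are
		 * run here, in task context: raise_softirq_irqoff() then
		 * sees !in_interrupt() and wakes ksoftirqd instead of the
		 * softirq being handled on irq exit.
		 */
		flush_smp_call_function_queue();
		schedule_idle();
	}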
We don't use polling on arm64, so I can't even try to reproduce your case.
>
> This is all for the same scenario that I have discussed previously, where
> there are two sockets and an overload of one thread on one and an underload
> of one thread on the other, and all the threads have been marked by NUMA
> balancing as preferring to be where they are. Now I am trying Prateek's
> patch series.
>
> thanks,
> julia