Message-ID: <CAKfTPtCnusWJXJLDEudQ_q8MWaZYbPJK-QjAbBYWFW8Nw-J+Ww@mail.gmail.com>
Date:   Fri, 26 Nov 2021 09:23:59 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <Valentin.Schneider@....com>
Cc:     Vincent Donnefort <Vincent.Donnefort@....com>,
        peterz@...radead.org, mingo@...hat.com,
        linux-kernel@...r.kernel.org, mgorman@...hsingularity.net,
        dietmar.eggemann@....com
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Thu, 25 Nov 2021 at 16:30, Valentin Schneider
<Valentin.Schneider@....com> wrote:
>
> On 25/11/21 14:23, Vincent Guittot wrote:
> > On Thu, 25 Nov 2021 at 12:16, Valentin Schneider
> > <Valentin.Schneider@....com> wrote:
> >> I think you can still hit this on a symmetric system; let me try to
> >> reformulate my other email.
> >>
> >> If this (non-patched) condition evaluates to true, it means the previous
> >> condition
> >>
> >>   (available_idle_cpu(target) || sched_idle_cpu(target)) &&
> >>    asym_fits_capacity(task_util, target)
> >>
> >> evaluated to false, so for a symmetric system target sure isn't idle.
> >>
> >> prev == smp_processor_id() implies prev == target, IOW prev isn't
> >> idle. Now, consider:
> >>
> >>   p0.prev = CPU1
> >>   p1.prev = CPU1
> >>
> >>   CPU0                     CPU1
> >>   current = don't care     current = swapper/1
> >>
> >>   ttwu(p1)
> >>     ttwu_queue(p1, CPU1)
> >>     // or
> >>     ttwu_queue_wakelist(p1, CPU1)
> >>
> >>                           hrtimer_wakeup()
> >>                             wake_up_process()
> >>                               ttwu()
> >>                                 idle_cpu(CPU1)? no
> >>
> >>                                 is_per_cpu_kthread(current)? yes
> >>                                 prev == smp_processor_id()? yes
> >>                                 this_rq()->nr_running <= 1? yes
> >>                                 => self enqueue
> >>
> >>                           ...
> >>                           schedule_idle()
> >>
> >> This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just
> >> a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3
> >> around, we'd still be stacking p0 and p1 onto CPU1.
> >>
> >> IOW this opens a window between a remote ttwu() and the idle task invoking
> >> schedule_idle() where the idle task can stack more tasks onto its CPU.
> >
> > Your use case above is out of the scope of this patch and has always
> > been there, even for other per cpu kthreads. In such case, the wake up
> > is not triggered by current (idle or another per cpu kthread) but by
> > an interrupt (hrtimer in your case).
>
> Technically the idle task didn't pass is_per_cpu_kthread(p) when that
> condition was added, this is somewhat of a "new development" - but you're
> right on the hardirq side of things.
>
> > If we want to filter wakeup
> > generated by interrupt context while a per cpu kthread is running, it
> > would be better to fix all cases and test the running context like
> > this
> >
>
> I think that could make sense - though can the idle task issue wakeups in
> process context? If so that won't be sufficient. A quick audit tells me:
>
> o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle
> o I didn't see any wakeup issued from the cpu_pm_notifier call chain
> o I'm not entirely sure about flush_smp_call_function_from_idle(). I found
>   this thing in RCU:
>
>   smp_call_function_single(cpu, rcu_exp_handler)
>
>     rcu_exp_handler()
>       rcu_report_exp_rdp()
>         rcu_report_exp_cpu_mult()
>           __rcu_report_exp_rnp()
>             swake_up_one()
>
> IIUC if set_nr_if_polling() then the smp_call won't send an IPI and should be
> handled in that flush_foo_from_idle() call.

Aren't all of these wakeups meant to happen on the local CPU? If so, I
don't see any real problem there.

>
> I'd be tempted to stick your and VincentD's conditions together, just to be
> safe...

Rather than just playing it safe, I would prefer that we fix the actual
root cause instead of hiding it.

>
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6397,7 +6397,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >          * essentially a sync wakeup. An obvious example of this
> >          * pattern is IO completions.
> >          */
> > -       if (is_per_cpu_kthread(current) &&
> > +       if (!in_interrupt() &&
> > +           is_per_cpu_kthread(current) &&
> >             prev == smp_processor_id() &&
> >             this_rq()->nr_running <= 1) {
> >                 return prev;
> >
> >>
> >> >
> >> >> --
> >> >> 2.25.1
> >> >>
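
For illustration only, the effect of the in_interrupt() filter from the
diff quoted above can be modelled in user space. This is a rough sketch
and not kernel code: struct wake_ctx, its fields, and the function names
are hypothetical stand-ins for in_interrupt(), is_per_cpu_kthread(current),
smp_processor_id() and this_rq()->nr_running. It assumes, as in the
scenario quoted earlier, that the idle task passes is_per_cpu_kthread().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the state the kernel check reads. */
struct wake_ctx {
	bool in_irq;             /* models in_interrupt() */
	bool per_cpu_kthread;    /* models is_per_cpu_kthread(current) */
	int this_cpu;            /* models smp_processor_id() */
	unsigned int nr_running; /* models this_rq()->nr_running */
};

/* Condition as it reads before the diff: no check of the running context. */
static bool shortcut_unpatched(const struct wake_ctx *c, int prev)
{
	return c->per_cpu_kthread &&
	       prev == c->this_cpu &&
	       c->nr_running <= 1;
}

/* Condition with the proposed filter: wakeups from interrupt context are skipped. */
static bool shortcut_with_irq_filter(const struct wake_ctx *c, int prev)
{
	return !c->in_irq && shortcut_unpatched(c, prev);
}

/* Checks the two cases discussed in the thread against the model. */
static void check_scenarios(void)
{
	/*
	 * hrtimer_wakeup() fires on CPU1 while swapper/1 is current and p1
	 * has already been enqueued on CPU1 by a remote ttwu().
	 */
	struct wake_ctx hrtimer_on_idle = {
		.in_irq = true, .per_cpu_kthread = true,
		.this_cpu = 1, .nr_running = 1,
	};
	/* IO completion: a per-CPU kworker wakes a task from process context. */
	struct wake_ctx io_completion = {
		.in_irq = false, .per_cpu_kthread = true,
		.this_cpu = 1, .nr_running = 1,
	};

	/* Unpatched: the hrtimer wakeup takes the shortcut and stacks on CPU1. */
	assert(shortcut_unpatched(&hrtimer_on_idle, 1));
	/* Filtered: the same wakeup falls through to the idle CPU search. */
	assert(!shortcut_with_irq_filter(&hrtimer_on_idle, 1));
	/* The sync-wakeup pattern the shortcut was written for is unaffected. */
	assert(shortcut_unpatched(&io_completion, 1));
	assert(shortcut_with_irq_filter(&io_completion, 1));
}
```

Calling check_scenarios() from a test harness exercises both cases. The
model only covers the hardirq path; it says nothing about wakeups the
idle task might issue from process context, which is the open question
raised above.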
