Message-ID: <CAKfTPtDPskVdEd-KQ_cwe-R_zVFPQOgdbk9x+3eD12pKs8fGFw@mail.gmail.com>
Date:   Thu, 25 Nov 2021 14:23:10 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <Valentin.Schneider@....com>
Cc:     Vincent Donnefort <Vincent.Donnefort@....com>,
        peterz@...radead.org, mingo@...hat.com,
        linux-kernel@...r.kernel.org, mgorman@...hsingularity.net,
        dietmar.eggemann@....com
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Thu, 25 Nov 2021 at 12:16, Valentin Schneider
<Valentin.Schneider@....com> wrote:
>
> On 25/11/21 10:05, Vincent Guittot wrote:
> > On Wed, 24 Nov 2021 at 16:42, Vincent Donnefort
> > <vincent.donnefort@....com> wrote:
> >>
> >> select_idle_sibling() will return prev_cpu for the case where the task is
> >> woken up by a per-CPU kthread. However, the idle task was recently
> >> modified and is now matched by is_per_cpu_kthread(), breaking the
> >> behaviour described above. Adding !is_idle_task() ensures we do not
> >> spuriously trigger that select_idle_sibling() exit path.
> >>
> >> Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")
> >> Signed-off-by: Vincent Donnefort <vincent.donnefort@....com>
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 945d987246c5..8bf95b0e368d 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -6399,6 +6399,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >>          * pattern is IO completions.
> >>          */
> >>         if (is_per_cpu_kthread(current) &&
> >> +           !is_idle_task(current) &&
> >>             prev == smp_processor_id() &&
> >>             this_rq()->nr_running <= 1) {
> >>                 return prev;
> >
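For context, is_per_cpu_kthread() is roughly the following check (a
sketch based on kernel/sched/sched.h around v5.15). Per the Fixes tag
above, since commit 00b89fe0197f init_idle() also sets PF_KTHREAD on
the idle task, which is pinned to a single CPU, so swapper now passes
both tests:

static inline bool is_per_cpu_kthread(struct task_struct *p)
{
	/* Must be a kernel thread... */
	if (!(p->flags & PF_KTHREAD))
		return false;

	/* ...pinned to exactly one CPU. */
	if (p->nr_cpus_allowed != 1)
		return false;

	return true;
}

The idle task therefore matches, which is what the patch filters out
with !is_idle_task().
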
> > AFAICT, this can't happen on a symmetric system because prev would
> > already have been returned by one of the earlier conditions.
> > Only an asymmetric system can face such a situation, when the task
> > doesn't fit, which is the subject of your other patch.
> > So this patch seems irrelevant outside of that one.
> >
>
> I think you can still hit this on a symmetric system; let me try to
> reformulate my other email.
>
> If this (non-patched) condition evaluates to true, it means the previous
> condition
>
>   (available_idle_cpu(target) || sched_idle_cpu(target)) &&
>    asym_fits_capacity(task_util, target)
>
> evaluated to false, so on a symmetric system target certainly isn't idle.
>
> prev == smp_processor_id() implies prev == target, IOW prev isn't
> idle. Now, consider:
>
>   p0.prev = CPU1
>   p1.prev = CPU1
>
>   CPU0                     CPU1
>   current = don't care     current = swapper/1
>
>   ttwu(p1)
>     ttwu_queue(p1, CPU1)
>     // or
>     ttwu_queue_wakelist(p1, CPU1)
>
>                           hrtimer_wakeup()
>                             wake_up_process()
>                               ttwu()
>                                 idle_cpu(CPU1)? no
>
>                                 is_per_cpu_kthread(current)? yes
>                                 prev == smp_processor_id()? yes
>                                 this_rq()->nr_running <= 1? yes
>                                 => self enqueue
>
>                           ...
>                           schedule_idle()
>
> This happens whether CPU0 does a full enqueue (rq->nr_running == 1) or just
> a wakelist enqueue (rq->ttwu_pending > 0). Even if there were an idle CPU3
> around, we'd still be stacking p0 and p1 onto CPU1.
>
> IOW this opens a window between a remote ttwu() and the idle task invoking
> schedule_idle() where the idle task can stack more tasks onto its CPU.
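The ordering Valentin is relying on, condensed from
select_idle_sibling() in kernel/sched/fair.c (a sketch, around v5.15;
the intermediate cache-affine idle prev check is elided):

	/* target is idle (and the task fits): pick it right away. */
	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
	    asym_fits_capacity(task_util, target))
		return target;

	/* ... cache-affine idle prev check elided ... */

	/*
	 * Per-CPU kthread wakeup shortcut: only reached when target
	 * (== prev in this scenario) was NOT idle above.
	 */
	if (is_per_cpu_kthread(current) &&
	    prev == smp_processor_id() &&
	    this_rq()->nr_running <= 1)
		return prev;
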

Your use case above is outside the scope of this patch and has always
existed, even for other per-CPU kthreads. In such a case, the wakeup
is not triggered by current (the idle task or another per-CPU kthread)
but by an interrupt (an hrtimer in your case). If we want to filter
wakeups generated from interrupt context while a per-CPU kthread is
running, it would be better to fix all cases and test the running
context, like this:

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6397,7 +6397,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
         * essentially a sync wakeup. An obvious example of this
         * pattern is IO completions.
         */
-       if (is_per_cpu_kthread(current) &&
+       if (!in_interrupt() &&
+           is_per_cpu_kthread(current) &&
            prev == smp_processor_id() &&
            this_rq()->nr_running <= 1) {
                return prev;

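For reference, in_interrupt() is true in hardirq, softirq and NMI
context. Roughly, from include/linux/preempt.h (a sketch, with the
macro layering simplified):

	/* True when preempt_count() carries any interrupt-context bits. */
	#define irq_count()	(nmi_count() | hardirq_count() | softirq_count())
	#define in_interrupt()	(irq_count())

With that test in place, the shortcut would only be taken when current
itself performs the wakeup (e.g. a kworker doing a sync-style wakeup),
and the hrtimer-driven path in the scenario above would fall through to
the normal idle-CPU search.
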
>
> >
> >> --
> >> 2.25.1
> >>
