Date:   Mon, 24 Feb 2020 17:41:39 +0000
From:   Qais Yousef <qais.yousef@....com>
To:     Pavan Kondeti <pkondeti@...eaurora.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 5/6] sched/rt: Better manage pushing unfit tasks on wakeup

On 02/24/20 21:34, Pavan Kondeti wrote:
> Hi Qais,
> 
> On Mon, Feb 24, 2020 at 5:42 PM Qais Yousef <qais.yousef@....com> wrote:
> [...]
> > We could do that temporarily, to get these fixes into 5.6. But I do think
> > select_task_rq_rt() doesn't do a good enough job of pushing unfit tasks to
> > the right CPUs.
> >
> > I don't understand the reasons behind your objection. It seems you think that
> > select_task_rq_rt() should be enough, but AFAICS it isn't. Can you go into
> > a bit more detail, please?
> >
> > FWIW, here's a screenshot of what I see
> >
> >         https://imgur.com/a/peV27nE
> >
> > After the first activation, select_task_rq_rt() fails to find the right CPU
> > (due to the same issue of moving all tasks to cpumask_first()) - but when the
> > task wakes up on CPU4, the logic I put in causes it to migrate to CPU2, which
> > is the 2nd big core. CPU1 and CPU2 are the big cores on Juno.
> >
> > Now maybe we should fix select_task_rq_rt() to better balance tasks, but I'm
> > not sure how easy that is.
> >
> 
> Thanks for the trace. Now things are clear to me. Two RT tasks woke up
> simultaneously and the first task got its previous CPU, i.e. CPU#1. The next
> task went through find_lowest_rq() and got the same CPU#1. Since this task's
> priority is not higher than that of the just-queued task (already queued on
> CPU#1), it is sent to its previous CPU, i.e. CPU#4 in your case.
> 
> From the task_woken_rt() path, CPU#4 attempts push_rt_tasks(). CPU#4 is not
> overloaded, but we have the rt_task_fits_capacity() check, which forces the
> push. Since the CPU is not overloaded, your has_unfit_tasks() comes to the
> rescue and pushes the task. Since the task has not been scheduled in yet, it
> is eligible for a push. You added checks to skip resched_curr() in
> push_rt_tasks(); otherwise the push won't happen.

Nice summary, that's exactly what it is :)
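
To make that sequence concrete, here is a rough userspace toy model of it. It is
not kernel code: every toy_* name, the 6-CPU layout and the single
"queued_prio" slot per CPU are assumptions made up for illustration. It only
mimics the two steps described above: a selection step that looks at the first
fitting CPU only, and a wakeup-time re-check that pushes a task that does not
fit.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_JUNO_CPUS 6   /* assumption: 6 CPUs, with CPU1 and CPU2 big */
#define TOY_NO_TASK   (-1)

/* Hypothetical per-CPU state, not a kernel structure. */
struct toy_cpu {
	bool big;          /* true for a big core */
	int  queued_prio;  /* best RT priority queued here, or TOY_NO_TASK */
};

/* Hypothetical RT task; a lower prio value means a higher priority. */
struct toy_task {
	int  prio;
	int  prev_cpu;
	bool needs_big;    /* task does not fit on a little core */
};

static void toy_init_juno(struct toy_cpu *cpus)
{
	for (int cpu = 0; cpu < TOY_JUNO_CPUS; cpu++) {
		cpus[cpu].big = (cpu == 1 || cpu == 2);
		cpus[cpu].queued_prio = TOY_NO_TASK;
	}
}

static bool toy_fits(const struct toy_cpu *cpus, const struct toy_task *t,
		     int cpu)
{
	return !t->needs_big || cpus[cpu].big;
}

/* Wakeup-time placement: only the first fitting CPU is ever considered. */
static int toy_select_rq(const struct toy_cpu *cpus, const struct toy_task *t)
{
	for (int cpu = 0; cpu < TOY_JUNO_CPUS; cpu++) {
		if (!toy_fits(cpus, t, cpu))
			continue;
		if (cpus[cpu].queued_prio == TOY_NO_TASK ||
		    t->prio < cpus[cpu].queued_prio)
			return cpu;
		break; /* first candidate holds an equal or higher priority task */
	}
	return t->prev_cpu;
}

/* Second chance once the task has woken on 'cpu': push it if it does not fit. */
static int toy_push_on_wakeup(const struct toy_cpu *cpus,
			      const struct toy_task *t, int cpu)
{
	if (toy_fits(cpus, t, cpu))
		return cpu;
	for (int dst = 0; dst < TOY_JUNO_CPUS; dst++) {
		if (toy_fits(cpus, t, dst) &&
		    cpus[dst].queued_prio == TOY_NO_TASK)
			return dst;
	}
	return cpu; /* nowhere better to go */
}

/* Full wakeup: select, re-check, then record the task on its final CPU. */
static int toy_wake(struct toy_cpu *cpus, const struct toy_task *t)
{
	int cpu = toy_push_on_wakeup(cpus, t, toy_select_rq(cpus, t));

	assert(cpu >= 0 && cpu < TOY_JUNO_CPUS);
	if (cpus[cpu].queued_prio == TOY_NO_TASK ||
	    t->prio < cpus[cpu].queued_prio)
		cpus[cpu].queued_prio = t->prio;
	return cpu;
}
```

With two equal-priority tasks that both need a big core and whose previous CPUs
are 1 and 4, toy_wake() leaves the first on CPU1. For the second,
toy_select_rq() falls back to CPU4 and the wakeup re-check moves it to CPU2,
which matches what the trace shows.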

> Finally, I understood your patch. Obviously this was not clear to me before.
> I am not sure if this patch is the right approach to solve this race. I will
> think a bit more.

I haven't been staring at this code for as long as you, but since we have
logic at wakeup to do a push, I think we need something here anyway for unfit
tasks.

Fixing select_task_rq_rt() to better balance tasks will help a lot in general,
but if that were already enough, why would we need to consider a push at
wakeup at all?

AFAIU, in SMP the whole push-pull mechanism is racy, and we introduce redundancy
by taking the decision at various points to minimize the effect of this racy
nature of SMP systems. Anything could have happened between the time we called
select_task_rq_rt() and the wakeup, so we double check again before we finally
go and run. That's how I interpret it.
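
As a minimal sketch of that "decide early, double check late" idea (again a
userspace toy, with all names hypothetical and no relation to the real
scheduler data structures): the choice made at selection time is only trusted
if it still holds against the state seen at wakeup.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4   /* arbitrary size for the illustration */

/* Hypothetical state; true means an RT task is already queued on that CPU. */
struct toy_state {
	bool busy[TOY_NR_CPUS];
};

/* Step 1: pick the first free CPU, based on the state seen at selection time. */
static int toy_select(const struct toy_state *s)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (!s->busy[cpu])
			return cpu;
	return 0; /* nothing free: fall back to CPU 0 */
}

/* Step 2: at wakeup, re-validate the earlier choice against the current state. */
static int toy_recheck_at_wakeup(const struct toy_state *s, int chosen)
{
	assert(chosen >= 0 && chosen < TOY_NR_CPUS);
	if (!s->busy[chosen])
		return chosen;     /* the earlier decision still holds */
	return toy_select(s);      /* the state changed under us: decide again */
}
```

If another task lands on the chosen CPU between toy_select() and
toy_recheck_at_wakeup(), the second call picks a different CPU instead of
stacking both tasks on the same one.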

I am open to hearing about other alternatives first anyway. Your help has been
much appreciated so far.

Thanks

--
Qais Yousef
