lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 15 Mar 2023 11:21:30 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Zhang Qiao <zhangqiao22@...wei.com>,
        linux-kernel@...r.kernel.org, mingo@...hat.com,
        juri.lelli@...hat.com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        rkagan@...zon.de
Subject: Re: [PATCH v2] sched/fair: sanitize vruntime of entity being migrated

On Wed, 15 Mar 2023 at 11:15, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 15/03/2023 09:42, Vincent Guittot wrote:
> > On Wed, 15 Mar 2023 at 08:18, Vincent Guittot
> > <vincent.guittot@...aro.org> wrote:
> >>
> >> On Tue, 14 Mar 2023 at 18:16, Peter Zijlstra <peterz@...radead.org> wrote:
> >>>
> >>> On Tue, Mar 14, 2023 at 02:24:37PM +0100, Vincent Guittot wrote:
> >>>
> >>>>> @@ -7632,11 +7646,8 @@ static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
> >>>>>          * min_vruntime -- the latter is done by enqueue_entity() when placing
> >>>>>          * the task on the new runqueue.
> >>>>>          */
> >>>>> -       if (READ_ONCE(p->__state) == TASK_WAKING) {
> >>>>> -               struct cfs_rq *cfs_rq = cfs_rq_of(se);
> >>>>> -
> >>>>> +       if (READ_ONCE(p->__state) == TASK_WAKING || reset_vruntime(cfs_rq, se))
> >>>>
> >>>> That's somehow what was proposed in one of the previous proposals but
> >>>> we can't call rq_clock_task(rq_of(cfs_rq)) because rq lock might not
> >>>> be hold and rq task clock has not been updated before being used
> >>>
> >>> Argh indeed. I spend a lot of time ensuring we didn't take the old rq
> >>> lock on wakeup -- and then a lot of time cursing about how we don't :-)
> >>>
> >>> Now, if we could rely on the rq-clock being no more than 1 tick behind
> >>> current, this would still be entirely sufficient to catch the long sleep
> >>> case.
> >>
> >> We should also take care when loading rq_clock_task that we are not
> >> racing with an update especially for a 32bits system like pelt
> >> last_update_time
> >
> > We still have this possibility:
> > https://lore.kernel.org/lkml/ZAiFxWLSb9HDazSI@vingu-book/
> >
> > which uses pelt last_update_time when migrating and keep using
> > rq_clock_task in place_entity
>
> Isn't there an issue with this approach on asymmetric CPU capacity systems?
>
> We do a sync_entity_load_avg() in select_task_rq_fair()
> (find_energy_efficient_cpu() for EAS and select_idle_sibling() for CAS)
> to sync cfs_rq and se.

ah yes, I forgot this point.
That being said, is it a valid problem for EAS based system ? I mean
we are trying to fix a vruntime comparison overflow that can happen
with a very long sleeping task (around 200 days) while only a very low
weight entity is always running  during those 200 days.

>
> [...]
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ