Date:   Mon, 24 Oct 2022 11:56:04 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Suren Baghdasaryan <surenb@...gle.com>
Cc:     hannes@...xchg.org, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, matthias.bgg@...il.com, minchan@...gle.com,
        yt.chang@...iatek.com, wenju.xu@...iatek.com,
        jonathan.jmchen@...iatek.com, show-hong.chen@...iatek.com,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-mediatek@...ts.infradead.org, kernel-team@...roid.com
Subject: Re: [RESEND PATCH v4 1/1] psi: stop relying on timer_pending for
 poll_work rescheduling

On Thu, Oct 20, 2022 at 03:25:47PM -0700, Suren Baghdasaryan wrote:
> On Mon, Oct 10, 2022 at 3:57 PM Suren Baghdasaryan <surenb@...gle.com> wrote:
> >
> > The psi polling mechanism tries to minimize the number of wakeups needed to
> > run psi_poll_work and currently relies on timer_pending() to detect
> > when this work is already scheduled. This provides a window of opportunity
> > for psi_group_change to schedule an immediate psi_poll_work after
> > poll_timer_fn got called but before psi_poll_work could reschedule itself.
> > Below is the depiction of this entire window:
> >
> > poll_timer_fn
> >   wake_up_interruptible(&group->poll_wait);
> >
> > psi_poll_worker
> >   wait_event_interruptible(group->poll_wait, ...)
> >   psi_poll_work
> >     psi_schedule_poll_work
> >       if (timer_pending(&group->poll_timer)) return;
> >       ...
> >       mod_timer(&group->poll_timer, jiffies + delay);
> >
> > Prior to 461daba06bdc we used to rely on poll_scheduled atomic which was
> > reset and set back inside psi_poll_work and therefore this race window
> > was much smaller.
> > The larger window causes an increased number of wakeups, and our partners
> > report a visible power regression of ~10mA after applying 461daba06bdc.
> > Bring back the poll_scheduled atomic and make this race window even
> > narrower by resetting poll_scheduled only when we reach polling expiration
> > time. This does not completely eliminate the possibility of extra wakeups
> > caused by a race with psi_group_change; however, it limits them to at most
> > one extra wakeup per tracking window (0.5s in the worst case).
> > This patch also ensures correct ordering between clearing the poll_scheduled
> > flag and obtaining changed_states by using a memory barrier. Correct ordering
> > between updating changed_states and setting poll_scheduled is ensured by
> > the atomic_xchg operation.
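
The gating idea described above can be illustrated with a minimal userspace
sketch using C11 atomics. This is not the kernel code from the patch: struct
poll_gate and the gate_* helpers are hypothetical names invented for this
illustration, atomic_exchange stands in for the kernel's atomic_xchg, and
atomic_thread_fence stands in for a full memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the poll_scheduled flag. */
struct poll_gate {
    atomic_int scheduled;   /* 1 while a poll is already scheduled */
};

static void gate_init(struct poll_gate *g)
{
    assert(g != NULL);
    atomic_init(&g->scheduled, 0);
}

/*
 * Returns true only for the caller that flips the flag from 0 to 1.
 * atomic_exchange defaults to sequentially consistent ordering, so stores
 * made before the call are ordered before the flag is claimed.
 */
static bool gate_try_schedule(struct poll_gate *g)
{
    return atomic_exchange(&g->scheduled, 1) == 0;
}

/*
 * Reopen the gate. Per the description above, this would happen only once
 * the polling expiration time is reached (assumption: the caller checks
 * that condition). The fence orders the reset before any later loads of
 * shared state.
 */
static void gate_reset(struct poll_gate *g)
{
    atomic_store(&g->scheduled, 0);
    atomic_thread_fence(memory_order_seq_cst);
}

/* Count how many of n back-to-back attempts get through the gate. */
static int gate_count_passed(struct poll_gate *g, int n)
{
    int passed = 0;

    for (int i = 0; i < n; i++)
        if (gate_try_schedule(g))
            passed++;
    return passed;
}
```

In this sketch, repeated attempts are blocked after the first one succeeds,
and a new attempt can succeed only after gate_reset() has run.
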
> > By tracing the number of immediate rescheduling attempts performed by
> > psi_group_change and the number of these attempts being blocked due to
> > psi monitor being already active, we can assess the effects of this change:
> >
> > Before the patch:
> >                                            Run#1    Run#2      Run#3
> > Immediate reschedules attempted:           684365   1385156    1261240
> > Immediate reschedules blocked:             682846   1381654    1258682
> > Immediate reschedules (delta):             1519     3502       2558
> > Immediate reschedules (% of attempted):    0.22%    0.25%      0.20%
> >
> > After the patch:
> >                                            Run#1    Run#2     Run#3
> > Immediate reschedules attempted:           882244   770298    426218
> > Immediate reschedules blocked:             881996   769796    426074
> > Immediate reschedules (delta):             248      502       144
> > Immediate reschedules (% of attempted):    0.03%    0.07%     0.03%
> >
> > The share of non-blocked immediate reschedules dropped from 0.20-0.25%
> > to 0.03-0.07%. The drop is attributed to the smaller race window
> > and to the fact that we allow this race only when psi monitors reach
> > the polling window expiration time.
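
The delta and percentage rows in the tables above follow directly from the
attempted and blocked counts. A small sketch of that arithmetic is shown
here; struct run and the helper names are hypothetical, and the counts are
copied from the tables.

```c
#include <assert.h>
#include <stdio.h>

/* One traced run: counts copied from the tables above. */
struct run {
    long attempted;   /* immediate reschedules attempted */
    long blocked;     /* attempts blocked because the monitor was active */
};

static const struct run before_patch[3] = {
    { 684365, 682846 }, { 1385156, 1381654 }, { 1261240, 1258682 },
};

static const struct run after_patch[3] = {
    { 882244, 881996 }, { 770298, 769796 }, { 426218, 426074 },
};

/* Reschedules that actually went through (the "delta" row). */
static long run_delta(struct run r)
{
    return r.attempted - r.blocked;
}

/* Delta as a percentage of attempts (the "% of attempted" row). */
static double run_percent(struct run r)
{
    assert(r.attempted > 0);
    return 100.0 * (double)run_delta(r) / (double)r.attempted;
}

/* Print both derived rows for a set of runs. */
static void print_runs(const char *label, const struct run *runs, int n)
{
    for (int i = 0; i < n; i++)
        printf("%s Run#%d: delta=%ld (%.2f%% of attempted)\n",
               label, i + 1, run_delta(runs[i]), run_percent(runs[i]));
}
```

Calling print_runs("before", before_patch, 3) and
print_runs("after", after_patch, 3) prints the derived rows for each run.
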
> >
> > Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
> > Reported-by: Kathleen Chang <yt.chang@...iatek.com>
> > Reported-by: Wenju Xu <wenju.xu@...iatek.com>
> > Reported-by: Jonathan Chen <jonathan.jmchen@...iatek.com>
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> > Tested-by: SH Chen <show-hong.chen@...iatek.com>
> > Acked-by: Johannes Weiner <hannes@...xchg.org>
> > ---
> > This patch somehow slipped through the cracks after being acked by Johannes in
> > [1] and I didn't notice it until now because we cherry-picked it into Android
> > kernel trees due to the urgency at that time. On the bright side, this change
> > has been tested for about a year in the field by millions of devices.
> > Resending v4 of this patch previously posted at [2], rebased on the latest
> > Linus' TOT.
> 
> Hi Peter,
> We missed this Ack'ed patch last year and as I described above I
> didn't notice that up until now. With rc1 released, hopefully it's a
> good time to ping you to ask for inclusion of this patch in your tree.
> If the timing is not good, please let me know when to remind you and
> I'll send another email. Just want to make sure it does not slip
> again.
> 
> Just FYI, we have two other Ack'ed PSI patches for you to consider:
> 
> https://lore.kernel.org/all/20221014110551.22695-1-zhouchengming@bytedance.com/
> https://lore.kernel.org/all/20220919072356.GA29069@haolee.io/

Thanks for the poke; I've picked up all three and will place them in
sched/core.
