Message-ID: <CAJZ5v0gyQvPqCN8jPrJqJVNeYXkhmCOnkuNvLgdqQtcS-fgF-g@mail.gmail.com>
Date:   Wed, 26 Jul 2023 17:10:23 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Anna-Maria Behnsen <anna-maria@...utronix.de>,
        Frederic Weisbecker <frederic@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        "Gautham R. Shenoy" <gautham.shenoy@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>
Subject: Re: Stopping the tick on a fully loaded system

On Wed, Jul 26, 2023 at 12:29 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Jul 25, 2023 at 04:27:56PM +0200, Rafael J. Wysocki wrote:
> > On Tue, Jul 25, 2023 at 3:07 PM Anna-Maria Behnsen
>
> > >                         100% load               50% load                25% load
> > >                         (top: ~2% idle)         (top: ~49% idle)        (top: ~74% idle;
> > >                                                                         33 CPUs are completely idle)
> > >                         ---------------         ----------------        ----------------------------
> > > Idle Total              1658703 100%            3150522 100%            2377035 100%
> > > x >= 4ms                2504    0.15%           2       0.00%           53      0.00%
> > > 4ms > x >= 2ms          390     0.02%           0       0.00%           4563    0.19%
> > > 2ms > x >= 1ms          62      0.00%           1       0.00%           54      0.00%
> > > 1ms > x >= 500us        67      0.00%           6       0.00%           2       0.00%
> > > 500us > x >= 250us      93      0.01%           39      0.00%           11      0.00%
> > > 250us > x >= 100us      280     0.02%           1145    0.04%           633     0.03%
> > > 100us > x >= 50us       942     0.06%           30722   0.98%           13347   0.56%
> > > 50us > x >= 25us        26728   1.61%           310932  9.87%           106083  4.46%
> > > 25us > x >= 10us        825920  49.79%          2320683 73.66%          1722505 72.46%
> > > 10us > x > 5us          795197  47.94%          442991  14.06%          506008  21.29%
> > > 5us > x                 6520    0.39%           43994   1.40%           23645   0.99%
> > >
> > >
> > > 99% of the tick stops only have an idle period shorter than 50us (50us is
> > > 1,25% of a tick length).
> >
> > Well, this just means that the governor predicts overly long idle
> > durations quite often under this workload.
> >
> > The governor's decision on whether or not to stop the tick is based on
> > its idle duration prediction.  If it overshoots, that's how it goes.
>
> This is abysmal; IIRC TEO tracks a density function in C state buckets
> and if it finds it's more likely to be shorter than 'predicted' by the
> timer it should pick something shallower.
>
> Given we have this density function, picking something that's <1% likely
> is insane. In fact, it seems to suggest the whole pick-alternative thing
> is utterly broken.
>
> > > This is also the reason for my opinion, that the return of
> > > tick_nohz_next_event() is completely irrelevant in a (fully) loaded case:
> >
> > It is an upper bound and in a fully loaded case it may be way off.
>
> But given we have our density function, we should be able to do much
> better.
>
>
> Oooh,... I think I see the problem. Our bins are strictly the available
> C-state, but if you run this on a Zen3 that has ACPI-idle, then you end
> up with something that only has 3 C states, like:
>
> $ for i in state*/residency ; do echo -n "${i}: "; cat $i; done
> state0/residency: 0
> state1/residency: 2
> state2/residency: 36
>
> Which means we only have buckets: (0,0] (0,2000], (2000,36000] or somesuch. All
> of them very much smaller than TICK_NSEC.
>
> That means we don't track nearly enough data to reliably tell anything
> about disabling the tick or not. We should have at least one bucket
> beyond TICK_NSEC for this.

Quite likely.
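
To make the bucket problem above concrete, here is a rough sketch (not the
actual TEO code). It assumes that a bin is selected by the deepest state whose
target residency does not exceed the observed idle duration, and that
TICK_NSEC is 4 ms, which is what "50us is 1,25% of a tick length" implies.
The names bin_index, crosses_tick and the padded boundary list are
hypothetical, used only for illustration.

```python
from bisect import bisect_right

# Assumption: HZ=250, i.e. a 4 ms tick (50 us is 1.25% of 4 ms).
TICK_NSEC = 4_000_000

# Target residencies in microseconds, as in the sysfs listing quoted above.
residency_us = [0, 2, 36]
lower_bounds_ns = [r * 1000 for r in residency_us]


def bin_index(duration_ns, bounds_ns):
    """Index of the deepest bin whose lower bound is <= duration_ns."""
    return bisect_right(bounds_ns, duration_ns) - 1


def crosses_tick(bounds_ns, tick_ns=TICK_NSEC):
    """True if at least one bin starts at or beyond the tick length."""
    return any(b >= tick_ns for b in bounds_ns)


# With only three states, a 50 us idle period and a 5 ms idle period
# fall into the same (last) bin, so history cannot tell them apart.
short_bin = bin_index(50_000, lower_bounds_ns)
long_bin = bin_index(5_000_000, lower_bounds_ns)
print(short_bin, long_bin, crosses_tick(lower_bounds_ns))

# Hypothetical padding: one extra boundary at TICK_NSEC, standing in
# for a 'disabled' state that exists only to collect statistics.
padded_bounds_ns = lower_bounds_ns + [TICK_NSEC]
print(bin_index(50_000, padded_bounds_ns),
      bin_index(5_000_000, padded_bounds_ns),
      crosses_tick(padded_bounds_ns))
```

With the unpadded list, every idle period of 36 us or longer is counted in
the same bin, so the collected history carries no information about whether
wakeups tend to arrive before or after the tick boundary.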

> Hmm.. it is getting very late, but how about I get the cpuidle framework
> to pad the drv states with a few 'disabled' C states so that we have at
> least enough data to cross the TICK_NSEC boundary and say something
> usable about things.
>
> Because as things stand, it's very likely we determine @stop_tick purely
> based on what tick_nohz_get_sleep_length() tells us, not on what we've
> learnt from recent history.
>
>
> (FWIW intel_idle seems to not have an entry for Tigerlake !?! -- my poor
> laptop, it feels neglected)

It should then use ACPI _CST idle states.
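
For reference, the "99% of the tick stops" figure can be rechecked from the
histogram quoted at the top of this message. The sketch below only sums the
rows below 50us and divides by the "Idle Total" row; the counts are copied
from the table, while the dictionary layout and the function name are made
up for illustration.

```python
# Counts copied from the quoted histogram, per load level.
# Each list holds the four rows below 50 us:
# [50us > x >= 25us, 25us > x >= 10us, 10us > x > 5us, 5us > x]
below_50us = {
    "100%": [26728, 825920, 795197, 6520],
    "50%": [310932, 2320683, 442991, 43994],
    "25%": [106083, 1722505, 506008, 23645],
}

# "Idle Total" row from the same table.
idle_total = {"100%": 1658703, "50%": 3150522, "25%": 2377035}


def share_below_50us(load):
    """Fraction of tick stops whose idle period was shorter than 50 us."""
    return sum(below_50us[load]) / idle_total[load]


for load in ("100%", "50%", "25%"):
    print(f"{load} load: {share_below_50us(load) * 100:.2f}% below 50us")
```

In all three columns the share comes out at roughly 99%, in line with the
statement quoted above.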
