Message-ID: <20230725222851.GC3784071@hirez.programming.kicks-ass.net>
Date:   Wed, 26 Jul 2023 00:28:51 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     Anna-Maria Behnsen <anna-maria@...utronix.de>,
        Frederic Weisbecker <frederic@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
        "Gautham R. Shenoy" <gautham.shenoy@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>
Subject: Re: Stopping the tick on a fully loaded system

On Tue, Jul 25, 2023 at 04:27:56PM +0200, Rafael J. Wysocki wrote:
> On Tue, Jul 25, 2023 at 3:07 PM Anna-Maria Behnsen

> >                         100% load               50% load                25% load
> >                         (top: ~2% idle)         (top: ~49% idle)        (top: ~74% idle;
> >                                                                         33 CPUs are completely idle)
> >                         ---------------         ----------------        ----------------------------
> > Idle Total              1658703 100%            3150522 100%            2377035 100%
> > x >= 4ms                2504    0.15%           2       0.00%           53      0.00%
> > 4ms > x >= 2ms          390     0.02%           0       0.00%           4563    0.19%
> > 2ms > x >= 1ms          62      0.00%           1       0.00%           54      0.00%
> > 1ms > x >= 500us        67      0.00%           6       0.00%           2       0.00%
> > 500us > x >= 250us      93      0.01%           39      0.00%           11      0.00%
> > 250us > x >= 100us      280     0.02%           1145    0.04%           633     0.03%
> > 100us > x >= 50us       942     0.06%           30722   0.98%           13347   0.56%
> > 50us > x >= 25us        26728   1.61%           310932  9.87%           106083  4.46%
> > 25us > x >= 10us        825920  49.79%          2320683 73.66%          1722505 72.46%
> > 10us > x > 5us          795197  47.94%          442991  14.06%          506008  21.29%
> > 5us > x                 6520    0.39%           43994   1.40%           23645   0.99%
> >
> >
> > 99% of the tick stops only have an idle period shorter than 50us (50us is
> > 1.25% of a tick length).
> 
> Well, this just means that the governor predicts overly long idle
> durations quite often under this workload.
> 
> The governor's decision on whether or not to stop the tick is based on
> its idle duration prediction.  If it overshoots, that's how it goes.

This is abysmal; IIRC TEO tracks a density function in C state buckets
and if it finds it's more likely to be shorter than 'predicted' by the
timer it should pick something shallower.
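
A toy sketch of that idea, to make the mechanism concrete. This is not the kernel's TEO code: the function name, the plain list of past idle durations and the simple majority rule are all assumptions made for illustration.

```python
def pick_state(residencies_us, history_us, timer_us):
    """Toy selection: start from the timer, then let history overrule it.

    residencies_us: target residency per C-state, shallowest first,
                    with residencies_us[0] == 0 (illustrative input).
    history_us:     recent observed idle durations (hypothetical record).
    timer_us:       time until the next timer, i.e. the upper bound.
    """
    # Deepest state whose target residency still fits before the timer.
    candidate = max(i for i, r in enumerate(residencies_us) if r <= timer_us)

    # Step shallower while most recent idle periods were too short
    # to justify the current candidate.
    while candidate > 0:
        too_short = sum(1 for d in history_us
                        if d < residencies_us[candidate])
        if 2 * too_short <= len(history_us):
            break
        candidate -= 1
    return candidate
```

With residencies of 0, 2 and 36us, a timer 4000us away and a history of ~10us idle periods, this picks state 1 rather than state 2, even though the timer alone would allow the deepest state.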

Given we have this density function, picking something that's <1% likely
is insane. In fact, it seems to suggest the whole pick-alternative thing
is utterly broken.
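
For reference, the share of idle periods below 50us can be recomputed from the quoted table. The counts and totals are copied from the table above; the 4ms tick (HZ=250) is an assumption inferred from the "x >= 4ms" row and the 1.25% figure.

```python
# Per-bucket idle-period counts from the quoted table, longest bucket first:
# >=4ms, 2-4ms, 1-2ms, 500us-1ms, 250-500us, 100-250us, 50-100us,
# 25-50us, 10-25us, 5-10us, <5us
COUNTS = {
    "100% load": [2504, 390, 62, 67, 93, 280, 942,
                  26728, 825920, 795197, 6520],
    "50% load": [2, 0, 1, 6, 39, 1145, 30722,
                 310932, 2320683, 442991, 43994],
    "25% load": [53, 4563, 54, 2, 11, 633, 13347,
                 106083, 1722505, 506008, 23645],
}

# "Idle Total" row from the same table, used as the denominator.
TOTALS = {"100% load": 1658703, "50% load": 3150522, "25% load": 2377035}

SHORT_BUCKETS = 4   # the last four rows are the ones below 50us
TICK_US = 4000      # assumption: HZ=250, so one tick is 4ms


def short_fraction(load):
    """Fraction of idle periods shorter than 50us for one load level."""
    return sum(COUNTS[load][-SHORT_BUCKETS:]) / TOTALS[load]


for load in COUNTS:
    print(f"{load}: {100 * short_fraction(load):.2f}% below 50us")
print(f"50us is {100 * 50 / TICK_US:.2f}% of a tick")
```

In all three columns the idle periods of 50us or longer add up to roughly 1% of the total or less.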

> > This is also the reason for my opinion, that the return of
> > tick_nohz_next_event() is completely irrelevant in a (fully) loaded case:
> 
> It is an upper bound and in a fully loaded case it may be way off.

But given we have our density function, we should be able to do much
better.


Oooh,... I think I see the problem. Our bins are strictly the available
C-states, but if you run this on a Zen3 that has ACPI-idle, then you end
up with something that only has 3 C-states, like:

$ for i in state*/residency ; do echo -n "${i}: "; cat $i; done
state0/residency: 0
state1/residency: 2
state2/residency: 36

Which means we only have buckets (in ns): (0,0], (0,2000], (2000,36000] or
somesuch. All of them very much smaller than TICK_NSEC.

That means we don't track nearly enough data to reliably tell anything
about disabling the tick or not. We should have at least one bucket
beyond TICK_NSEC for this.
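
A small sketch of that check, using the residency values from the sysfs output above. It follows the bucket layout as described in this mail, not necessarily the exact binning in the governor; HZ=250 and the helper names are assumptions.

```python
NSEC_PER_USEC = 1000
HZ = 250                           # assumption; TICK_NSEC depends on CONFIG_HZ
TICK_NSEC = 1_000_000_000 // HZ    # 4,000,000 ns at HZ=250


def bucket_edges_ns(residencies_us):
    """Bucket boundaries in ns, one per C-state target residency."""
    return [r * NSEC_PER_USEC for r in residencies_us]


def has_bucket_beyond_tick(residencies_us, tick_ns=TICK_NSEC):
    """True if at least one bucket boundary sits at or past one tick."""
    return any(edge >= tick_ns for edge in bucket_edges_ns(residencies_us))


ZEN3_ACPI_IDLE_US = [0, 2, 36]     # state0..state2 residency from above

print(bucket_edges_ns(ZEN3_ACPI_IDLE_US))
print(has_bucket_beyond_tick(ZEN3_ACPI_IDLE_US))
```

The deepest boundary is 36000 ns against a 4000000 ns tick, so every observation longer than 36us lands in the same place and nothing distinguishes "a bit over 36us" from "longer than a tick".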

Hmm.. it is getting very late, but how about I get the cpuidle framework
to pad the drv states with a few 'disabled' C states so that we have at
least enough data to cross the TICK_NSEC boundary and say something
usable about things.
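
Roughly what such padding could look like, as a userspace sketch only. Nothing here is an existing cpuidle interface: `State`, `pad_states()` and the geometric spacing factor are hypothetical, and HZ=250 is assumed.

```python
from dataclasses import dataclass

NSEC_PER_USEC = 1000
HZ = 250                           # assumption
TICK_NSEC = 1_000_000_000 // HZ


@dataclass
class State:
    residency_us: int
    disabled: bool = False         # padding states are never entered


def pad_states(residencies_us, tick_ns=TICK_NSEC, factor=4):
    """Append disabled placeholder states until one crosses tick_ns.

    The placeholders exist only to give the governor extra buckets;
    the spacing (multiply by `factor`) is an arbitrary choice here.
    """
    states = [State(r) for r in residencies_us]
    deepest = max(residencies_us, default=0)
    while deepest * NSEC_PER_USEC < tick_ns:
        deepest = max(1, deepest * factor)
        states.append(State(deepest, disabled=True))
    return states


for s in pad_states([0, 2, 36]):
    print(s)
```

For the Zen3 table above this adds four disabled states, the last of which has a residency past one tick, so idle periods longer than a tick get a bucket of their own.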

Because as things stand, it's very likely we determine @stop_tick purely
based on what tick_nohz_get_sleep_length() tells us, not on what we've
learnt from recent history.


(FWIW intel_idle seems to not have an entry for Tigerlake !?! -- my poor
laptop, it feels neglected)
