Date:   Wed, 27 Oct 2021 17:16:48 +0200 (CEST)
From:   Julia Lawall <julia.lawall@...ia.fr>
To:     Doug Smythies <dsmythies@...us.net>
cc:     Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Len Brown <lenb@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Linux PM list <linux-pm@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: problem in changing from active to passive mode



On Wed, 27 Oct 2021, Doug Smythies wrote:

> On Tue, Oct 26, 2021 at 8:13 AM Julia Lawall <julia.lawall@...ia.fr> wrote:
> >
> > The problem is illustrated by the attached graphs.  These graphs on the
> > odd-numbered pages show the frequency of each core measured at every clock
> > tick.  At each measurement there is a small bar representing 4ms of the
> > color associated with the frequency.  The percentages shown are thus not
> > entirely accurate, because the frequency could change within those 4ms and
> > we would not observe that.
> >
> > The first graph, 5.9schedutil_yeti, shows the normal behavior with schedutil
> > running.  The application mostly uses the second-highest turbo mode, which
> > is the appropriate one given that there are around 5 active cores most of
> > the time.  I traced power:cpu_frequency, which is the event that occurs
> > when the OS requests a change of frequency.  This happens around 5400
> > times.
> >
> > The second graph, 5.15-schedutil_yeti, shows the behavior of the latest version of Linus's
> > tree.  The cores are almost always at the lowest frequency.  There are no
> > occurrences of the power:cpu_frequency event.
> >
> > The third graph, 5.9schedutil_after_yeti, is what happens when I reboot
> > into 5.9 after having changed to passive mode in 5.15.  The number of
> > power:cpu_frequency events drops to around 1100.  The proper turbo mode is
> > actually used sometimes, but much less than in the first graph.  More than
> > half of the time, an active core is at the lowest frequency.
> >
> > This application (avrora from the DaCapo benchmarks) is continually
> > stopping and starting, with both the run and idle intervals being very
> > short.  This may discourage the hardware from raising the frequency of
> > its own volition.
>
> Agreed. This type of workflow has long been known to be a challenge
> for various CPU frequency scaling governors. It comes up every so
> often on the linux-pm email list. Basically, the schedutil CPU frequency
> scaling governor becomes somewhat indecisive under these conditions.
> However, if for some reason it gets kicked up to max CPU frequency,
> then often it will stay there (depending on details of the workflow,
> it stays up for my workflows).
>
> Around the time of the commit you referenced in your earlier
> email, it was recognised that proposed changes were adding
> a bit of a downward bias to the hwp-passive-schedutil case for
> some of these difficult workflows [1].
>
> I booted an old 5.9, HWP enabled, passive, schedutil.
> I got the following for my ping-pong test type workflow
> (which is not the best example):
>
> Run 1: 6234 uSecs/loop
> Run 2: 2813 uSecs/loop
> Run 3: 2721 uSecs/loop
> Run 4: 2813 uSecs/loop
> Run 5: 11303 uSecs/loop
> Run 6: 13803 uSecs/loop
> Run 7: 2809 uSecs/loop
> Run 8: 2796 uSecs/loop
> Run 9: 2760 uSecs/loop
> Run 10: 2691 uSecs/loop
> Run 11: 9288 uSecs/loop
> Run 12: 4275 uSecs/loop
>
> Then the same with kernel 5.15-rc5
> (I am a couple of weeks behind).
>
> Run 1: 13618 uSecs/loop
> Run 2: 13901 uSecs/loop
> Run 3: 8929 uSecs/loop
> Run 4: 12189 uSecs/loop
> Run 5: 10338 uSecs/loop
> Run 6: 12846 uSecs/loop
> Run 7: 5418 uSecs/loop
> Run 8: 7692 uSecs/loop
> Run 9: 11531 uSecs/loop
> Run 10: 9763 uSecs/loop
>
> Now, for your graph 3, are you saying that the process described
> by this pseudo code is repeatable?
>
> Power up the system, booting kernel 5.9
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~13 seconds
> re-boot to kernel 5.15-RC
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~40 seconds
> re-boot to kernel 5.9
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~28 seconds

Yes, exactly.

I have been looking into why with 5.15-RC there are no requests from
schedutil.  I'm not yet sure I understand everything.  But I do notice
that the function cpufreq_this_cpu_can_update returns false around 2/3 of
the time.  This comes from the following code returning 0:

cpumask_test_cpu(smp_processor_id(), policy->cpus)

It seems that the mask policy->cpus always contains only one core, which
might or might not be the running one.  I don't know if this is the
intended behavior.
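
For reference, the whole check looks roughly like this (paraphrasing
kernel/sched/cpufreq.c from memory, so the exact form in 5.15 may differ
slightly):

    /*
     * Sketch: an update from this CPU is allowed either if the running
     * CPU belongs to the policy, or if the policy explicitly permits
     * DVFS requests from any CPU and the local CPU still has an update
     * hook installed (i.e. it is not going offline).
     */
    bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
    {
            return cpumask_test_cpu(smp_processor_id(), policy->cpus) ||
                   (policy->dvfs_possible_from_any_cpu &&
                    rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data)));
    }

So if policy->cpus only ever contains a single CPU and
dvfs_possible_from_any_cpu is not set, then any update triggered while
running on a different CPU is simply dropped, which would explain the
missing requests.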

julia
