Message-ID: <20200624081438.GA20603@pi3>
Date:   Wed, 24 Jun 2020 10:14:38 +0200
From:   Krzysztof Kozlowski <krzk@...nel.org>
To:     Willy Wolff <willy.mh.wolff.ml@...il.com>
Cc:     Chanwoo Choi <cw00.choi@...sung.com>,
        MyungJoo Ham <myungjoo.ham@...sung.com>,
        Kyungmin Park <kyungmin.park@...sung.com>,
        Kukjin Kim <kgene@...nel.org>, linux-pm@...r.kernel.org,
        "linux-samsung-soc@...r.kernel.org" 
        <linux-samsung-soc@...r.kernel.org>,
        linux-arm-kernel@...ts.infradead.org,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Lukasz Luba <lukasz.luba@....com>
Subject: Re: brocken devfreq simple_ondemand for Odroid XU3/4?

On Wed, Jun 24, 2020 at 10:01:17AM +0200, Willy Wolff wrote:
> Hi Krzysztof,
> Thanks for looking at it.
> 
> mem_gov is /sys/class/devfreq/10c20000.memory-controller/governor
> 
> Here are some numbers after increasing the running time:
> 
> Running using simple_ondemand:
> Before:
>      From  :   To                                                                                     
>            : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
> * 165000000:         0         0         0         0         0         0         0         4   4528600
>   206000000:         5         0         0         0         0         0         0         0     57780
>   275000000:         0         5         0         0         0         0         0         0     50060
>   413000000:         0         0         5         0         0         0         0         0     46240
>   543000000:         0         0         0         5         0         0         0         0     48970
>   633000000:         0         0         0         0         5         0         0         0     47330
>   728000000:         0         0         0         0         0         0         0         0         0
>   825000000:         0         0         0         0         0         5         0         0    331300
> Total transition : 34
> 
> 
> After:
>      From  :   To
>            : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
> * 165000000:         0         0         0         0         0         0         0         4   5098890
>   206000000:         5         0         0         0         0         0         0         0     57780
>   275000000:         0         5         0         0         0         0         0         0     50060
>   413000000:         0         0         5         0         0         0         0         0     46240
>   543000000:         0         0         0         5         0         0         0         0     48970
>   633000000:         0         0         0         0         5         0         0         0     47330
>   728000000:         0         0         0         0         0         0         0         0         0
>   825000000:         0         0         0         0         0         5         0         0    331300
> Total transition : 34
> 
> With a running time of:
> LITTLE => 283.699 s (680.877 c per mem access)
> big => 284.47 s (975.327 c per mem access)

I see there were no transitions during your memory test.
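
To make that easier to check, here is a minimal sketch that diffs the
time(ms) column of two trans_stat dumps. It assumes the row layout
shown above; parse_trans_stat() and time_delta() are made-up helper
names, and only the two rows whose residency matters are pasted in.

```python
import re

# A data row: optional "*" marker, "<freq>:", the per-target transition
# counts, and the residency time in ms as the last column.
ROW = re.compile(r"^\s*\*?\s*(\d+):((?:\s+\d+)+)\s*$")

def parse_trans_stat(text):
    """Return {freq_hz: time_ms} for every data row in a trans_stat dump."""
    times = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and "Total transition" lines do not match
            times[int(m.group(1))] = int(m.group(2).split()[-1])
    return times

def time_delta(before, after):
    """Return {freq_hz: extra_ms} for frequencies whose residency grew."""
    b, a = parse_trans_stat(before), parse_trans_stat(after)
    return {f: a[f] - b.get(f, 0) for f in a if a[f] != b.get(f, 0)}

# Rows copied from the simple_ondemand run quoted above.
BEFORE = """\
* 165000000:         0         0         0         0         0         0         0         4   4528600
  825000000:         0         0         0         0         0         5         0         0    331300
Total transition : 34
"""
AFTER = """\
* 165000000:         0         0         0         0         0         0         0         4   5098890
  825000000:         0         0         0         0         0         5         0         0    331300
Total transition : 34
"""

print(time_delta(BEFORE, AFTER))  # {165000000: 570290}
```

So about 570 s were added at 165 MHz and nowhere else, which lines up
with the two runs (283.699 s + 284.47 s) having been spent entirely at
the lowest frequency.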

> 
> And when I set to the performance governor:
> Before:
>      From  :   To
>            : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
>   165000000:         0         0         0         0         0         0         0         5   5099040
>   206000000:         5         0         0         0         0         0         0         0     57780
>   275000000:         0         5         0         0         0         0         0         0     50060
>   413000000:         0         0         5         0         0         0         0         0     46240
>   543000000:         0         0         0         5         0         0         0         0     48970
>   633000000:         0         0         0         0         5         0         0         0     47330
>   728000000:         0         0         0         0         0         0         0         0         0
> * 825000000:         0         0         0         0         0         5         0         0    331350
> Total transition : 35
> 
> After:
>      From  :   To
>            : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
>   165000000:         0         0         0         0         0         0         0         5   5099040
>   206000000:         5         0         0         0         0         0         0         0     57780
>   275000000:         0         5         0         0         0         0         0         0     50060
>   413000000:         0         0         5         0         0         0         0         0     46240
>   543000000:         0         0         0         5         0         0         0         0     48970
>   633000000:         0         0         0         0         5         0         0         0     47330
>   728000000:         0         0         0         0         0         0         0         0         0
> * 825000000:         0         0         0         0         0         5         0         0    472980
> Total transition : 35
> 
> With a running time of:
> LITTLE: 68.8428 s (165.223 c per mem access)
> big: 71.3268 s (244.549 c per mem access)
> 
> 
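
A quick sanity check on these numbers, as a sketch only (the variable
names are made up; the values are copied from your report): the
slowdown under simple_ondemand is roughly 4x, a bit below the 5x ratio
between the highest and lowest DMC frequency, and the time added at
825 MHz covers both runs.

```python
# Run times in seconds, copied from the report above.
runtime_s = {
    "simple_ondemand": {"LITTLE": 283.699, "big": 284.47},
    "performance": {"LITTLE": 68.8428, "big": 71.3268},
}
DMC_MIN_HZ, DMC_MAX_HZ = 165_000_000, 825_000_000

# How much slower each cluster ran with simple_ondemand.
slowdown = {
    core: runtime_s["simple_ondemand"][core] / runtime_s["performance"][core]
    for core in ("LITTLE", "big")
}
freq_ratio = DMC_MAX_HZ / DMC_MIN_HZ

# Residency added at 825 MHz during the performance-governor run (ms),
# compared with the summed run time of both benchmarks (ms).
perf_time_at_max_ms = 472980 - 331350
perf_runtime_ms = sum(runtime_s["performance"].values()) * 1000

for core, ratio in slowdown.items():
    print(f"{core}: {ratio:.2f}x slower, frequency ratio {freq_ratio:.0f}x")
print(f"added at 825 MHz: {perf_time_at_max_ms} ms, "
      f"benchmark run time: {perf_runtime_ms:.0f} ms")
```
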
> I see some transitions, but none occurring during the benchmark.
> I haven't dived into the code, but maybe the heuristic behind it is not
> well defined? If you know how it works, that would be helpful before I dive
> into it.

Sorry, I don't know that much about it. It seems to count the time between
overflows of the DMC perf events and, based on that, bumps up the frequency.

Maybe your test does not fit the current formula well? Maybe the formula
has some drawbacks...
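
For reference, a rough sketch of the decision the generic
simple_ondemand governor makes, written from memory of
drivers/devfreq/governor_simpleondemand.c. Treat the details as
assumptions to verify against the source: the default thresholds (90
and 5), the integer rounding, and all names below are mine. It does not
model how the exynos5422-dmc driver produces busy_time and total_time,
which may well be where the problem is.

```python
def simple_ondemand_target(busy_time, total_time, cur_freq, max_freq,
                           upthreshold=90, downdifferential=5):
    """Return the frequency (Hz) the governor would ask for.

    busy_time / total_time is the load reported by the device driver.
    Defaults are assumed, not checked against the kernel tree.
    """
    # No usable statistics yet: play safe and go to the maximum.
    if total_time == 0 or cur_freq == 0:
        return max_freq
    # Load above upthreshold percent: jump straight to the maximum.
    if busy_time * 100 > total_time * upthreshold:
        return max_freq
    # Load inside the hysteresis band: keep the current frequency.
    if busy_time * 100 >= total_time * (upthreshold - downdifferential):
        return cur_freq
    # Otherwise scale down so the load lands just below the band.
    scaled = busy_time * cur_freq // total_time
    return scaled * 100 // (upthreshold - downdifferential // 2)

# Hypothetical loads, only to exercise each branch.
print(simple_ondemand_target(95, 100, 165_000_000, 825_000_000))  # 825000000
print(simple_ondemand_target(87, 100, 413_000_000, 825_000_000))  # 413000000
print(simple_ondemand_target(10, 100, 825_000_000, 825_000_000))  # 93750000
```

The value from the last branch is only a request; the devfreq core
would still have to map it onto one of the available frequencies. If
the reported load never crosses the upper threshold, nothing in this
logic raises the frequency, however memory-bound the workload is.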

> 
> I ran your test as well, and indeed, it seems to work for a large chunk of memory,
> and there is some delay before a transition is made (seems to be around 10s).
> When you kill memtester, it reduces the frequency stepwise every ~10s.
> 
> Note that the timings shown above account for the critical path, and the code is
> looping on reads only; there is no write in the critical path.
> Maybe memtester is doing writes and the devfreq heuristic uses only write info?
>
You mentioned that you want to cut out the prefetcher to have direct access
to RAM. But the prefetcher also accesses the RAM. It does not get the
contents out of thin air. This is unrelated to the problem, though,
because your pattern should kick ondemand as well.

Best regards,
Krzysztof