Message-ID: <eb8c48fb-c9b1-79c1-21b3-cd41ea37e2c6@arm.com>
Date:   Tue, 4 Feb 2020 13:53:23 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     linux-kernel <linux-kernel@...r.kernel.org>,
        Linux PM <linux-pm@...r.kernel.org>,
        linux-arm-msm@...r.kernel.org
Cc:     agross@...nel.org, bjorn.andersson@...aro.org, rjw@...ysocki.net,
        viresh.kumar@...aro.org, Ionela Voinescu <ionela.voinescu@....com>,
        Quentin Perret <qperret@...gle.com>
Subject: Suspect broken frequency transitions on SDM845

Hi folks,

We have a simple sanity test that asserts a higher frequency leads to more
work done. It's fairly straightforward: we use the userspace governor,
step through increasing frequencies, run sysbench at each one, and assert
that the event counts we get increase monotonically. We do that for one CPU
of each "type" (i.e. once for a LITTLE and once for a big).
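A minimal Python sketch of this kind of test follows. It is an illustration under assumptions, not the actual test: it assumes the standard cpufreq sysfs attributes (`scaling_governor`, `scaling_available_frequencies`, `scaling_setspeed`), sysbench 1.0's `cpu` test with its "total number of events:" summary line, and `taskset` for pinning. The helper names and the 10 s run length are hypothetical.

```python
from __future__ import annotations

import re
import subprocess
from pathlib import Path

# Standard cpufreq sysfs layout; the helper names below are hypothetical.
CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/{attr}"


def cpufreq_attr(cpu: int, attr: str) -> Path:
    return Path(CPUFREQ.format(cpu=cpu, attr=attr))


def set_userspace_freq(cpu: int, khz: int) -> None:
    # Needs root; scaling_setspeed only takes effect under the userspace governor.
    cpufreq_attr(cpu, "scaling_governor").write_text("userspace")
    cpufreq_attr(cpu, "scaling_setspeed").write_text(str(khz))


def available_freqs(cpu: int) -> list[int]:
    text = cpufreq_attr(cpu, "scaling_available_frequencies").read_text()
    return sorted(int(f) for f in text.split())


def parse_sysbench_events(output: str) -> int:
    # Assumes sysbench 1.0 output, e.g. "    total number of events:   711".
    match = re.search(r"total number of events:\s+(\d+)", output)
    if match is None:
        raise ValueError("no event count in sysbench output")
    return int(match.group(1))


def run_sysbench(cpu: int, seconds: int = 10) -> int:
    # Single-threaded sysbench CPU run pinned to the CPU under test.
    result = subprocess.run(
        ["taskset", "-c", str(cpu), "sysbench", f"--time={seconds}", "cpu", "run"],
        capture_output=True, text=True, check=True,
    )
    return parse_sysbench_events(result.stdout)


def is_strictly_increasing(values: list[int]) -> bool:
    return all(a < b for a, b in zip(values, values[1:]))


def run_test(cpu: int) -> list[tuple[int, int]]:
    # Sweep frequencies in increasing order and record (kHz, events) pairs.
    rows = []
    for khz in available_freqs(cpu):
        set_userspace_freq(cpu, khz)
        rows.append((khz, run_sysbench(cpu)))
    assert is_strictly_increasing([events for _, events in rows]), rows
    return rows
```

`run_test(4)` would exercise CPU4. The tables below list five frequencies; sweeping every advertised frequency, as this sketch does, may cover more.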

We've been getting sporadic failures on the big CPUs of a Pixel 3 running
mainline [1]. Here is an example of a correct run (CPU4):

| frequency (kHz) | sysbench events |
|-----------------+-----------------|
|          825600 |             236 |
|         1286400 |             369 |
|         1689600 |             483 |
|         2092800 |             600 |
|         2476800 |             711 |

and here is a failed one (still CPU4):

| frequency (kHz) | sysbench events |
|-----------------+-----------------|
|          825600 |             234 |
|         1286400 |             369 |
|         1689600 |             449 |
|         2092800 |             600 |
|         2476800 |             355 |
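A strict monotonicity check only trips on the last step of the failed run (355 events at 2476800 kHz versus 600 at 2092800 kHz), even though 449 events at 1689600 kHz is also well below the 483 of the correct run. The correct run does about 0.286 to 0.287 events per MHz at every step, so a hypothetical stricter check can normalise events by frequency and flag points that stray from the run's median. The sketch below assumes work scales roughly linearly with frequency; the 5% tolerance is an arbitrary choice.

```python
from statistics import median

# (frequency in kHz, sysbench events) copied from the two runs above.
CORRECT_RUN = [(825600, 236), (1286400, 369), (1689600, 483), (2092800, 600), (2476800, 711)]
FAILED_RUN = [(825600, 234), (1286400, 369), (1689600, 449), (2092800, 600), (2476800, 355)]


def monotonic_violations(rows):
    """Frequencies whose event count is not above the previous step's."""
    return [khz for (_, prev), (khz, cur) in zip(rows, rows[1:]) if cur <= prev]


def throughput_outliers(rows, tolerance=0.05):
    """Frequencies whose events-per-kHz deviates from the run median by more than `tolerance`."""
    ratios = {khz: events / khz for khz, events in rows}
    mid = median(ratios.values())
    return [khz for khz, r in ratios.items() if abs(r / mid - 1) > tolerance]


for name, rows in (("correct", CORRECT_RUN), ("failed", FAILED_RUN)):
    print(name, monotonic_violations(rows), throughput_outliers(rows))
# correct [] []
# failed [2476800] [1689600, 2476800]
```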


We've encountered something like this in the past with the exact same
test on h960 [2], but it is much harder to reproduce reliably this time
around.

I haven't found much time to dig into this; I did get a run of ~100
iterations with ~15 failures, but nothing cpufreq-related showed up in
dmesg. I briefly suspected fast-switch, but it's only used by schedutil, so
with the userspace governor I would expect the frequency transition to be
complete before we even start executing sysbench.
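One way to check that expectation rather than assume it is to read the frequency back after writing `scaling_setspeed` and only start sysbench once it matches. The sketch below polls a reader callback with a timeout; the function names, timeout and poll interval are hypothetical. Whether `scaling_cur_freq` reports the frequency the hardware actually runs at, or just the last request, is driver-dependent, so a match is not proof that the transition completed.

```python
import time
from pathlib import Path
from typing import Callable


def read_cur_freq(cpu: int) -> int:
    # Standard cpufreq sysfs attribute; its exact meaning depends on the driver.
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq")
    return int(path.read_text())


def wait_for_freq(read_khz: Callable[[], int], target_khz: int,
                  timeout_s: float = 1.0, poll_s: float = 0.01) -> bool:
    """Poll `read_khz` until it returns `target_khz`; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_khz() == target_khz:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, `wait_for_freq(lambda: read_cur_freq(4), 2476800)` before launching sysbench on CPU4. Taking the reader as a callback keeps the polling logic testable without sysfs access.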

If anyone has the time and will to look into this, that would be much
appreciated.

[1]: https://git.linaro.org/people/amit.pundir/linux.git/log/?h=blueline-mainline-tracking
[2]: https://lore.kernel.org/lkml/d3ede0ab-b635-344c-faba-a9b1531b7f05@arm.com/

Cheers,
Valentin
