Date:   Sun, 14 Oct 2018 19:00:21 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     linux-kernel <linux-kernel@...r.kernel.org>,
        LAK <linux-arm-kernel@...ts.infradead.org>
Cc:     Mark Rutland <mark.rutland@....com>, lorenzo.pieralisi@....com,
        John Stultz <john.stultz@...aro.org>,
        Leo Yan <leo.yan@...aro.org>,
        Guodong Xu <guodong.xu@...aro.org>,
        Quentin Perret <quentin.perret@....com>
Subject: [BUG] hikey960: psci: Failure to boot CPU after hotplug

Hi folks,

I was cleaning up a hotplug torture test and happened to run it on my
HiKey960, which resulted in a failure.

Turns out just a few hotplug operations are needed to trigger this, so I
boiled it down to this small script:

for ((i = 0; i < 4; i++)); do
    echo "OFF $i"
    echo 0 > /sys/devices/system/cpu/cpu$i/online
    echo "ON $i"
    echo 1 > /sys/devices/system/cpu/cpu$i/online
    echo
done
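
To run the same loop on an arbitrary CPU range, it can be wrapped in a function. This is only a sketch under assumptions: `hotplug_range` and the `SYSFS_CPU` override are names made up for illustration and are not part of the script above. The override exists only so the loop can be dry-run against a scratch directory.

```shell
# Hypothetical override, not in the original script: lets the loop be
# pointed at a scratch directory instead of the real sysfs tree.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

# Offline, then online, every CPU in [first, last].
# Returns the number of sysfs writes that failed (0 means all went through).
hotplug_range() {
    hr_cpu=$1
    hr_last=$2
    hr_failed=0
    while [ "$hr_cpu" -le "$hr_last" ]; do
        echo "OFF $hr_cpu"
        echo 0 > "$SYSFS_CPU/cpu$hr_cpu/online" || hr_failed=$((hr_failed + 1))
        echo "ON $hr_cpu"
        echo 1 > "$SYSFS_CPU/cpu$hr_cpu/online" || hr_failed=$((hr_failed + 1))
        echo
        hr_cpu=$((hr_cpu + 1))
    done
    return "$hr_failed"
}
```

As root on the board, `hotplug_range 0 3` would then be equivalent to the loop above, and `hotplug_range 4 7` would exercise the other four CPUs.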

Running this results in:

----->8-----
OFF 0
[   80.819925] CPU0: shutdown
[   80.823851] psci: CPU0 killed.
ON 0
[   80.841609] Detected VIPT I-cache on CPU0
[   80.845730] CPU0: Booted secondary processor 0x0000000000 [0x410fd034]

OFF 1
[   80.927340] CPU1: shutdown
[   80.930204] psci: CPU1 killed.
ON 1
[   80.948701] Detected VIPT I-cache on CPU1
[   80.952810] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]

OFF 2
[   81.023079] CPU2: shutdown
[   81.026465] psci: CPU2 killed.
ON 2
[   81.036281] Detected VIPT I-cache on CPU2
[   81.040402] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]

OFF 3
[   81.103528] CPU3: shutdown
[   81.106382] psci: CPU3 killed.
ON 3
[   81.121835] Detected VIPT I-cache on CPU3
[   81.125975] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
----->8-----


Now, if I run this for CPUs [4-7], I eventually get this (takes a few tries):


----->8-----
OFF 4
[   73.149855] CPU4: shutdown
[   73.152628] psci: CPU4 killed.
ON 4
[   73.157491] Detected VIPT I-cache on CPU4
[   73.161509] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU4: 0x00000000101122
[   73.173813] arch_timer: CPU4: Trapping CNTVCT access
[   73.178782] CPU4: Booted secondary processor 0x0000000100 [0x410fd091]

OFF 5
[   73.261245] CPU5: shutdown
[   73.264043] psci: CPU5 killed.
ON 5
[   74.272375] CPU5: failed to come online
[   74.276264] CPU5: failed in unknown state : 0x0
./hotplug.sh: line 8: echo: write error: Input/output error

OFF 6
[   74.311066] CPU6: shutdown
[   74.313829] psci: CPU6 killed.
ON 6
[   74.318544] Detected VIPT I-cache on CPU6
[   74.322590] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU6: 0x00000000101122
[   74.334884] arch_timer: CPU6: Trapping CNTVCT access
[   74.339854] CPU6: Booted secondary processor 0x0000000102 [0x410fd091]

OFF 7
[   74.394989] CPU7: shutdown
[   74.397770] psci: CPU7 killed.
ON 7
[   74.402295] Detected VIPT I-cache on CPU7
[   74.406475] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU7: 0x00000000101122
[   74.418748] arch_timer: CPU7: Trapping CNTVCT access
[   74.423709] CPU7: Booted secondary processor 0x0000000103 [0x410fd091]
----->8-----
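
Since the failure above only shows up after a few tries, the offline/online cycle can be repeated until a write fails. The following is a rough sketch, not the torture test itself: `cycle_until_failure`, its pass limit, and the `SYSFS_CPU` override are all hypothetical names introduced here.

```shell
# Hypothetical override so the loop can be tried on a scratch directory.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

# Cycle CPUs [first, last] up to max_tries times, stopping at the first
# failed write. On failure, prints the pass number and CPU and returns 1;
# returns 0 if every pass completed.
cycle_until_failure() {
    cf_first=$1
    cf_last=$2
    cf_max=$3
    cf_try=1
    while [ "$cf_try" -le "$cf_max" ]; do
        cf_cpu=$cf_first
        while [ "$cf_cpu" -le "$cf_last" ]; do
            # Write 0 (offline) and then 1 (online) to the same CPU.
            for cf_val in 0 1; do
                if ! echo "$cf_val" > "$SYSFS_CPU/cpu$cf_cpu/online"; then
                    echo "pass $cf_try: writing $cf_val to cpu$cf_cpu/online failed"
                    return 1
                fi
            done
            cf_cpu=$((cf_cpu + 1))
        done
        cf_try=$((cf_try + 1))
    done
    echo "no failure in $cf_max passes"
    return 0
}
```

For example, `cycle_until_failure 4 7 20` would stop as soon as one of CPUs [4-7] refuses to go offline or come back, leaving the kernel log positioned right after the failing attempt.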


Trying to online CPU5 yet again yields a slightly different result:


----->8-----
[   74.528657] psci: failed to boot CPU5 (-22)
[   74.534577] CPU5: failed to boot: -22
[   74.538291] CPU5: failed in unknown state : 0x0
./hotplug.sh: line 8: echo: write error: Invalid argument
----->8-----
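
After a failed online like the one above, the per-CPU `online` files in sysfs show which CPUs the kernel still considers offline. A minimal sketch for dumping that state over a range follows; `cpu_states` and the `SYSFS_CPU` override are hypothetical names used only for illustration.

```shell
# Hypothetical override so the function can be tried on a scratch directory.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

# Print "cpuN: online|offline|unknown" for every CPU in [first, last].
# "unknown" covers a missing or unreadable online file.
cpu_states() {
    cs_cpu=$1
    cs_last=$2
    while [ "$cs_cpu" -le "$cs_last" ]; do
        cs_state=unknown
        if [ -r "$SYSFS_CPU/cpu$cs_cpu/online" ]; then
            case $(cat "$SYSFS_CPU/cpu$cs_cpu/online") in
                1) cs_state=online ;;
                0) cs_state=offline ;;
            esac
        fi
        echo "cpu$cs_cpu: $cs_state"
        cs_cpu=$((cs_cpu + 1))
    done
}
```

Running `cpu_states 4 7` after the failure would be expected to list CPU5 as offline while the other three are back online, matching the log.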

It doesn't seem tied to any particular big CPU - I've seen that happen for 4 & 7.
It happens both on mainline (4.19-rc7, 3a27203102eb) and on linux-next
(774ea0551a29). I tried bisecting this but it's a bit tricky since the
mainline support for HiKey960 is relatively recent - as far as I can tell,
that issue has always been there on this board.

I wanted to have a bit of fun and investigate that myself, but psci is alien
to me and I don't really know where to look past "psci_cpu_on()".

I'm running UEFI/ATF on that board, if that's of any help.


Cheers,
Valentin
