Date:   Tue, 5 Mar 2019 20:21:02 +0100
From:   Gerhard Wiesinger <lists@...singer.com>
To:     Maxime Ripard <maxime.ripard@...tlin.com>
Cc:     arm@...ts.fedoraproject.org, Chen-Yu Tsai <wens@...e.org>,
        LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
        Florian Fainelli <f.fainelli@...il.com>, filbar@...trum.cz
Subject: Re: Banana Pi-R1 stabil

On 05.03.2019 10:28, Maxime Ripard wrote:
> On Sat, Mar 02, 2019 at 09:42:08AM +0100, Gerhard Wiesinger wrote:
>> On 01.03.2019 10:30, Maxime Ripard wrote:
>>> On Thu, Feb 28, 2019 at 08:41:53PM +0100, Gerhard Wiesinger wrote:
>>>> On 28.02.2019 10:35, Maxime Ripard wrote:
>>>>> On Wed, Feb 27, 2019 at 07:58:14PM +0100, Gerhard Wiesinger wrote:
>>>>>> On 27.02.2019 10:20, Maxime Ripard wrote:
>>>>>>> On Sun, Feb 24, 2019 at 09:04:57AM +0100, Gerhard Wiesinger wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have 3 Banana Pi R1 boards: one running a self-compiled kernel
>>>>>>>> 4.7.4-200.BPiR1.fc24.armv7hl with old Fedora 25, which is VERY STABLE, and 2
>>>>>>>> others running the latest Fedora 29 with kernel 4.20.10-200.fc29.armv7hl. I
>>>>>>>> tried a lot of kernels from around 4.11
>>>>>>>> (kernel-4.11.10-200.fc25.armv7hl) up to 4.20.10, but all had crashes without
>>>>>>>> any output on the serial console, or kernel panics, after a short
>>>>>>>> period (minutes, hours, at most days).
>>>>>>>>
>>>>>>>> Latest known working and stable self compiled kernel: kernel
>>>>>>>> 4.7.4-200.BPiR1.fc24.armv7hl:
>>>>>>>>
>>>>>>>> https://www.wiesinger.com/opensource/fedora/kernel/BananaPi-R1/
>>>>>>>>
>>>>>>>> With 4.8.x the DSA b53 switch infrastructure was introduced, which
>>>>>>>> didn't work until commit ca8931948344c485569b04821d1f6bcebccd376b (kernel
>>>>>>>> 4.18.x):
>>>>>>>>
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/dsa/b53?h=v4.20.12
>>>>>>>>
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/dsa/b53?h=v4.20.12
>>>>>>>>
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/dsa/b53?h=v4.20.12&id=ca8931948344c485569b04821d1f6bcebccd376b
>>>>>>>>
>>>>>>>> It was fixed in kernel 4.18.x:
>>>>>>>>
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/dsa/b53?h=linux-4.18.y
>>>>>>>>
>>>>>>>>
>>>>>>>> So the current status is that the kernel crashes regularly; see some samples below.
>>>>>>>> It is typically an "Unable to handle kernel paging request at virtual address".
>>>>>>>>
>>>>>>>> Another interesting thing: a Banana Pro (which also has an Allwinner A20
>>>>>>>> of the same revision) works well running the same Fedora 29 and the latest
>>>>>>>> kernels (e.g. kernel 4.20.10-200.fc29.armv7hl).
>>>>>>>>
>>>>>>>> Since it happens on 2 different devices and with different power supplies
>>>>>>>> (all with enough power), and the same device type works well with the
>>>>>>>> old kernel, a hardware issue is very unlikely.
>>>>>>>>
>>>>>>>> I guess it has something to do with virtual memory.
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>> [47322.960193] Unable to handle kernel paging request at virtual addres 5675d0
>>>>>>> That line is a bit suspicious
>>>>>>>
>>>>>>> Anyway, cpufreq is known to cause this kind of error when the
>>>>>>> voltage / frequency association is not correct.
>>>>>>>
>>>>>>> Given the stack trace and that the BananaPro doesn't have cpufreq
>>>>>>> enabled, my first guess would be that it's what's happening. Could you
>>>>>>> try using the performance governor and see if it's more stable?
>>>>>>>
>>>>>>> If it is, then using this:
>>>>>>> https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>>>>>>>
>>>>>>> will help you find the offending voltage-frequency couple.
>>>>>> To me it looks like they all have the same CPU governor config
>>>>>> (Banana Pro, old stable kernel, new unstable kernels).
>>>>> The Banana Pro doesn't have a regulator set up, so it will only change
>>>>> the frequency, not the voltage.
>>>>>
>>>>>> They all have the ondemand governor set.
>>>>>>
>>>>>> On the 2 unstable "new kernel Banana Pi R1" boards I set:
>>>>>>
>>>>>> # Set to max performance
>>>>>> echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>>>>> echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
>>>>> What are the results?
>>>> Stable for more than about 1.5 days now. Normally they would have crashed
>>>> within such a long uptime. So it looks like the performance governor fixes it.
>>>>
>>>> I guess the crashes occur because of CPU voltage and clock changes, leading to
>>>> invalid data (e.g. invalid RAM contents might be read, register
>>>> problems, etc.).
>>>>
>>>> Any ideas how to fix it for ondemand mode, too?
>>> Run https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>>>
>>>> But it doesn't explain why it works with kernel 4.7.4 without any
>>>> problems.
>>> My best guess would be that cpufreq wasn't enabled at that time, or was
>>> enabled without voltage scaling.
>>>
>> Where can I see the voltage scaling parameters?
>>
>> In the DTS I don't see any difference between kernel 4.7.4 and 4.20.10 regarding
>> voltage:
>>
>> dtc -I dtb -O dts \
>>     -o /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dts \
>>     /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dtb
> This can be also due to configuration being changed, driver support, etc.

Where exactly are the voltages for scaling set, then (drivers, etc.)?
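A minimal sketch for inspecting the frequency/voltage table on the running system, assuming the A20 device tree still uses the legacy "operating-points" property (pairs of big-endian u32 cells: kHz, uV) on a node named cpu@0. The path and the function name dump_opps are illustrative; if the tree uses an operating-points-v2 table, this property will not exist. Decompiling the live tree with "dtc -I fs -O dts /proc/device-tree" and looking at the cpu nodes (operating-points, cpu-supply) is an alternative.

```shell
#!/bin/bash
# Sketch: print the CPU operating points (frequency/voltage pairs) from the
# live device tree.
# ASSUMPTIONS: legacy "operating-points" property (big-endian u32 pairs of
# kHz and uV) on cpu@0; operating-points-v2 tables are NOT handled.
# Requires GNU od (coreutils >= 8.23) for --endian.
dump_opps() {
  local prop="${1:-/proc/device-tree/cpus/cpu@0/operating-points}"
  if [ ! -r "$prop" ]; then
    echo "no legacy operating-points property at $prop" >&2
    return 1
  fi
  # od prints every 4-byte cell as an unsigned big-endian integer
  # (-v disables duplicate-line suppression); awk pairs the cells up.
  od -An -v -t u4 --endian=big "$prop" |
    awk '{ for (i = 1; i <= NF; i++) cells[n++] = $i }
         END { for (i = 0; i + 1 < n; i += 2)
                 printf "%7d kHz  %7d uV\n", cells[i], cells[i + 1] }'
}
```

Running dump_opps as root prints one line per operating point; comparing its output under 4.7.4 and 4.20.x would show whether the table itself changed, or only the driver/regulator setup around it.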


>
>> There is another strange thing (tested with
>> kernel-5.0.0-0.rc8.git1.1.fc31.armv7hl, kernel-4.19.8-300.fc29.armv7hl,
>> kernel-4.20.13-200.fc29.armv7hl, kernel-4.20.10-200.fc29.armv7hl):
>>
>> There is ALWAYS high CPU usage of around 10% in kworker:
>>
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>  18722 root      20   0       0      0      0 I   9.5   0.0   0:47.52 [kworker/1:3-events_freezable_power_]
>>
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>    776 root      20   0       0      0      0 I   8.6   0.0   0:02.77 [kworker/0:4-events]
> The first one looks like it's part of the workqueue code.


Any idea what might cause that?
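One way to narrow this down, following the "Debugging" advice in the kernel's workqueue documentation, is to trace the workqueue_queue_work event and count which work functions are queued most often. This is a sketch: it assumes root, debugfs mounted at /sys/kernel/debug, and that the event output contains a "function=<name>" field (the exact format can differ between kernel versions). The helper names capture_wq_trace and summarize_wq_trace are hypothetical.

```shell
#!/bin/bash
# Sketch: find which work item keeps a kworker busy.
# ASSUMPTIONS: run as root; debugfs mounted at /sys/kernel/debug; the
# tracepoint prints a "function=<name>" token (format may vary by kernel).
TRACING=/sys/kernel/debug/tracing

# Hypothetical helper: capture N seconds of workqueue_queue_work events.
capture_wq_trace() {
  local secs="${1:-10}" out="${2:-/tmp/wq-trace.txt}"
  echo workqueue:workqueue_queue_work > "$TRACING/set_event"
  timeout "$secs" cat "$TRACING/trace_pipe" > "$out"
  echo > "$TRACING/set_event"   # an empty line disables all events again
}

# Hypothetical helper: count queued work items per function, busiest first.
summarize_wq_trace() {
  awk '{ for (i = 1; i <= NF; i++)
           if ($i ~ /^function=/) count[substr($i, 10)]++ }
       END { for (f in count) print count[f], f }' "$1" | sort -rn
}
```

For example, "capture_wq_trace 30 /tmp/wq.txt; summarize_wq_trace /tmp/wq.txt | head" would list the work functions queued most often during those 30 seconds.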


>
>> Therefore the CPU doesn't switch to low frequencies (see below).
> You said previously that those crashes were happening when the board
> was changing frequency, so I'm confused?


For the ondemand setting: due to the high kworker load, the frequency
rarely drops to lower values (but it does sometimes, and it also
crashes regularly).

For the performance setting: the frequency is fixed (to the maximum in
the current configuration) and the system is stable.


>
>> Any ideas?
> Run the cpustress program I told you to use already twice.

I haven't had time to try it yet, but I will. See also my comment below
regarding idle vs. high CPU load.


>
>> BTW: Still stable at about 2.5 days on both devices. So the solution IS the
>> performance governor.
> No, the performance governor prevents any change in frequency. My
> guess is that a lower frequency operating point is not working and is
> crashing the CPU.
>

Yes, there might be at least 2 scenarios:

1.) Frequency switching itself is the problem

2.) Lower frequency/voltage operating points are not stable.

For both scenarios, the crash might happen with an idle CPU, under high
CPU load, or just randomly. Therefore just "waiting" (idle) might be a
better test than 100% CPU utilization, but I will also test 100% CPU.

Therefore it would be good to see where the voltages for the different
frequencies of the SoC are defined (to compare).


I'm currently testing 2 different settings on the 2 new Banana Pi R1
boards with the newest kernel (see below), i.e. 2 static frequencies:

# Set to specific frequency 144000 (currently testing on Banana Pi R1 #1)

# Set to specific frequency 312000 (currently testing on Banana Pi R1 #2)

If those are stable, I'll also test further frequencies (with different loads).

Thnx.

Ciao,

Gerhard


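# Set to max performance (stable). This also resets the min/max limits; since
# scaling_min_freq must not exceed scaling_max_freq, run it before switching
# between the fixed frequencies below.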
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to ondemand (not stable)
echo "ondemand" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "ondemand" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 144000 (currently testing on Banana Pi R1 #1)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "144000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "144000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 312000 (currently testing on Banana Pi R1 #2)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "312000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "312000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "312000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "312000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 528000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "528000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "528000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "528000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "528000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 720000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "720000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "720000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "720000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "720000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 864000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "864000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "864000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "864000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "864000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 912000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "912000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "912000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "912000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "912000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq

# Set to specific frequency 960000 (untested)
echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
echo "960000" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq
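The repeated blocks above could be collapsed into one function. This sketch also widens the limits to the hardware range first, so that writing scaling_min_freq never exceeds the current scaling_max_freq when switching directly from one pinned frequency to another, and it checks the value against scaling_available_frequencies. The function names and the CPUFREQ_BASE override are hypothetical; restore_ondemand shows how ondemand could be kept with a raised floor if a low operating point turns out to be the unstable one.

```shell
#!/bin/bash
# Sketch: pin all CPUs to one frequency, or go back to ondemand.
# ASSUMPTIONS: run as root; CPUFREQ_BASE is a hypothetical override for testing.
pin_cpu_freq() {
  local freq="$1" base="${CPUFREQ_BASE:-/sys/devices/system/cpu}" d
  for d in "$base"/cpu[0-9]*/cpufreq; do
    [ -d "$d" ] || continue
    if ! grep -qw "$freq" "$d/scaling_available_frequencies"; then
      echo "$freq kHz is not an available frequency for ${d%/cpufreq}" >&2
      return 1
    fi
    echo performance > "$d/scaling_governor"
    # Widen to the hardware limits first, then narrow to the target, so
    # min <= max holds after every single write.
    cat "$d/cpuinfo_min_freq" > "$d/scaling_min_freq"
    cat "$d/cpuinfo_max_freq" > "$d/scaling_max_freq"
    echo "$freq" > "$d/scaling_min_freq"
    echo "$freq" > "$d/scaling_max_freq"
  done
}

# Hypothetical helper: ondemand over the full range, optionally with a
# raised floor to skip an operating point that proved unstable.
restore_ondemand() {
  local floor="${1:-}" base="${CPUFREQ_BASE:-/sys/devices/system/cpu}" d
  for d in "$base"/cpu[0-9]*/cpufreq; do
    [ -d "$d" ] || continue
    cat "$d/cpuinfo_min_freq" > "$d/scaling_min_freq"
    cat "$d/cpuinfo_max_freq" > "$d/scaling_max_freq"
    [ -n "$floor" ] && echo "$floor" > "$d/scaling_min_freq"
    echo ondemand > "$d/scaling_governor"
  done
}
```

For example, "pin_cpu_freq 528000" replaces the 528000 block above, and "restore_ondemand 312000" would run ondemand without ever dropping to 144000.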

