Message-ID: <8ad8fbeb-fad8-d39a-9cc6-e7f1deab0b4f@wiesinger.com>
Date: Sat, 2 Mar 2019 09:42:08 +0100
From: Gerhard Wiesinger <lists@...singer.com>
To: Maxime Ripard <maxime.ripard@...tlin.com>
Cc: arm@...ts.fedoraproject.org, Chen-Yu Tsai <wens@...e.org>,
LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
Florian Fainelli <f.fainelli@...il.com>, filbar@...trum.cz
Subject: Re: Banana Pi-R1 stabil
On 01.03.2019 10:30, Maxime Ripard wrote:
> On Thu, Feb 28, 2019 at 08:41:53PM +0100, Gerhard Wiesinger wrote:
>> On 28.02.2019 10:35, Maxime Ripard wrote:
>>> On Wed, Feb 27, 2019 at 07:58:14PM +0100, Gerhard Wiesinger wrote:
>>>> On 27.02.2019 10:20, Maxime Ripard wrote:
>>>>> On Sun, Feb 24, 2019 at 09:04:57AM +0100, Gerhard Wiesinger wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I have 3 Banana Pi R1 boards. One runs a self-compiled kernel
>>>>>> 4.7.4-200.BPiR1.fc24.armv7hl on an old Fedora 25 and is VERY STABLE; the 2
>>>>>> others run the latest Fedora 29 with kernel 4.20.10-200.fc29.armv7hl. I
>>>>>> tried many kernels from around 4.11
>>>>>> (kernel-4.11.10-200.fc25.armv7hl) up to 4.20.10, but all of them either
>>>>>> crashed without any output on the serial console or hit kernel panics
>>>>>> after a short period of time (minutes, hours, at most days).
>>>>>>
>>>>>> Latest known working and stable self compiled kernel: kernel
>>>>>> 4.7.4-200.BPiR1.fc24.armv7hl:
>>>>>>
>>>>>> https://www.wiesinger.com/opensource/fedora/kernel/BananaPi-R1/
>>>>>>
>>>>>> With 4.8.x the DSA b53 switch infrastructure has been introduced which
>>>>>> didn't work (until ca8931948344c485569b04821d1f6bcebccd376b and kernel
>>>>>> 4.18.x):
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/dsa/b53?h=v4.20.12
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/dsa/b53?h=v4.20.12
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/dsa/b53?h=v4.20.12&id=ca8931948344c485569b04821d1f6bcebccd376b
>>>>>>
>>>>>> It has been fixed with kernel 4.18.x:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/drivers/net/dsa/b53?h=linux-4.18.y
>>>>>>
>>>>>>
>>>>>> So the current status is that the kernel crashes regularly; see some samples below.
>>>>>> It is typically a "Unable to handle kernel paging request at virtual addres"
>>>>>>
>>>>>> Another interesting thing: A Banana Pro works well (which has also an
>>>>>> Allwinner A20 in the same revision) running same Fedora 29 and latest
>>>>>> kernels (e.g. kernel 4.20.10-200.fc29.armv7hl.).
>>>>>>
>>>>>> Since it happens on 2 different devices and with different power supplies
>>>>>> (all with enough power, and of the same type that works well with the
>>>>>> old stable kernel), a hardware issue is very unlikely.
>>>>>>
>>>>>> I guess it has something to do with virtual memory.
>>>>>>
>>>>>> Any ideas?
>>>>>> [47322.960193] Unable to handle kernel paging request at virtual addres 5675d0
>>>>> That line is a bit suspicious
>>>>>
>>>>> Anyway, cpufreq is known to cause those kind of errors when the
>>>>> voltage / frequency association is not correct.
>>>>>
>>>>> Given the stack trace and that the BananaPro doesn't have cpufreq
>>>>> enabled, my first guess would be that it's what's happening. Could you
>>>>> try using the performance governor and see if it's more stable?
>>>>>
>>>>> If it is, then using this:
>>>>> https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>>>>>
>>>>> will help you find the offending voltage-frequency couple.
>>>> To me it looks like they all have the same config regarding the CPU governor
>>>> (Banana Pro, old stable kernel, new unstable kernels)
>>> The Banana Pro doesn't have a regulator set up, so it will only change
>>> the frequency, not the voltage.
>>>
>>>> They all have the ondemand governor set:
>>>>
>>>> I set on the 2 unstable "new kernel Banana Pi R1":
>>>>
>>>> # Set to max performance
>>>> echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>>>> echo "performance" > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
>>> What are the results?
>> Stable for more than 1.5 days now. Normally they would have crashed long
>> before reaching such an uptime. So it looks like the performance governor
>> fixes it.
>>
>> I guess the crashes occur because of CPU voltage and clock changes that lead
>> to invalid data (e.g. invalid RAM contents might be read, register
>> problems, etc).
>>
>> Any ideas how to fix it for ondemand mode, too?
> Run https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>
>> But that doesn't explain why it works with kernel 4.7.4 without any
>> problems.
> My best guess would be that cpufreq wasn't enabled at that time, or
> without voltage scaling.
>
Where can I see the voltage scaling parameters?

In the DTS I don't see any difference between kernel 4.7.4 and 4.20.10
regarding voltage:

dtc -I dtb -O dts \
    -o /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dts \
    /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dtb

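To make the comparison between the two kernels less error-prone, the
frequency/voltage pairs can be pulled out of the decompiled DTS. This is only
a minimal sketch. It assumes the CPU node uses the legacy "operating-points"
property (a flat list of <kHz uV> cells, printed in hex by dtc), and list_opps
is a hypothetical helper name. A device tree using operating-points-v2 tables
would need different parsing.

```shell
# Print "<kHz> <uV>" for each pair of the first operating-points property.
# Assumption: legacy OPP v1 binding, all cells on a single line.
list_opps() {
    local dts="$1" cells
    # Keep only the cell values between '<' and '>'.
    cells=$(grep -m1 'operating-points[[:space:]]*=' "$dts" | sed -e 's/.*<//' -e 's/>.*//')
    # Intentional word splitting: one positional parameter per cell.
    set -- $cells
    while [ "$#" -ge 2 ]; do
        # Arithmetic expansion accepts decimal and 0x-prefixed hex cells.
        printf '%d %d\n' "$(($1))" "$(($2))"
        shift 2
    done
}

# Usage:
#   list_opps /boot/dtb-4.20.10-200.fc29.armv7hl/sun7i-a20-lamobo-r1.dts
```

Running it against the DTS of both kernels and diffing the two outputs would
show whether the tables really match.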
There is another strange thing (tested with
kernel-5.0.0-0.rc8.git1.1.fc31.armv7hl, kernel-4.19.8-300.fc29.armv7hl,
kernel-4.20.13-200.fc29.armv7hl, kernel-4.20.10-200.fc29.armv7hl):

There is ALWAYS a high CPU load of around 10% in kworker threads:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
18722 root      20   0       0      0      0 I   9.5  0.0   0:47.52 [kworker/1:3-events_freezable_power_]

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  776 root      20   0       0      0      0 I   8.6  0.0   0:02.77 [kworker/0:4-events]

Therefore the CPU doesn't switch to low frequencies (see below).
Any ideas?
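
One possible way to find out which work item keeps the kworker busy is the
workqueue_queue_work trace event described in the kernel's workqueue
documentation. This assumes the kernel has tracing enabled and debugfs is
mounted at /sys/kernel/debug. The sketch below only covers counting the
captured events; count_work_functions is a hypothetical helper name and
out.txt an arbitrary capture file.

```shell
# Capture (as root, interrupt with Ctrl-C after a few seconds):
#   echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
#   cat /sys/kernel/debug/tracing/trace_pipe > out.txt

# Count how often each work function was queued, most frequent first.
count_work_functions() {
    grep -o 'function=[^ ]*' "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   count_work_functions out.txt | head
```

A function that dominates this list would be the first candidate to look at.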

BTW: Still stable after about 2.5 days on both devices. So the solution IS the
performance governor.

Ciao,
Gerhard
================================================================================================================================================================
# monitor frequency
while true; do
    echo "========================================"
    echo -n "CPU_FREQ0: "; cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
    echo -n "CPU_FREQ1: "; cat /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_cur_freq
    sleep 1
done
================================================================================================================================================================
# Kernel 4.7.4:
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
# Kernel 4.20.10
========================================
CPU_FREQ0: 864000
CPU_FREQ1: 720000
========================================
CPU_FREQ0: 960000
CPU_FREQ1: 960000
========================================
CPU_FREQ0: 960000
CPU_FREQ1: 960000
========================================
CPU_FREQ0: 144000
CPU_FREQ1: 144000
========================================
CPU_FREQ0: 720000
CPU_FREQ1: 960000
========================================
CPU_FREQ0: 960000
CPU_FREQ1: 864000
========================================
CPU_FREQ0: 720000
CPU_FREQ1: 864000
========================================
CPU_FREQ0: 528000
CPU_FREQ1: 864000
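
The one-second samples above can be complemented by the cumulative counters
that cpufreq exposes when the kernel is built with CONFIG_CPU_FREQ_STAT. This
is a minimal sketch, assuming stats/time_in_state is present and holds one
"<frequency in kHz> <time>" pair per line; freq_residency is a hypothetical
helper name.

```shell
# Print the share of time spent at each frequency, in file order.
freq_residency() {
    awk '{ f[NR] = $1; t[NR] = $2; total += $2 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%d kHz %.1f%%\n", f[i], (total ? 100 * t[i] / total : 0)
         }' "$1"
}

# Usage:
#   freq_residency /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
```

Because only ratios are printed, the result does not depend on the time unit
the kernel uses in that file.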