Message-ID: <ec17c6c8-e697-4a5a-a705-bff24daae7b2@gmail.com>
Date: Wed, 29 Nov 2023 22:20:38 +0100
From: Rafał Miłecki <zajec5@...il.com>
To: Linus Walleij <linus.walleij@...aro.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>, Russell King <linux@...linux.org.uk>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
Thomas Gleixner <tglx@...utronix.de>, Florian Fainelli
<f.fainelli@...il.com>, linux-clk@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, openwrt-devel@...ts.openwrt.org,
bcm-kernel-feedback-list@...adcom.com
Subject: Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes

Hi,

sorry for the late reply, I couldn't find enough determination earlier.

On 8.09.2023 10:10, Linus Walleij wrote:
> On Mon, Sep 4, 2023 at 10:34 AM Rafał Miłecki <zajec5@...il.com> wrote:
>
>> I'm clueless at this point.
>> Maybe someone can come up with an idea of actual issue & ideally a
>> solution.
>
> Damn this is frustrating.
>
>> 2. Clock (arm,armv7-timer)
>>
>> While comparing main clock in Broadcom's SDK with upstream one I noticed
>> a tiny difference: mask value. I don't know if it makes any sense but
>> switching from CLOCKSOURCE_MASK(56) to CLOCKSOURCE_MASK(64) in
>> arm_arch_timer.c (to match SDK) increases average uptime (time before a
>> hang/lockup happens) from 4 minutes to 36 minutes.
>
> This could be related to how often the system goes to idle.
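
For reference, a minimal userspace sketch of what the mask width changes. It assumes the cycle delta is computed as (now - last) & mask, as the clocksource core does. clocksource_mask() and masked_delta() are stand-in names made up for this illustration, not kernel API, and the 56-bit wrap scenario is only an example:

```c
#include <stdint.h>

/*
 * Userspace stand-in for CLOCKSOURCE_MASK(bits): the low `bits` bits set.
 * Hypothetical helper, valid for 1 <= bits <= 64.
 */
uint64_t clocksource_mask(unsigned int bits)
{
	return ~UINT64_C(0) >> (64 - bits);
}

/*
 * Cycles elapsed between two counter reads, truncated to the mask width.
 * Unsigned subtraction wraps, so the mask decides how a wrapped counter
 * is interpreted.
 */
uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}
```

If a counter that is really 56 bits wide wraps between two reads (last = 2^56 - 10, now = 5), the 56-bit mask yields a delta of 15 cycles while the 64-bit mask yields 0xff0000000000000f. This only shows what the mask does; whether it is related to the hangs is not established here.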
>
>> +	if (cpu_idle_force_poll == 1234)
>> +		arch_cpu_idle();
>> +	if (cpu_idle_force_poll == 5678)
>> +		arch_cpu_idle();
>> +	if (cpu_idle_force_poll == 1234)
>> +		arch_cpu_idle();
>> +	if (cpu_idle_force_poll == 5678)
>> +		arch_cpu_idle();
>> +	if (cpu_idle_force_poll == 1234)
>> +		arch_cpu_idle();
>> +	if (cpu_idle_force_poll == 5678)
>> +		arch_cpu_idle();
>> +	if (cpu_idle_force_poll == 1234)
>> +		arch_cpu_idle();
>
> Idle again.
>
> I would have tried to see what arch_cpu_idle() is doing.
>
> arm_pm_idle() or cpu_do_idle()?
In my case arm_pm_idle is NULL.
> What happens if you just put return in arch_cpu_idle()
> so it does nothing?
That doesn't help. I also tried putting:
	udelay(10);
and
	udelay(1000);
at the beginning of arch_cpu_idle(). Neither helped.
Here comes a more interesting experiment, though. Putting this there:
	if (!(foo++ % 10000)) {
		pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
	}
doesn't seem to help.
Putting the following there, however, seems to make the kernel/device stable:
	if (!(foo++ % 100)) {
		pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
	}
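
The two snippets differ only in how often pr_info() fires. A minimal userspace sketch of that difference, assuming foo starts at 0 and is incremented once per arch_cpu_idle() call (should_print() and count_prints() are names made up for this illustration):

```c
#include <stdint.h>

/* Same condition as in the snippets above: true once every `period` calls. */
int should_print(uint32_t counter, uint32_t period)
{
	return !(counter % period);
}

/* Count how many times the print would fire over `calls` idle entries. */
uint32_t count_prints(uint32_t calls, uint32_t period)
{
	uint32_t foo = 0;
	uint32_t prints = 0;

	for (uint32_t i = 0; i < calls; i++) {
		if (should_print(foo++, period))
			prints++;
	}
	return prints;
}
```

Over 1000000 idle entries the first variant prints 100 times and the second 10000 times, so the second one perturbs the idle path 100 times more often.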

I think I'm just going to assume those chipsets simply have broken hardware.