linux-kernel - Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random changes


Message-ID: <ec17c6c8-e697-4a5a-a705-bff24daae7b2@gmail.com>
Date:   Wed, 29 Nov 2023 22:20:38 +0100
From:   Rafał Miłecki <zajec5@...il.com>
To:     Linus Walleij <linus.walleij@...aro.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Russell King <linux@...linux.org.uk>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Florian Fainelli <f.fainelli@...il.com>,
        linux-clk@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        openwrt-devel@...ts.openwrt.org,
        bcm-kernel-feedback-list@...adcom.com
Subject: Re: ARM BCM53573 SoC hangs/lockups caused by locks/clock/random
 changes

Hi,

It's a late reply, but I couldn't find enough determination earlier.

On 8.09.2023 10:10, Linus Walleij wrote:
> On Mon, Sep 4, 2023 at 10:34 AM Rafał Miłecki <zajec5@...il.com> wrote:
> 
>> I'm clueless at this point.
>> Maybe someone can come up with an idea of actual issue & ideally a
>> solution.
> 
> Damn this is frustrating.
> 
>> 2. Clock (arm,armv7-timer)
>>
>> While comparing main clock in Broadcom's SDK with upstream one I noticed
>> a tiny difference: mask value. I don't know if it makes any sense but
>> switching from CLOCKSOURCE_MASK(56) to CLOCKSOURCE_MASK(64) in
>> arm_arch_timer.c (to match SDK) increases average uptime (time before a
>> hang/lockup happens) from 4 minutes to 36 minutes.
> 
> This could be related to how often the system goes to idle.
> 
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 5678)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 5678)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 5678)
>> +               arch_cpu_idle();
>> +       if (cpu_idle_force_poll == 1234)
>> +               arch_cpu_idle();
> 
> Idle again.
> 
> I would have tried to see what arch_cpu_idle() is doing.
> 
> arm_pm_idle() or cpu_do_idle()?

In my case arm_pm_idle is NULL.


> What happens if you just put return in arch_cpu_idle()
> so it does nothing?

Doesn't help. I also tried putting:
udelay(10);
and
udelay(1000);
at the beginning of arch_cpu_idle(). None of them helped.


Here comes a more interesting experiment though. Putting this there:

if (!(foo++ % 10000)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}

doesn't seem to help.


Putting the following there, however, seems to make the kernel/device stable:

if (!(foo++ % 100)) {
	pr_info("[%s] arm_pm_idle:%ps\n", __func__, arm_pm_idle);
}


I think I'm just going to assume those chipsets are simply hw broken.