Message-ID: <38633f6f-c14c-4a74-b372-cdfdab80619e@linaro.org>
Date: Wed, 10 Sep 2025 18:39:02 +0200
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: markus.stockhausen@....de, tglx@...utronix.de,
 linux-kernel@...r.kernel.org, howels@...thatwemight.be, bjorn@...k.no
Subject: Re: AW: [PATCH 1/4] clocksource/drivers/timer-rtl-otto: work around
 dying timers

On 10/09/2025 12:16, markus.stockhausen@....de wrote:
>> From: Daniel Lezcano <daniel.lezcano@...aro.org>
>> Sent: Wednesday, 10 September 2025 11:03
>>
>> On 04/08/2025 10:03, Markus Stockhausen wrote:
>>> The OpenWrt distribution has switched from kernel longterm 6.6 to
>>> 6.12. Reports show that devices with the Realtek Otto switch platform
>>> die during operation and are rebooted by the watchdog. After ruling out
>>> other possible causes, the Otto timer is to blame. The platform
>>> currently consists of 4 targets with different hardware revisions.
>>> It is not 100% clear which devices and revisions are affected.
>>>
>>> Analysis shows:
>>>
>>> A more aggressive sched/deadline handling leads to more timer starts
>>> with small intervals. This increases the chance of hitting the bug. See
>>> https://marc.info/?l=linux-kernel&m=175276556023276&w=2
>>>
>>> Focusing on the real issue, a hardware limitation was found on some
>>> devices. There is a small chance that a timer ends without firing an
>>> interrupt if it is reprogrammed within the 5us before its expiration
>>> time.
>>
>> Is it possible that the timer IRQ flag is reset when the new counter
>> value is set?
>>
>> That is: while in the code path with interrupts disabled, the timer
>> expires within these 5us and the IRQ flag is raised; then the driver sets
>> a new value, the flag is reset automatically, and the current timer
>> expiration is lost?
> 
> Something like this ...
> 
> During my analysis I tried a lot of things to identify the situation that
> leads to this error, especially just before the reprogramming command:
> 
> static inline void rttm_enable_timer(void __iomem *base, u32 mode, u32 divisor)
> {
>    iowrite32(RTTM_CTRL_ENABLE | mode | divisor, base + RTTM_CTRL);
> }
> 
> What I tried:
> 
> 1. Read out the current (remaining) timer value: In the error cases
> this can give any value between 1 (=320ns) and 15 (=4800ns).
> 
> 2. Check if IRQ flag is already set and IRQ might trigger next. This was
> never the case.

It would have been interesting to check whether we are in the buggy time
range, wait for a delay (5us), check the IRQ flag since the current timer
should have expired by then, then set the counter and recheck the IRQ flag.
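
To make the sequence above concrete, here is a rough userspace sketch of
that experiment. Everything in it is an assumption for illustration: the
timer_ops accessors, the probe_reprogram() helper and the fake_timer model
are hypothetical and are not part of timer-rtl-otto; the fake only encodes
the two competing hypotheses (flag cleared on reprogram, or flag never
raised), not the real hardware behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 15 ticks * 320 ns = 4800 ns, the largest remaining value reported. */
#define WINDOW_TICKS 15u

/* Hypothetical accessors; NOT the driver's real interface. */
struct timer_ops {
	uint32_t (*remaining_ticks)(void *ctx);
	bool (*irq_pending)(void *ctx);
	void (*delay_us)(void *ctx, unsigned int us);
	void (*reprogram)(void *ctx, uint32_t ticks);
};

enum probe_result {
	PROBE_NOT_IN_WINDOW,         /* timer was not about to expire */
	PROBE_NO_IRQ_AFTER_WAIT,     /* flag never raised despite expiry */
	PROBE_IRQ_LOST_ON_REPROGRAM, /* flag raised, then cleared by the write */
	PROBE_IRQ_SURVIVED,          /* flag raised and still set afterwards */
};

/* Always reprograms the timer; reports what happened to the IRQ flag. */
static enum probe_result probe_reprogram(const struct timer_ops *ops,
					 void *ctx, uint32_t new_ticks)
{
	uint32_t left;
	bool before;

	assert(ops != NULL);
	left = ops->remaining_ticks(ctx);
	if (left == 0 || left > WINDOW_TICKS) {
		ops->reprogram(ctx, new_ticks);
		return PROBE_NOT_IN_WINDOW;
	}

	ops->delay_us(ctx, 5);            /* let the current timer expire */
	before = ops->irq_pending(ctx);   /* should be set by now */
	ops->reprogram(ctx, new_ticks);
	if (!before)
		return PROBE_NO_IRQ_AFTER_WAIT;
	return ops->irq_pending(ctx) ? PROBE_IRQ_SURVIVED
				     : PROBE_IRQ_LOST_ON_REPROGRAM;
}

/* Fake timer used only to exercise the probe; behaviour is made up. */
struct fake_timer {
	uint32_t remaining;
	bool pending;
	bool raises_irq;     /* does expiry raise the flag at all? */
	bool clear_on_write; /* does reprogramming clear the flag? */
};

static uint32_t fake_remaining(void *ctx)
{
	return ((struct fake_timer *)ctx)->remaining;
}

static bool fake_pending(void *ctx)
{
	return ((struct fake_timer *)ctx)->pending;
}

static void fake_delay_us(void *ctx, unsigned int us)
{
	struct fake_timer *t = ctx;
	uint32_t elapsed = us * 1000u / 320u; /* 320 ns per tick */

	if (t->remaining > elapsed) {
		t->remaining -= elapsed;
	} else if (t->remaining > 0) {
		t->remaining = 0;
		if (t->raises_irq)
			t->pending = true;
	}
}

static void fake_reprogram(void *ctx, uint32_t ticks)
{
	struct fake_timer *t = ctx;

	t->remaining = ticks;
	if (t->clear_on_write)
		t->pending = false;
}

static const struct timer_ops fake_ops = {
	.remaining_ticks = fake_remaining,
	.irq_pending = fake_pending,
	.delay_us = fake_delay_us,
	.reprogram = fake_reprogram,
};
```

On real hardware the four callbacks would have to be backed by the actual
register accesses and a busy-wait. The point of the sketch is only that the
result distinguishes "the flag was raised and the write cleared it" from
"the flag was never raised".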


> 3. Reorder reprogramming sequence (as far as possible). Only the
> double reprogramming helped here.
> 
> So there is nothing we can do to actively identify and work around the
> buggy situation. There is some hardware limitation between expiring timers
> and reprogramming. In the absence of an erratum, the current bugfix is the
> only (and best) solution I have.
> 
> Markus
> 
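
For reference, the figures in item 1 above (1 tick = 320ns, 15 ticks =
4800ns) and the 5us window can be checked with a small standalone helper.
This is only a sketch of the arithmetic; the names TICK_NS,
DANGER_WINDOW_NS, ticks_to_ns() and in_danger_window() are made up for the
example and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the thread: a remaining value of 1 corresponds to 320 ns. */
#define TICK_NS 320u
/* From the thread: reprogramming within ~5 us of expiry is risky. */
#define DANGER_WINDOW_NS 5000u

/* 15 ticks is the last value inside the window, 16 is the first outside. */
static_assert(15u * TICK_NS < DANGER_WINDOW_NS, "15 ticks is inside 5 us");
static_assert(16u * TICK_NS >= DANGER_WINDOW_NS, "16 ticks is outside 5 us");

/* Convert a remaining-tick count to nanoseconds. */
static uint64_t ticks_to_ns(uint32_t ticks)
{
	return (uint64_t)ticks * TICK_NS;
}

/*
 * Hypothetical check: true if a running timer with 'remaining' ticks left
 * is inside the window. A value of 0 is treated as already expired.
 */
static bool in_danger_window(uint32_t remaining)
{
	return remaining > 0 && ticks_to_ns(remaining) < DANGER_WINDOW_NS;
}
```

With these definitions, remaining values 1 through 15 fall inside the
window, which matches the range reported for the error cases.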


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
