Message-ID: <38633f6f-c14c-4a74-b372-cdfdab80619e@linaro.org>
Date: Wed, 10 Sep 2025 18:39:02 +0200
From: Daniel Lezcano <daniel.lezcano@...aro.org>
To: markus.stockhausen@....de, tglx@...utronix.de,
linux-kernel@...r.kernel.org, howels@...thatwemight.be, bjorn@...k.no
Subject: Re: AW: [PATCH 1/4] clocksource/drivers/timer-rtl-otto: work around dying timers

On 10/09/2025 12:16, markus.stockhausen@....de wrote:
>> From: Daniel Lezcano <daniel.lezcano@...aro.org>
>> Sent: Wednesday, 10 September 2025 11:03
>>
>> On 04/08/2025 10:03, Markus Stockhausen wrote:
>>> The OpenWrt distribution has switched from kernel longterm 6.6 to
>>> 6.12. Reports show that devices with the Realtek Otto switch platform
>>> die during operation and are rebooted by the watchdog. After ruling out
>>> other possible causes, the Otto timer is to blame. The platform
>>> currently consists of 4 targets with different hardware revisions.
>>> It is not 100% clear which devices and revisions are affected.
>>>
>>> Analysis shows:
>>>
>>> A more aggressive sched/deadline handling leads to more timer starts
>>> with small intervals. This increases the chance of hitting the bug. See
>>> https://marc.info/?l=linux-kernel&m=175276556023276&w=2
>>>
>>> Focusing on the real issue, a hardware limitation on some devices was
>>> found. There is a small chance that a timer ends without firing an
>>> interrupt if it is reprogrammed within the 5us before its expiration
>>> time.
>>
>> Is it possible that the timer IRQ flag is reset when setting the new
>> counter value?
>>
>> That is: while in the code path with interrupts disabled, the timer
>> expires within these 5us and the IRQ flag is raised, then the driver sets
>> a new value and this flag is reset automatically, thus losing the current
>> timer expiration?
>
> Something like this ...
>
> During my analysis I tried a lot of things to identify the situation that
> leads to this error, especially just before the reprogramming command:
>
> static inline void rttm_enable_timer(void __iomem *base, u32 mode, u32 divisor)
> {
> 	iowrite32(RTTM_CTRL_ENABLE | mode | divisor, base + RTTM_CTRL);
> }
>
> What I tried:
>
> 1. Read out the current (remaining) timer value: In the error cases
> this can give any value between 1 (=320ns) and 15 (=4800ns).
>
> 2. Check if IRQ flag is already set and IRQ might trigger next. This was
> never the case.

It would have been interesting to check whether we are in the buggy time
range and, if so, wait with a delay (5us), check the IRQ flag since the
current timer should have expired by then, then set the counter and
recheck the IRQ flag.

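To make that sequence concrete, here is a minimal user-space sketch of the
experiment. It is a software model only, not driver code: struct sim_timer,
sim_advance(), sim_program() and probe_reprogram() are hypothetical names, and
the write_clears_flag switch merely encodes my guess that the counter write
resets the IRQ flag, which is unconfirmed. The only figures taken from above
are the 320ns tick and the 1..15 range of remaining values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NS     320u  /* one timer tick, from "1 (=320ns)" above */
#define BUG_WINDOW  15u   /* largest remaining value seen in the error cases */

/* Hypothetical software model of one timer; not the real register layout. */
struct sim_timer {
	uint32_t remaining;  /* ticks until expiry, 0 = stopped or expired */
	bool irq_pending;    /* models the IRQ flag */
};

/* Advance the model by ns nanoseconds and raise the flag on expiry. */
static void sim_advance(struct sim_timer *t, uint32_t ns)
{
	uint32_t ticks = ns / TICK_NS;

	if (t->remaining == 0)
		return;
	if (ticks >= t->remaining) {
		t->remaining = 0;
		t->irq_pending = true;
	} else {
		t->remaining -= ticks;
	}
}

/*
 * Load a new counter value. ASSUMPTION under test: when write_clears_flag
 * is true, the write also resets a pending IRQ flag.
 */
static void sim_program(struct sim_timer *t, uint32_t ticks,
			bool write_clears_flag)
{
	assert(ticks > 0);
	if (write_clears_flag)
		t->irq_pending = false;
	t->remaining = ticks;
}

struct probe_result {
	bool in_window;         /* timer had 1..BUG_WINDOW ticks left */
	bool flag_after_wait;   /* IRQ flag after waiting out the window */
	bool flag_after_write;  /* IRQ flag after setting the new counter */
};

/* Wait 5us if inside the window, sample the flag, reprogram, sample again. */
static struct probe_result probe_reprogram(struct sim_timer *t,
					   uint32_t new_ticks,
					   bool write_clears_flag)
{
	struct probe_result r = { false, false, false };

	r.in_window = t->remaining >= 1 && t->remaining <= BUG_WINDOW;
	if (r.in_window) {
		sim_advance(t, 5000);  /* the 5us delay */
		r.flag_after_wait = t->irq_pending;
	}
	sim_program(t, new_ticks, write_clears_flag);
	r.flag_after_write = t->irq_pending;
	return r;
}
```

If the real hardware showed flag_after_wait set and flag_after_write cleared,
that would point at the write resetting the flag. If the flag is never set
even after the delay, the expiration is lost some other way.
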
> 3. Reorder reprogramming sequence (as far as possible). Only the
> double reprogramming helped here.
>
> So there is nothing we can do to actively identify and work around the
> buggy situation. There is some hardware limitation between expiring timers
> and reprogramming. Since there is no erratum, the current bugfix is the
> only (and best) solution I have.
>
> Markus
>
--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog