Message-ID: <007901dc223b$feb371a0$fc1a54e0$@gmx.de>
Date: Wed, 10 Sep 2025 12:16:36 +0200
From: <markus.stockhausen@....de>
To: "'Daniel Lezcano'" <daniel.lezcano@...aro.org>,
	<tglx@...utronix.de>,
	<linux-kernel@...r.kernel.org>,
	<howels@...thatwemight.be>,
	<bjorn@...k.no>
Subject: Re: [PATCH 1/4] clocksource/drivers/timer-rtl-otto: work around dying timers

> From: Daniel Lezcano <daniel.lezcano@...aro.org>
> Sent: Wednesday, 10 September 2025 11:03
> 
> On 04/08/2025 10:03, Markus Stockhausen wrote:
> > The OpenWrt distribution has switched from kernel longterm 6.6 to
> > 6.12. Reports show that devices with the Realtek Otto switch platform
> > die during operation and are rebooted by the watchdog. Sorting out
> > other possible reasons the Otto timer is to blame. The platform
> > currently consists of 4 targets with different hardware revisions.
> > It is not 100% clear which devices and revisions are affected.
> > 
> > Analysis shows:
> > 
> > A more aggressive sched/deadline handling leads to more timer starts
> > with small intervals. This increases the bug chances. See
> > https://marc.info/?l=linux-kernel&m=175276556023276&w=2
> > 
> > Focusing on the real issue a hardware limitation on some devices was
> > found. There is a minimal chance that a timer ends without firing an
> > interrupt if it is reprogrammed within the 5us before its expiration
> > time.
>
> Is it possible the timer IRQ flag is reset when setting the new counter 
> value ?
>
> While in the code path with the interrupt disabled, the timer expires in 
> these 5us, the IRQ flag is raised, then the driver sets a new value and 
> this flag is reset automatically, thus losing the current timer expiration ?

Something like this ...

During my analysis I tried many things to pin down the situation that
leads to this error, in particular right before the reprogramming command:

static inline void rttm_enable_timer(void __iomem *base, u32 mode, u32 divisor)
{
  iowrite32(RTTM_CTRL_ENABLE | mode | divisor, base + RTTM_CTRL);
}

What I tried: 

1. Read out the current (remaining) timer value: In the error cases
this can give any value between 1 (=320ns) and 15 (=4800ns).

2. Check whether the IRQ flag is already set, so that the IRQ would
trigger next. This was never the case.

3. Reorder reprogramming sequence (as far as possible). Only the
double reprogramming helped here.
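
To illustrate the numbers in point 1: a minimal sketch, assuming a tick
length of 320ns (taken from the values above) and a critical window of 5us
before expiration (taken from the patch description). The helper names are
hypothetical and not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one timer tick lasts 320 ns (1 tick = 320ns in point 1). */
#define TICK_NS            320u
/* Assumption: the critical window is the last 5 us before expiration. */
#define CRITICAL_WINDOW_NS 5000u

/* Hypothetical helper: convert a remaining tick count to nanoseconds. */
static inline uint64_t ticks_to_ns(uint32_t ticks)
{
	return (uint64_t)ticks * TICK_NS;
}

/* Hypothetical helper: true if a running timer is about to expire. */
static inline bool in_critical_window(uint32_t remaining_ticks)
{
	return remaining_ticks > 0 &&
	       ticks_to_ns(remaining_ticks) < CRITICAL_WINDOW_NS;
}
```

With these assumptions, 15 ticks (4800ns) is the largest remaining value
that still falls inside the 5us window, which matches the range observed
in the error cases.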

So there is nothing we can do to actively detect the buggy situation
and work around it selectively. There seems to be a hardware limitation
in the interaction between expiring timers and reprogramming. Since no
erratum is available, the current bugfix is the only (and best) solution
I have.
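
For readers without the patch at hand, here is one possible reading of
"double reprogramming", as a user-space sketch only. The register offset,
the enable bit, the mock register file and the function names are all
assumptions made for illustration; the actual patch may well look different.

```c
#include <stdint.h>

/* Assumed values, for illustration only; check the driver for the real ones. */
#define RTTM_CTRL        0x08u
#define RTTM_CTRL_ENABLE (1u << 28)

/* Mock register file standing in for the memory-mapped timer block. */
static uint32_t fake_regs[4];
static unsigned int ctrl_writes;

/* Hypothetical replacement for iowrite32() that records control writes. */
static void mock_iowrite32(uint32_t val, uint32_t offset)
{
	fake_regs[offset / 4] = val;
	if (offset == RTTM_CTRL)
		ctrl_writes++;
}

/* Same shape as rttm_enable_timer(), but writing to the mock registers. */
static void enable_timer(uint32_t mode, uint32_t divisor)
{
	mock_iowrite32(RTTM_CTRL_ENABLE | mode | divisor, RTTM_CTRL);
}

/*
 * Hypothetical double reprogramming: program the timer once with an
 * interim divisor, then again with the divisor that is really wanted.
 */
static void enable_timer_twice(uint32_t mode, uint32_t interim_divisor,
			       uint32_t divisor)
{
	enable_timer(mode, interim_divisor);
	enable_timer(mode, divisor);
}
```

The sketch only shows the order of the register writes. It cannot reproduce
the hardware behaviour itself.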

Markus 

