linux-kernel - Re: [PATCH] watchdog: sbsa: Adjust keepalive timeout to avoid MediaTek WS0 race condition

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cbd0cd0c-83d9-49db-9e84-a8807e6dce75@nvidia.com>
Date: Thu, 17 Jul 2025 17:19:35 -0700
From: Aaron Plattner <aplattner@...dia.com>
To: Guenter Roeck <linux@...ck-us.net>,
 Wim Van Sebroeck <wim@...ux-watchdog.org>
Cc: linux-watchdog@...r.kernel.org, linux-kernel@...r.kernel.org,
 Timur Tabi <ttabi@...dia.com>
Subject: Re: [PATCH] watchdog: sbsa: Adjust keepalive timeout to avoid
 MediaTek WS0 race condition

On 7/16/25 12:01 PM, Guenter Roeck wrote:
> On 7/8/25 16:33, Aaron Plattner wrote:
>> The MediaTek implementation of the sbsa_gwdt watchdog has a race
>> condition where a write to SBSA_GWDT_WRR is ignored if it occurs while
>> the hardware is processing a timeout refresh that asserts WS0.
>>
>> Detect this based on the hardware implementer and adjust wdd->timeout to
>> avoid the race.
>>
>> Signed-off-by: Aaron Plattner <aplattner@...dia.com>
>> Acked-by: Timur Tabi <ttabi@...dia.com>
>> ---
>>   drivers/watchdog/sbsa_gwdt.c | 52 +++++++++++++++++++++++++++++++++---
>>   1 file changed, 49 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
>> index 5f23913ce3b4..81012dbe9088 100644
>> --- a/drivers/watchdog/sbsa_gwdt.c
>> +++ b/drivers/watchdog/sbsa_gwdt.c
>> @@ -75,11 +75,17 @@
>>   #define SBSA_GWDT_VERSION_MASK  0xF
>>   #define SBSA_GWDT_VERSION_SHIFT 16
>> +#define SBSA_GWDT_IMPL_MASK    0x7FF
>> +#define SBSA_GWDT_IMPL_SHIFT    0
>> +#define SBSA_GWDT_IMPL_MEDIATEK    0x426
>> +
>>   /**
>>    * struct sbsa_gwdt - Internal representation of the SBSA GWDT
>>    * @wdd:        kernel watchdog_device structure
>>    * @clk:        store the System Counter clock frequency, in Hz.
>>    * @version:            store the architecture version
>> + * @need_ws0_race_workaround:
>> + *            indicate whether to adjust wdd->timeout to avoid a race 
>> with WS0
>>    * @refresh_base:    Virtual address of the watchdog refresh frame
>>    * @control_base:    Virtual address of the watchdog control frame
>>    */
>> @@ -87,6 +93,7 @@ struct sbsa_gwdt {
>>       struct watchdog_device    wdd;
>>       u32            clk;
>>       int            version;
>> +    bool            need_ws0_race_workaround;
>>       void __iomem        *refresh_base;
>>       void __iomem        *control_base;
>>   };
>> @@ -161,6 +168,31 @@ static int sbsa_gwdt_set_timeout(struct 
>> watchdog_device *wdd,
>>            */
>>           sbsa_gwdt_reg_write(((u64)gwdt->clk / 2) * timeout, gwdt);
>> +    /*
>> +     * Some watchdog hardware has a race condition where it will ignore
>> +     * sbsa_gwdt_keepalive() if it is called at the exact moment that a
>> +     * timeout occurs and WS0 is being asserted. Unfortunately, the 
>> default
>> +     * behavior of the watchdog core is very likely to trigger this race
>> +     * when action=0 because it programs WOR to be half of the desired
>> +     * timeout, and watchdog_next_keepalive() chooses the exact same 
>> time to
>> +     * send keepalive pings.
>> +     *
>> +     * This triggers a race where sbsa_gwdt_keepalive() can be called 
>> right
>> +     * as WS0 is being asserted, and affected hardware will ignore that
>> +     * write and continue to assert WS0. After another (timeout / 2)
>> +     * seconds, the same race happens again. If the driver wins then the
>> +     * explicit refresh will reset WS0 to false but if the hardware 
>> wins,
>> +     * then WS1 is asserted and the system resets.
>> +     *
>> +     * Avoid the problem by scheduling keepalive heartbeats one second
>> +     * earlier than the WOR timeout.
>> +     *
>> +     * This workaround might not be needed in a future revision of the
>> +     * hardware.
>> +     */
>> +    if (gwdt->need_ws0_race_workaround)
>> +        wdd->timeout -= 2;
>> +
> 
> It seems to me that this is still racy. If the ping is ignored, I would assume
> that this is reflected in the watchdog registers. How about reading the status
> if the workaround is needed and issuing another keepalive if it was ignored ?
> Or just always issue a second write to SBSA_GWDT_WRR in that case, maybe after
> some short delay ?

The window for the race is very small, so triggering the refresh a full 
second earlier should be plenty of time. Are you worried that the 
refresh will be delayed by a second and hit the race window anyway? I'd 
have to check with MediaTek when it is okay to read the status registers 
to check whether another refresh is needed and whether we need to delay 
the read and for how long.

If it would be better, we can also work around the problem by doing the 
refresh *after* WS0 is triggered. We just can't do it at exactly the 
same time.

The other option, if you think it's okay, would be to do the refresh 
from the WS0 interrupt handler. I confirmed in local testing that that 
reliably resets WS0 and the SBSA documentation suggests that that's the 
intended way to do the refresh to begin with.

-- Aaron

> 
> Guenter
>