[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <052ab7b7-ef09-a751-bb03-2cd5742083af@roeck-us.net>
Date: Thu, 22 Sep 2022 07:37:16 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Alexey Klimov <klimov.linux@...il.com>, wim@...ux-watchdog.org
Cc: linux-watchdog@...r.kernel.org, gregkh@...uxfoundation.org,
oneukum@...e.com, linux-usb@...r.kernel.org,
linux-kernel@...r.kernel.org, atishp@...osinc.com,
atishp@...shpatra.org, yury.norov@...il.com, aklimov@...hat.com,
atomlin@...hat.com, stern@...land.harvard.edu
Subject: Re: [PATCH v6] watchdog: add driver for StreamLabs USB watchdog
device
On 9/16/22 20:15, Alexey Klimov wrote:
> Hi Wim/Guenter,
>
> For me it seems that there could be a potential race condition. I have to rely
> on watchdog_active(&streamlabs_wdt->wdt_dev) function which tests the WDOG_ACTIVE
> bit in struct watchdog_device->status member.
> The watchdog_dev changes the state of the device with ->start() or ->ping() and
> ->stop() methods and updates the WDOG_ACTIVE accordingly.
> In {pre,post}_reset methods here I have to change the state of the device from
> running to stopped and back to running conditionally, however WDOG_ACTIVE bit
> could be updated in between these callbacks execution or starting/stopping
> the device can race.
> For instance, I see the potential dangerous race like this:
>
> CPUX CPUY
>
> .. watchdog_stop() {
> .. if (wdd->ops->stop) {
> ...
> err = wdd->ops->stop(wdd)
> }
> usb_streamlabs_wdt_pre_reset() {
> if (watchdog_active())
> stop_command(); /* WDOG_ACTIVE bit is still set
> ... here indicating that watchdog is
> } started, but ->stop() has already
> finished */
> ...
> usb_streamlabs_wdt_post_reset() {
> if (watchdog_active())
> start_command();
> }
> ... /* WDOG_ACTIVE is updated here */
> clear_bit(WDOG_ACTIVE, &wdd->status);
> }
>
> As a result, the watchdog subsystem "thinks" that watchdog is not active and should
> not be pinged. However, the driver observed using watchdog_active() that watchdog
> was active during {pre,post}_reset and restarted the device which will lead to
> unexpected reset. It is very unlikely race to happen but consequence is fatal.
> In other words, there are two independent paths leading to driver changing
> the state of the watchdog device and one path relies on status that can be changed
> by another path.
>
> Thinking about that I see the following approaches:
>
> 1. Introduce a variable in struct streamlabs_wdt that tracks the state of the
> watchdog device itself and checking/updating the state of a device happens under
> semaphore lock.
> Obviously, this "internal" to the driver state variable should be used in
> {pre,post}_reset. In case there will be other drivers (say, USB ones) they also
> need to implement this.
>
> or
>
> 2. The updates to wdd->status should happen under wd_data->lock.
> Currently, it is mutex-based. The acquiring and releasing the lock could be
> exported for the drivers to use. The mutex lock probably should be switched
> to a binary semaphore for that.
>
> In such case, in pre_reset() for example, I would need to do:
> static int pre_reset()
> {
> lock_wdd();
> acquire_internal_driver_lock();
>
> if (watchdog_active())
> stop_command();
> }
>
> static int post_reset()
> {
>
> if (watchdog_active())
> start_command();
>
> release_internal_driver_lock();
> unlock_wdd();
> }
>
> There should be an order that we have to acquire subsystem wdd lock first, then
> internal driver lock. Otherwise there could be deadlocks.
>
> This could be done if you think it's more wiser move.
>
> or
>
> 3. The {pre,post}_reset callbacks should execute watchdog_dev.c subsystem functions
> (not sure which functions exactly). Eventually, it will look similar to what is
> described in the previous point with respect to locks order.
> I meant something like this:
>
> static int pre_reset()
> {
> watchdog_dev_pre_reset_prepare();
> }
>
> static int post_reset()
> {
> watchdog_dev_post_reset_done();
> }
>
> In watchdog_dev.c:
> void watchdog_dev_pre_reset_prepare()
> {
> mutex_lock(&wd_data->lock); <-- should be changed to semaphore too?
>
> watchdog_stop(wdd); <-- without updating WDOG_ACTIVE bit?
> or there should be a way to indicate
> to watchdog_dev_post_reset_done() if
> watchdog should be started or not
> }
>
> void watchdog_dev_post_reset_done()
> {
> if (watchdog_active())
> watchdog_start(wdd);
>
> mutex_unlock(&wd_data->lock);
> }
>
> I didn't really thought about point 3 yet. For me personally the point 2 seems
> the like right way to go but you have more experience with that. The exported
> locks could be re-used by other drivers if needed in future.
> In case of point 1 each USB driver should deal with {pre,post}_reset by themselves.
>
> Any thoughts?
>
Please go with 1). pre_reset/post_reset functionality is a first in the watchdog
subsystem and the first to require locking outside the scope of a function or set
of functions. I'd rather avoid having to deal with the potential consequences
in the watchdog core. We can do that if/when it becomes more common and after
we have a good understanding of the potential consequences.
Thanks,
Guenter
> Thanks,
> Alexey
Powered by blists - more mailing lists