linux-kernel - Re: [PATCH v6] watchdog: add driver for StreamLabs USB watchdog device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <052ab7b7-ef09-a751-bb03-2cd5742083af@roeck-us.net>
Date:   Thu, 22 Sep 2022 07:37:16 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Alexey Klimov <klimov.linux@...il.com>, wim@...ux-watchdog.org
Cc:     linux-watchdog@...r.kernel.org, gregkh@...uxfoundation.org,
        oneukum@...e.com, linux-usb@...r.kernel.org,
        linux-kernel@...r.kernel.org, atishp@...osinc.com,
        atishp@...shpatra.org, yury.norov@...il.com, aklimov@...hat.com,
        atomlin@...hat.com, stern@...land.harvard.edu
Subject: Re: [PATCH v6] watchdog: add driver for StreamLabs USB watchdog
 device

On 9/16/22 20:15, Alexey Klimov wrote:
> Hi Wim/Guenter,
> 
> For me it seems that there could be a potential race condition. I have to rely
> on watchdog_active(&streamlabs_wdt->wdt_dev) function which tests the WDOG_ACTIVE
> bit in struct watchdog_device->status member.
> The watchdog_dev changes the state of the device with ->start() or ->ping() and
> ->stop() methods and updates the WDOG_ACTIVE accordingly.
> In {pre,post}_reset methods here I have to change the state of the device from
> running to stopped and back to running conditionally, however WDOG_ACTIVE bit
> could be updated in between these callbacks execution or starting/stopping
> the device can race.
> For instance, I see the potential dangerous race like this:
> 
> 	CPUX					CPUY
> 
> 	..				watchdog_stop() {
> 	..					if (wdd->ops->stop) {
> 							...
> 							err = wdd->ops->stop(wdd)
> 						}
> usb_streamlabs_wdt_pre_reset() {
> 	if (watchdog_active())
> 		stop_command();			/* WDOG_ACTIVE bit is still set
> 	...					 here indicating that watchdog is
> }						 started, but ->stop() has already
> 						 finished */
> 	...
> usb_streamlabs_wdt_post_reset() {
> 	if (watchdog_active())
> 		start_command();
> }
> 	...					/* WDOG_ACTIVE is updated here */
> 						clear_bit(WDOG_ACTIVE, &wdd->status);
> 					}
> 
> As a result, the watchdog subsystem "thinks" that watchdog is not active and should
> not be pinged. However, the driver observed using watchdog_active() that watchdog
> was active during {pre,post}_reset and restarted the device which will lead to
> unexpected reset. It is very unlikely race to happen but consequence is fatal.
> In other words, there are two independent paths leading to driver changing
> the state of the watchdog device and one path relies on status that can be changed
> by another path.
> 
> Thinking about that I see the following approaches:
> 
> 1. Introduce a variable in struct streamlabs_wdt that tracks the state of the
> watchdog device itself and checking/updating the state of a device happens under
> semaphore lock.
> Obviously, this "internal" to the driver state variable should be used in
> {pre,post}_reset. In case there will be other drivers (say, USB ones) they also
> need to implement this.
> 
> or
> 
> 2. The updates to wdd->status should happen under wd_data->lock.
> Currently, it is mutex-based. The acquiring and releasing the lock could be
> exported for the drivers to use. The mutex lock probably should be switched
> to a binary semaphore for that.
> 
> In such case, in pre_reset() for example, I would need to do:
> static int pre_reset()
> {
> 	lock_wdd();
> 	acquire_internal_driver_lock();
> 	
> 	if (watchdog_active())
> 		stop_command();
> }
> 
> static int post_reset()
> {
> 
> 	if (watchdog_active())
> 		start_command();
> 
> 	release_internal_driver_lock();
> 	unlock_wdd();
> }
> 
> There should be an order that we have to acquire subsystem wdd lock first, then
> internal driver lock. Otherwise there could be deadlocks.
> 
> This could be done if you think it's more wiser move.
> 
> or
> 
> 3. The {pre,post}_reset callbacks should execute watchdog_dev.c subsystem functions
> (not sure which functions exactly). Eventually, it will look similar to what is
> described in the previous point with respect to locks order.
> I meant something like this:
> 
> static int pre_reset()
> {
> 	watchdog_dev_pre_reset_prepare();
> }
> 
> static int post_reset()
> {
> 	watchdog_dev_post_reset_done();
> }
> 
> In watchdog_dev.c:
> void watchdog_dev_pre_reset_prepare()
> {
> 	mutex_lock(&wd_data->lock);	<-- should be changed to semaphore too?
> 
> 	watchdog_stop(wdd);		<-- without updating WDOG_ACTIVE bit?
> 					 or there should be a way to indicate
> 					 to watchdog_dev_post_reset_done() if
> 					 watchdog should be started or not
> }
> 
> void watchdog_dev_post_reset_done()
> {
> 	if (watchdog_active())
> 		watchdog_start(wdd);
> 
>          mutex_unlock(&wd_data->lock);
> }
> 
> I didn't really thought about point 3 yet. For me personally the point 2 seems
> the like right way to go but you have more experience with that. The exported
> locks could be re-used by other drivers if needed in future.
> In case of point 1 each USB driver should deal with {pre,post}_reset by themselves.
> 
> Any thoughts?
> 

Please go with 1). pre_reset/post_reset functionality is a first in the watchdog
subsystem and the first to require locking outside the scope of a function or set
of functions. I'd rather avoid having to deal with the potential consequences
in the watchdog core. We can do that if/when it becomes more common and after
we have a good understanding of the potential consequences.

Thanks,
Guenter

> Thanks,
> Alexey