linux-kernel - Re: [PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog


Message-ID: <Pine.LNX.4.44L0.1801311053240.1370-100000@iolanthe.rowland.org>
Date:   Wed, 31 Jan 2018 11:02:47 -0500 (EST)
From:   Alan Stern <stern@...land.harvard.edu>
To:     Haiqing Bai <Haiqing.Bai@...driver.com>
cc:     gregkh@...uxfoundation.org, <linux-usb@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <Shigeru.Yoshida@...driver.com>
Subject: Re: [PATCH] ohci-hcd: Fix race condition caused by ohci_urb_enqueue()
 and io_watchdog_func()

On Wed, 31 Jan 2018, Haiqing Bai wrote:

> Running io_watchdog_func() while ohci_urb_enqueue() is running can
> cause a race condition where ohci->prev_frame_no is corrupted and the
> watchdog falsely reports the following error:
> 
>   ohci-platform 664a0800.usb: frame counter not updating; disabled
>   ohci-platform 664a0800.usb: HC died; cleaning up
> 
> Specifically, the following scenario causes the race condition:
> 
>   1. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
>      and enters the critical section
>   2. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
>      returns false
>   3. ohci_urb_enqueue() sets ohci->prev_frame_no to a frame number
>      read by ohci_frame_no(ohci)
>   4. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
>   5. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
>      flags) and exits the critical section
>   6. Later, ohci_urb_enqueue() is called
>   7. ohci_urb_enqueue() calls spin_lock_irqsave(&ohci->lock, flags)
>      and enters the critical section
>   8. The timer scheduled on step 4 expires and io_watchdog_func() runs
>   9. io_watchdog_func() calls spin_lock_irqsave(&ohci->lock, flags)
>      and waits on it because ohci_urb_enqueue() is already in the
>      critical section on step 7
>  10. ohci_urb_enqueue() calls timer_pending(&ohci->io_watchdog) and it
>      returns false
>  11. ohci_urb_enqueue() sets ohci->prev_frame_no to a new frame number
>      read by ohci_frame_no(ohci), because the frame number advanced
>      between steps 3 and 6
>  12. ohci_urb_enqueue() schedules io_watchdog_func() with mod_timer()
>  13. ohci_urb_enqueue() calls spin_unlock_irqrestore(&ohci->lock,
>      flags) and exits the critical section, which releases
>      io_watchdog_func(), waiting since step 9
>  14. io_watchdog_func() enters the critical section
>  15. io_watchdog_func() calls ohci_frame_no(ohci) and sets the frame_no
>      variable to the frame number
>  16. io_watchdog_func() compares frame_no and ohci->prev_frame_no
> 
> In step 16, because this invocation of io_watchdog_func() was
> scheduled in step 4, ohci->prev_frame_no is expected to hold the
> number set in step 3.  However, ohci->prev_frame_no was overwritten in
> step 11.  Because step 16 executes soon after step 11, the frame
> number might not have advanced, so ohci->prev_frame_no can equal
> frame_no.

That is a nasty bug!

> To address the above scenario, this patch adds a timer_running flag to
> the ohci_hcd structure.  Setting ohci->timer_running to true indicates
> that io_watchdog_func() is scheduled or running.  ohci_urb_enqueue()
> checks the flag when it schedules the watchdog (steps 4 and 12 above),
> so ohci->prev_frame_no is not overwritten while io_watchdog_func() is
> running.

Instead of adding an extra flag variable, which has to be kept in sync 
with the timer routine, how about defining a special sentinel value for 
prev_frame_no?  For example:

#define IO_WATCHDOG_OFF		0xffffff00

Then whenever the timer isn't scheduled or running, set
ohci->prev_frame_no to IO_WATCHDOG_OFF.  And instead of testing
timer_pending(), compare prev_frame_no to this special value.

I think that approach will be slightly more robust.

Alan Stern