Message-ID: <ac2080a1-3b00-ac9e-cd49-d1ee84c6ca25@roeck-us.net>
Date:   Sun, 30 Aug 2020 13:48:24 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Laurent Pinchart <laurent.pinchart@...asonboard.com>
Cc:     Mauro Carvalho Chehab <mchehab@...nel.org>,
        Sakari Ailus <sakari.ailus@....fi>,
        linux-uvc-devel@...ts.sourceforge.net, linux-usb@...r.kernel.org,
        linux-media@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/5] media: uvcvideo: Fix race conditions

Hi Laurent,

On 8/30/20 8:58 AM, Laurent Pinchart wrote:
> Hi Guenter,
> 
> Thank you for the patches.
> 
> On Sun, Aug 30, 2020 at 08:04:38AM -0700, Guenter Roeck wrote:
>> The uvcvideo code has no lock protection against USB disconnects
>> while video operations are ongoing. This has resulted in random
>> error reports, typically pointing to a crash in usb_ifnum_to_if(),
>> called from usb_hcd_alloc_bandwidth(). A typical traceback is as
>> follows.
>>
>> usb 1-4: USB disconnect, device number 3
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>> PGD 0 P4D 0
>> Oops: 0000 [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 5633 Comm: V4L2CaptureThre Not tainted 4.19.113-08536-g5d29ca36db06 #1
>> Hardware name: GOOGLE Edgar, BIOS Google_Edgar.7287.167.156 03/25/2019
>> RIP: 0010:usb_ifnum_to_if+0x29/0x40
>> Code: <...>
>> RSP: 0018:ffffa46f42a47a80 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff904a396c9000
>> RDX: ffff904a39641320 RSI: 0000000000000001 RDI: 0000000000000000
>> RBP: ffffa46f42a47a80 R08: 0000000000000002 R09: 0000000000000000
>> R10: 0000000000009975 R11: 0000000000000009 R12: 0000000000000000
>> R13: ffff904a396b3800 R14: ffff904a39e88000 R15: 0000000000000000
>> FS: 00007f396448e700(0000) GS:ffff904a3ba00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 000000016cb46000 CR4: 00000000001006f0
>> Call Trace:
>>  usb_hcd_alloc_bandwidth+0x1ee/0x30f
>>  usb_set_interface+0x1a3/0x2b7
>>  uvc_video_start_transfer+0x29b/0x4b8 [uvcvideo]
>>  uvc_video_start_streaming+0x91/0xdd [uvcvideo]
>>  uvc_start_streaming+0x28/0x5d [uvcvideo]
>>  vb2_start_streaming+0x61/0x143 [videobuf2_common]
>>  vb2_core_streamon+0xf7/0x10f [videobuf2_common]
>>  uvc_queue_streamon+0x2e/0x41 [uvcvideo]
>>  uvc_ioctl_streamon+0x42/0x5c [uvcvideo]
>>  __video_do_ioctl+0x33d/0x42a
>>  video_usercopy+0x34e/0x5ff
>>  ? video_ioctl2+0x16/0x16
>>  v4l2_ioctl+0x46/0x53
>>  do_vfs_ioctl+0x50a/0x76f
>>  ksys_ioctl+0x58/0x83
>>  __x64_sys_ioctl+0x1a/0x1e
>>  do_syscall_64+0x54/0xde
>>
>> While this problem is rarely observed in the field, it is relatively easy
>> to reproduce by adding msleep() calls into the code.
>>
>> I don't presume to claim that I found every issue, but this patch series
>> should fix at least the major problems.
>>
>> The patch series was tested extensively on a Chromebook running chromeos-4.19
>> and on a Linux system running a v5.8.y based kernel.
> 
> I'll review each patch individually, but I think 2/5, 4/5 and 5/5 should
> be handled in the V4L2 core, not the uvcvideo driver. Otherwise we would
> have to replicate that logic in all drivers, while I think it can easily
> be implemented in a generic fashion as previously discussed.
> 
The problem is that the v4l2 core already does support locking. There is
a global lock, in struct video_device, a queue lock in struct v4l2_m2m_ctx,
and another queue lock in struct vb2_queue. However, all of those have
to be initialized from the driver. The uvcvideo driver uses its own locks and
does not set the lock pointers in the various generic structures. I was able
to figure out how to use the uvcvideo specific locks in the uvcvideo
driver, but all my attempts to initialize and use the generic locks failed.
It may well be that the generic code isn't entirely clean - for example
I am not sure if the lock protection in v4l2_open() is complete since
it doesn't handle disconnects after checking if the video device is still
registered (and I don't really see the point of the second video_is_registered()
call in v4l2_open). However, that may just be a lack of understanding on my
side on how the code is supposed to work. Maybe the actual device open function
is expected to have its own protection against underlying hardware removal
and video device unregistration while opening the device.
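
To illustrate the opt-in nature of the generic locks described above, here is
a minimal userspace sketch, not kernel code. The names model_dev,
model_core_call and model_op_lock_held are hypothetical and exist only for
this example. It assumes a core entry point that serializes a driver
operation only when the driver has set the optional lock pointer, which is
the behaviour described for the lock pointers in the generic structures.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not the kernel's struct video_device. */
struct model_dev {
	pthread_mutex_t *lock;          /* optional: stays NULL unless the driver sets it */
	int (*op)(struct model_dev *);  /* driver-specific operation */
};

/* Model of a core entry point: serialize only if the driver opted in. */
static int model_core_call(struct model_dev *dev)
{
	int ret;

	if (dev->lock)
		pthread_mutex_lock(dev->lock);
	ret = dev->op(dev);
	if (dev->lock)
		pthread_mutex_unlock(dev->lock);
	return ret;
}

/* Example driver op: returns 1 if it runs with the core lock held, else 0. */
static int model_op_lock_held(struct model_dev *dev)
{
	if (!dev->lock)
		return 0;
	if (pthread_mutex_trylock(dev->lock) == 0) {
		/* We got the lock, so nobody was holding it for us. */
		pthread_mutex_unlock(dev->lock);
		return 0;
	}
	return 1;	/* trylock failed: the lock is already held */
}
```

With the pointer left NULL, as in the uvcvideo case described above, the
operation runs without any core-provided serialization, and the driver has to
bring its own locks.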
[ Regarding the second call to video_is_registered() in v4l2_open():
  Add msleep(5000) between it and the call to the driver open function,
  disconnect the device during the sleep, and it will happily call the device
  open function on a non-registered video device. That is what patch 5/5 tries
  to fix for the uvcvideo driver.
  The same problem applies to other file operations in v4l2-dev.c: They all
  check if the video device is registered before calling the device
  specific code, but I don't really see the point of doing that because
  there is no protection against unregistration after the check was made
  and before/while the device specific code is running.
  Patch 4/5 tries to fix this for the uvcvideo driver.
  If that is a bug in the v4l2 code, I'll be happy to work on a fix,
  but the only generic fix I could think of would be to utilize the lock in
  struct video_device ... but that lock isn't initialized by the uvcvideo
  driver.
]
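
The check-then-call window described in the note above can be sketched in
userspace as follows. This is only an illustration under stated assumptions,
not the v4l2 or uvcvideo implementation: sketch_vdev, sketch_unregister and
sketch_open are hypothetical names, and a pthread mutex stands in for whatever
lock the driver or core would use. The point is that the registration check
and the driver call happen under the same lock that the unregister path takes,
so an unregistration cannot slip in between them.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a video device. */
struct sketch_vdev {
	pthread_mutex_t lock;
	bool registered;
	int opens;		/* number of successful driver opens */
};

static void sketch_vdev_init(struct sketch_vdev *v)
{
	pthread_mutex_init(&v->lock, NULL);
	v->registered = true;
	v->opens = 0;
}

/* Disconnect path: clear the flag under the lock. */
static void sketch_unregister(struct sketch_vdev *v)
{
	pthread_mutex_lock(&v->lock);
	v->registered = false;
	pthread_mutex_unlock(&v->lock);
}

/*
 * Open path: the check and the driver-specific work share one critical
 * section, so the device cannot become unregistered after the check and
 * before (or while) the driver code runs.
 */
static int sketch_open(struct sketch_vdev *v)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&v->lock);
	if (v->registered) {
		v->opens++;	/* stands in for the driver open function */
		ret = 0;
	}
	pthread_mutex_unlock(&v->lock);
	return ret;
}
```

Calling sketch_open() after sketch_unregister() returns -ENODEV without ever
reaching the driver-specific part. An unlocked check followed by the call
would leave the window that the msleep(5000) experiment exposes.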

Either way, I don't think my understanding of the interaction between
v4l2 and uvcvideo is good enough to make more invasive changes. I _think_
any generic improvement should start with refactoring the uvcvideo code to
use the v4l2 locking mechanism. However, from the exchange here, my
understanding is that this locking mechanism is not used on purpose. That
means we'll have a uvcvideo specific locking mechanism, period, and I don't
think it is even possible to solve the problem without utilizing this locking
mechanism.

Of course, it may well be that I am completely off track and clueless.
After all, the first time I looked into this code was about two weeks ago.
So please bear with me if I talk nonsense.

Thanks,
Guenter

>> ----------------------------------------------------------------
>> Guenter Roeck (5):
>>       media: uvcvideo: Cancel async worker earlier
>>       media: uvcvideo: Lock video streams and queues while unregistering
>>       media: uvcvideo: Release stream queue when unregistering video device
>>       media: uvcvideo: Protect uvc queue file operations against disconnect
>>       media: uvcvideo: In uvc_v4l2_open, check if video device is registered
>>
>>  drivers/media/usb/uvc/uvc_ctrl.c   | 11 ++++++----
>>  drivers/media/usb/uvc/uvc_driver.c | 12 ++++++++++
>>  drivers/media/usb/uvc/uvc_queue.c  | 32 +++++++++++++++++++++++++--
>>  drivers/media/usb/uvc/uvc_v4l2.c   | 45 ++++++++++++++++++++++++++++++++++++--
>>  drivers/media/usb/uvc/uvcvideo.h   |  1 +
>>  5 files changed, 93 insertions(+), 8 deletions(-)
> 