[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZSCP0clqb1Nn/Ft3@pengutronix.de>
Date: Sat, 7 Oct 2023 00:53:05 +0200
From: Michael Grzeschik <mgr@...gutronix.de>
To: Avichal Rakesh <arakesh@...gle.com>
Cc: Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Daniel Scally <dan.scally@...asonboard.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
jchowdhary@...gle.com, etalvala@...gle.com,
linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 0/3] usb: gadget: uvc: stability fixes on STREAMOFF.
On Fri, Oct 06, 2023 at 10:00:11AM -0700, Avichal Rakesh wrote:
>
>
>On 10/5/23 15:05, Michael Grzeschik wrote:
>> Hi Avichal,
>>
>> On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote:
>>> On 10/5/23 03:14, Michael Grzeschik wrote:
>>>> On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote:
>>>>> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote:
>>>>>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote:
>>>>>> > We have been seeing two main stability issues that uvc gadget driver
>>>>>> > runs into when stopping streams:
>>>>>> > 1. Attempting to queue usb_requests to a disabled usb_ep
>>>>>> > 2. use-after-free issue for inflight usb_requests
>>>>>> >
>>>>>> > The three patches below fix the two issues above. Patch 1/3 fixes the
>>>>>> > first issue, and Patch 2/3 and 3/3 fix the second issue.
>>>>>> >
>>>>>> > Avichal Rakesh (3):
>>>>>> > usb: gadget: uvc: prevent use of disabled endpoint
>>>>>> > usb: gadget: uvc: Allocate uvc_requests one at a time
>>>>>> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests
>>>>>> >
>>>>>> > drivers/usb/gadget/function/f_uvc.c | 11 +-
>>>>>> > drivers/usb/gadget/function/f_uvc.h | 2 +-
>>>>>> > drivers/usb/gadget/function/uvc.h | 6 +-
>>>>>> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++-
>>>>>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++-------
>>>>>> > 5 files changed, 164 insertions(+), 65 deletions(-)
>>>>>>
>>>>>> These patches are not applying on gregkh/usb-testing since
>>>>>> Greg did take my patches first. I have already rebased them.
>>>>>
>>>>> I think they got merged too soon :-( We could fix things on top, but
>>>>> there's very little time to do so for v6.7.
>>>>
>>>> Agreed. I was jumping from one workaround to another one, since this
>>>> is not easy to fix in a proper way. And still after this long discussion
>>>> with Avichal I don't think we are there yet.
>>>>
>>>>
>>>> So far the first two patches from Avichal look legit. But the overall
>>>> Use-After-Free fix is yet to be done properly.
>>>>
>>>> The "abondoned" method he suggested is really bad to follow and will
>>>> add too much complexity and will be hard to debug.
>>>>
>>>> IMHO it should be possible to introduce two cleanup pathes.
>>>>
>>>> One path would be in the uvc_cleanup_requests that will cleanup the
>>>> requests that are actually not used in the controller and are registered
>>>> in the req_free list.
>>>>
>>>> The second path would be the complete functions that are being run
>>>> from the controller and will ensure that the cleanup will really free
>>>> the requests from the controller after they were consumed.
>>>>
>>>> What do you think?
>>>
>>> I am not sure I follow. Patch 3/3 does exactly what you say here.
>>
>> Yes, it was just to summ up what the latest state of the idea was,
>> so Laurent does not read the whole thread in detail. Sorry for not
>> being clear enough about that.
>
>Whoops! Sorry about the misunderstanding!
>
>>
>>> There are two cleanup paths:
>>> 1. uvcg_video_disable cleans up only the requests in req_free, and
>>> 2. complete handler cleans up the in-flight requests.
>>>
>>> The "abandoned" flag is simply to let the completion handler know
>>> which requests to clean up and which ones to re-queue back to
>>> the gadget driver.
>>
>> What I don't get is, why in the case of shutdown there needs to
>> be something re-queued back to the gadget driver. There should not
>> need to be any sort of barrier flag for the requests. Just the
>> complete handler running past a barrier where it knows that the
>> whole device is stopped. So every call on complete should then clean
>> that exact request it is touching currently.
>>
>> I don't know where the extra complexity comes from.
>
>A lot of this complexity comes from assuming a back to back
>STREAMOFF -> STREAMON sequence is possible where the gadget driver
>doesn't have the time to clean up all in-flight usb_requests.
>However, looking through the usb gadget APIs again, and it
>looks like usb_ep_disable enforces that all requests will
>be sent back to the gadget driver before it returns.
Great!
>So you're right:
>With Patch 1/3 in place, I think we can just guard on uvc->state
>alone, because control requests are blocked until usb_ep_disable
>is finished anyway. I'll upload v4 with the "is_abandoned"
>flag removed and the checks simplified once I've verified the
>fix locally.
>
>That should also remove any bookkeeping issues that may have
>triggered the stack below.
I am currious if we should handle -ECONNRESET and -ESHUTDOWN in more
detail in the completion handler and make sure that the request will not
be added into the req_free list from there.
Will review your v4 then.
>>> The other "complications" are around making sure we can trust
>>> the values in an inherently racey situation. The reasoning
>>> can admittedly be difficult to follow at a glance, which incidentally
>>> is why I went with a simple to prove timed wait in the past
>>> (https://lore.kernel.org/20230912041910.726442-3-arakesh@google.com).
>>>
>>> I am not suggesting we go back to a timed wait, but please do look
>>> at the patch and let me know which parts don't make sense, or are
>>> difficult to understand. We can add more documentation about our
>>> assumptions there, or if you have a way to do this that you
>>> think is simpler to reason about, then please let me know and I'll
>>> be more than happy to use that!
>>
>> I really try to spin my head around the idea of the is_abondoned flag
>> you are using. Unfortunatly for now I am out to debug the issues I see
>> with your series.
>>
>> So I did try these patches you send. Yes the deadlock error is gone with
>> v3. But the linked list is still running into cases where
>> dwc3_gadget_giveback(complete) is touching requests that are already
>> freed.
>>
>> [ 61.408715] ------------[ cut here ]------------
>> [ 61.413897] kernel BUG at lib/list_debug.c:56!
>> ...
>> [ 61.590762] Call trace:
>> [ 61.596890] __list_del_entry_valid+0xb8/0xe8
>> [ 61.603408] dwc3_gadget_giveback+0x3c/0x1b0
>> [ 61.607594] dwc3_remove_requests.part.0+0xcc/0x100
>> [ 61.612948] __dwc3_gadget_ep_disable+0xbc/0x1b8
>> [ 61.621019] dwc3_gadget_ep_disable+0x48/0x100
>> [ 61.627925] usb_ep_disable+0x3c/0x138
>> [ 61.638230] uvc_function_setup_continue+0x3c/0x60
>> [ 61.645040] uvc_v4l2_streamoff+0x5c/0x80
>> [ 61.659812] v4l_streamoff+0x40/0x60
>> [ 61.668950] __video_do_ioctl+0x344/0x420
>> [ 61.679548] video_usercopy+0x1d0/0x788
>> [ 61.685677] video_ioctl2+0x40/0x70
>> [ 61.697439] v4l2_ioctl+0x68/0xa0
>> [ 61.709200] __arm64_sys_ioctl+0x304/0xda0
>> [ 61.720768] invoke_syscall.constprop.0+0x70/0x130
>>
>
>
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)
Powered by blists - more mailing lists