[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c19219134cb23e7e0338e42890d1604d@dk-develop.de>
Date: Tue, 15 Aug 2017 14:51:19 +0200
From: Danilo Krummrich <danilokrummrich@...develop.de>
To: Felipe Balbi <balbi@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
gregkh@...uxfoundation.org, k.opasiak@...sung.com
Subject: Re: [PATCH] usb: gadget: udc: udc_stop before gadget unbind
On 2017-08-15 14:02, Felipe Balbi wrote:
> Hi,
>
> Danilo Krummrich <danilokrummrich@...develop.de> writes:
>> thanks for reviewing.
>
> np :-)
>
>> On 2017-08-15 12:03, Felipe Balbi wrote:
>>> Hi,
>>>
>>> Danilo Krummrich <danilokrummrich@...develop.de> writes:
>>>> udc_stop needs to be called before gadget driver unbind. Otherwise
>>>> it
>>>> might happen that udc drivers still call into the gadget driver
>>>> (e.g.
>>>> to reset gadget after OTG event). If this happens it is likely to
>>>> get
>>>> panics from gadget driver dereferencing NULL ptr, as gadget's
>>>> drvdata
>>>> is set to NULL on unbind.
>>>
>>> seems like the problem here is with the OTG layer, not UDC core.
>>>
>> I mentioned this just as example, it can happen whenever a UDC driver
>> calls
>> the gadget driver (e.g. by calling usb_gadget_udc_reset() in ISR)
>> after
>> gadget
>> drivers unbind() was called already (e.g. by gadget configfs).
>> If this happens gadget drivers drvdata was already set to NULL by
>> unbind()
>> and reset() could result into a NULL ptr exception.
>> Therefore my assumption was that it needs to be prevented that the
>> gadget
>> driver is getting called after unbind.
>
> We have a known problem in the design of the gadget API that causes
> this
> races but we couldn't come up with a solution yet :-)
>
> Inverting these two calls is not the correct way to go about this :-)
>
Now I see, thanks for explanation below.
>>>> Signed-off-by: Danilo Krummrich <danilokrummrich@...develop.de>
>>>> ---
>>>> Actually there could still be a race:
>>>> (CPU1 code taken from dwc3 drivers dwc3_disconnect_gadget() as
>>>> exsample)
>>>>
>>>> CPU0 CPU1
>>>> ---- ----
>>>> usb_gadget_disconnect(udc->gadget);
>>>> udc->driver->disconnect(udc->gadget);
>>>> if (dwc->gadget_driver && dwc->gadget_driver->disconnect)
>>>> usb_gadget_udc_stop(udc);
>>>> udc->driver->unbind(udc->gadget);
>>>> dwc->gadget_driver->disconnect(&dwc->gadget);
>>>>
>>>> UDC drivers typically set their gadget driver pointer to NULL in
>>>> udc_stop
>>>> and check for it before calling into the gadget driver. To fix the
>>>> issue
>>>> above every udc driver could apply a lock around this.
>>>>
>>>> If you see the need for having this or another solutions I can
>>>> provide
>>>> further patches. This patch could also just serve as a base for
>>>> discussion
>>>> if someone knows a smarter solution.
>>>>
>>>> I saw this problem causing a panic on hikey960 board and provided a
>>>> quick
>>>> workaround for the same problem here:
>>>> https://android-review.googlesource.com/#/c/kernel/common/+/457476/
>>>> (panic log in the commit message of the linked patch)
>>>> ---
>>>> drivers/usb/gadget/udc/core.c | 6 +++++-
>>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/usb/gadget/udc/core.c
>>>> b/drivers/usb/gadget/udc/core.c
>>>> index efce68e9a8e0..8155468afc0d 100644
>>>> --- a/drivers/usb/gadget/udc/core.c
>>>> +++ b/drivers/usb/gadget/udc/core.c
>>>> @@ -1234,8 +1234,12 @@ static void usb_gadget_remove_driver(struct
>>>> usb_udc *udc)
>>>>
>>>> usb_gadget_disconnect(udc->gadget);
>>>> udc->driver->disconnect(udc->gadget);
>>>> - udc->driver->unbind(udc->gadget);
>>>> + /* udc_stop needs to be called before gadget driver unbind to
>>>> prevent
>>>> + * udc driver calls into gadget driver after unbind which could
>>>> cause
>>>> + * a nullptr exception.
>>>> + */
>>>> usb_gadget_udc_stop(udc);
>>>> + udc->driver->unbind(udc->gadget);
>>>
>>> This patch is incorrect, it will prevent us from giving back requests
>>> to
>>> gadget driver properly. ->unbind() has to happen before ->udc_stop().
>>
>> Do you mean after udc_stop the udc driver can not call the gadget
>> driver
>> anymore? If not, I did not got your point, sorry for that. Can you
>> please
>> help me out? Would the changed order raise another issue I'm not aware
>> of?
>
> right, ->udc_stop() is supposed to completely teardown the USB
> controller, including disabling interrupts and all. The only thing it
> _can_ do from ->udc_stop() would be giving back any pending requests
> that were left (which would cause req->complete() to be called with an
> error status). But even that is unlikely in the case you mention since
> ->unbind() was already called.
>
Ok, got it. That's why req->context = cdev, to overcome being unbound
already. Thanks for clarification.
>> If I understood you correctly, without this patch udc driver can not
>> call
>> the gadget driver back as well, because this would result in a NULL
>> ptr
>> dereference, as unbind() sets drvdata to NULL.
>>
>> In any case the race described in my original message can still
>> happen,
>> regardless of the order of udc_stop and unbind. But with this patch
>> the
>> needed locking could easily done within the udc driver only. Without,
>> the
>> lock needs to be acquired before udc->driver->unbind(udc->gadget) and
>> released after usb_gadget_udc_stop(). Otherwise an ISR of the udc
>> driver
>> trying to call into the gadget driver could do this after gadget
>> driver
>> already unbound.
>
> right
Is someone working on this issue, already?
If not, I would like to offer introducing the needed locking to overcome
this issue.
If you are about to refactor the whole thing already to solve this (as
you state there's a design problem in the gadget API) would it make
sense
for you to have a workaround meanwhile (maybe not in mainline kernel,
but
somewhere else)? As e.g. on hikey960 board this causes panics very
often.
Thanks,
Danilo
Powered by blists - more mailing lists