lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c19219134cb23e7e0338e42890d1604d@dk-develop.de>
Date:   Tue, 15 Aug 2017 14:51:19 +0200
From:   Danilo Krummrich <danilokrummrich@...develop.de>
To:     Felipe Balbi <balbi@...nel.org>
Cc:     linux-kernel@...r.kernel.org, linux-usb@...r.kernel.org,
        gregkh@...uxfoundation.org, k.opasiak@...sung.com
Subject: Re: [PATCH] usb: gadget: udc: udc_stop before gadget unbind

On 2017-08-15 14:02, Felipe Balbi wrote:
> Hi,
> 
> Danilo Krummrich <danilokrummrich@...develop.de> writes:
>> thanks for reviewing.
> 
> np :-)
> 
>> On 2017-08-15 12:03, Felipe Balbi wrote:
>>> Hi,
>>> 
>>> Danilo Krummrich <danilokrummrich@...develop.de> writes:
>>>> udc_stop needs to be called before gadget driver unbind. Otherwise 
>>>> it
>>>> might happen that udc drivers still call into the gadget driver 
>>>> (e.g.
>>>> to reset gadget after OTG event). If this happens it is likely to 
>>>> get
>>>> panics from gadget driver dereferencing NULL ptr, as gadget's 
>>>> drvdata
>>>> is set to NULL on unbind.
>>> 
>>> seems like the problem here is with the OTG layer, not UDC core.
>>> 
>> I mentioned this just as example, it can happen whenever a UDC driver
>> calls
>> the gadget driver (e.g. by calling usb_gadget_udc_reset() in ISR) 
>> after
>> gadget
>> drivers unbind() was called already (e.g. by gadget configfs).
>> If this happens gadget drivers drvdata was already set to NULL by
>> unbind()
>> and reset() could result into a NULL ptr exception.
>> Therefore my assumption was that it needs to be prevented that the
>> gadget
>> driver is getting called after unbind.
> 
> We have a known problem in the design of the gadget API that causes 
> this
> races but we couldn't come up with a solution yet :-)
> 
> Inverting these two calls is not the correct way to go about this :-)
> 
Now I see, thanks for explanation below.

>>>> Signed-off-by: Danilo Krummrich <danilokrummrich@...develop.de>
>>>> ---
>>>> Actually there could still be a race:
>>>> (CPU1 code taken from dwc3 drivers dwc3_disconnect_gadget() as
>>>> exsample)
>>>> 
>>>> CPU0					CPU1
>>>> ----					----
>>>> usb_gadget_disconnect(udc->gadget);
>>>> udc->driver->disconnect(udc->gadget);
>>>> 					if (dwc->gadget_driver && dwc->gadget_driver->disconnect)
>>>> usb_gadget_udc_stop(udc);
>>>> udc->driver->unbind(udc->gadget);
>>>> 					dwc->gadget_driver->disconnect(&dwc->gadget);
>>>> 
>>>> UDC drivers typically set their gadget driver pointer to NULL in
>>>> udc_stop
>>>> and check for it before calling into the gadget driver. To fix the
>>>> issue
>>>> above every udc driver could apply a lock around this.
>>>> 
>>>> If you see the need for having this or another solutions I can 
>>>> provide
>>>> further patches. This patch could also just serve as a base for
>>>> discussion
>>>> if someone knows a smarter solution.
>>>> 
>>>> I saw this problem causing a panic on hikey960 board and provided a
>>>> quick
>>>> workaround for the same problem here:
>>>> https://android-review.googlesource.com/#/c/kernel/common/+/457476/
>>>> (panic log in the commit message of the linked patch)
>>>> ---
>>>>  drivers/usb/gadget/udc/core.c | 6 +++++-
>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/drivers/usb/gadget/udc/core.c
>>>> b/drivers/usb/gadget/udc/core.c
>>>> index efce68e9a8e0..8155468afc0d 100644
>>>> --- a/drivers/usb/gadget/udc/core.c
>>>> +++ b/drivers/usb/gadget/udc/core.c
>>>> @@ -1234,8 +1234,12 @@ static void usb_gadget_remove_driver(struct
>>>> usb_udc *udc)
>>>> 
>>>>  	usb_gadget_disconnect(udc->gadget);
>>>>  	udc->driver->disconnect(udc->gadget);
>>>> -	udc->driver->unbind(udc->gadget);
>>>> +	/* udc_stop needs to be called before gadget driver unbind to
>>>> prevent
>>>> +	 * udc driver calls into gadget driver after unbind which could
>>>> cause
>>>> +	 * a nullptr exception.
>>>> +	 */
>>>>  	usb_gadget_udc_stop(udc);
>>>> +	udc->driver->unbind(udc->gadget);
>>> 
>>> This patch is incorrect, it will prevent us from giving back requests
>>> to
>>> gadget driver properly. ->unbind() has to happen before ->udc_stop().
>> 
>> Do you mean after udc_stop the udc driver can not call the gadget 
>> driver
>> anymore? If not, I did not got your point, sorry for that. Can you
>> please
>> help me out? Would the changed order raise another issue I'm not aware
>> of?
> 
> right, ->udc_stop() is supposed to completely teardown the USB
> controller, including disabling interrupts and all. The only thing it
> _can_ do from ->udc_stop() would be giving back any pending requests
> that were left (which would cause req->complete() to be called with an
> error status). But even that is unlikely in the case you mention since
> ->unbind() was already called.
> 
Ok, got it. That's why req->context = cdev, to overcome being unbound
already. Thanks for clarification.

>> If I understood you correctly, without this patch udc driver can not
>> call
>> the gadget driver back as well, because this would result in a NULL 
>> ptr
>> dereference, as unbind() sets drvdata to NULL.
>> 
>> In any case the race described in my original message can still 
>> happen,
>> regardless of the order of udc_stop and unbind. But with this patch 
>> the
>> needed locking could easily done within the udc driver only. Without,
>> the
>> lock needs to be acquired before udc->driver->unbind(udc->gadget) and
>> released after usb_gadget_udc_stop(). Otherwise an ISR of the udc 
>> driver
>> trying to call into the gadget driver could do this after gadget 
>> driver
>> already unbound.
> 
> right
Is someone working on this issue, already?
If not, I would like to offer introducing the needed locking to overcome
this issue.
If you are about to refactor the whole thing already to solve this (as
you state there's a design problem in the gadget API) would it make 
sense
for you to have a workaround meanwhile (maybe not in mainline kernel, 
but
somewhere else)? As e.g. on hikey960 board this causes panics very 
often.

Thanks,
Danilo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ