Date:   Mon, 22 May 2023 00:48:39 -0700
From:   Badhri Jagan Sridharan <badhri@...gle.com>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     gregkh@...uxfoundation.org, colin.i.king@...il.com,
        xuetao09@...wei.com, quic_eserrao@...cinc.com,
        water.zhangjiantao@...wei.com, peter.chen@...escale.com,
        balbi@...com, francesco@...cini.it, alistair@...stair23.me,
        stephan@...hold.net, bagasdotme@...il.com, luca@...tu.xyz,
        linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org,
        Francesco Dolcini <francesco.dolcini@...adex.com>
Subject: Re: [PATCH v2] usb: gadget: udc: core: Offload usb_udc_vbus_handler processing

Hi Alan,

Thanks for taking the time to share more details!
+1 on your comment: "A big problem with the USB gadget
framework is that it does not clearly state which routines have to run
in process context and which have to run in interrupt/atomic context."


I have started working on allow_connect and the other suggestions you made.
In one of your previous comments you mentioned that connect_lock
should be a spinlock and not a mutex.
Right now there seem to be four conditions that decide whether the
pullup needs to be enabled or disabled through gadget->ops->pullup():
1. The gadget has not been deactivated through usb_gadget_deactivate().
2. The gadget has been started through usb_gadget_udc_start().
   (soft_connect_store() can start/stop the gadget.)
3. The gadget has been connected through usb_gadget_connect(). This
   assumes we are getting rid of usb_udc_vbus_handler().
4. allow_connect is true.
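
To make the four conditions concrete, here is a rough userspace model of
the decision. It is only a sketch: struct udc_model, its field names and
pullup_wanted() are hypothetical and do not exist in the udc core; they
just mirror conditions 1-4 above.

```c
#include <stdbool.h>

/* Hypothetical userspace model; field names are illustrative only. */
struct udc_model {
	bool deactivated;   /* condition 1: set via usb_gadget_deactivate() */
	bool started;       /* condition 2: usb_gadget_udc_start() succeeded */
	bool connected;     /* condition 3: usb_gadget_connect() was requested */
	bool allow_connect; /* condition 4: the proposed allow_connect flag */
};

/* The pullup should be enabled only when all four conditions hold. */
static bool pullup_wanted(const struct udc_model *u)
{
	return !u->deactivated && u->started && u->connected &&
	       u->allow_connect;
}
```

Whatever lock ends up protecting these fields, the result of this check
would still have to be handed to gadget->ops->pullup(), which is where
the constraints below come in.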

I have so far identified two constraints here:
a. gadget->ops->pullup() can sleep in some implementations.
For instance:
BUG: scheduling while atomic: init/1/0x00000002
..
[   26.990631][    T1] Call trace:
[   26.993759][    T1]  dump_backtrace+0x104/0x128
[   26.998281][    T1]  show_stack+0x20/0x30
[   27.002279][    T1]  dump_stack_lvl+0x6c/0x9c
[   27.006627][    T1]  __schedule_bug+0x84/0xb4
[   27.010973][    T1]  __schedule+0x6f0/0xaec
[   27.015147][    T1]  schedule+0xc8/0x134
[   27.019059][    T1]  schedule_timeout+0x98/0x134
[   27.023666][    T1]  msleep+0x34/0x4c
[   27.027317][    T1]  dwc3_core_soft_reset+0xf0/0x354
[   27.032273][    T1]  dwc3_gadget_pullup+0xec/0x1d8
[   27.037055][    T1]  usb_gadget_pullup_update_locked+0xa0/0x1e0
[   27.042967][    T1]  udc_bind_to_driver+0x1e4/0x30c
[   27.047835][    T1]  usb_gadget_probe_driver+0xd0/0x178
[   27.053051][    T1]  gadget_dev_desc_UDC_store+0xf0/0x13c
[   27.058442][    T1]  configfs_write_iter+0x100/0x178
[   27.063399][    T1]  vfs_write+0x278/0x3c4
[   27.067483][    T1]  ksys_write+0x80/0xf4

b. gadget->ops->udc_start can also sleep in some implementations.
For example:
[   28.024255][    T1] BUG: scheduling while atomic: init/1/0x00000002
....
[   28.324996][    T1] Call trace:
[   28.328126][    T1]  dump_backtrace+0x104/0x128
[   28.332647][    T1]  show_stack+0x20/0x30
[   28.336645][    T1]  dump_stack_lvl+0x6c/0x9c
[   28.340993][    T1]  __schedule_bug+0x84/0xb4
[   28.345340][    T1]  __schedule+0x6f0/0xaec
[   28.349513][    T1]  schedule+0xc8/0x134
[   28.353425][    T1]  schedule_timeout+0x4c/0x134
[   28.358033][    T1]  wait_for_common+0xac/0x13c
[   28.362554][    T1]  wait_for_completion_killable+0x20/0x3c
[   28.368118][    T1]  __kthread_create_on_node+0xe4/0x1ec
[   28.373422][    T1]  kthread_create_on_node+0x54/0x80
[   28.378464][    T1]  setup_irq_thread+0x50/0x108
[   28.383072][    T1]  __setup_irq+0x90/0x87c
[   28.387245][    T1]  request_threaded_irq+0x144/0x180
[   28.392287][    T1]  dwc3_gadget_start+0x50/0xac
[   28.396866][    T1]  udc_bind_to_driver+0x14c/0x31c
[   28.401763][    T1]  usb_gadget_probe_driver+0xd0/0x178
[   28.406980][    T1]  gadget_dev_desc_UDC_store+0xf0/0x13c
[   28.412370][    T1]  configfs_write_iter+0x100/0x178
[   28.417325][    T1]  vfs_write+0x278/0x3c4
[   28.421411][    T1]  ksys_write+0x80/0xf4

static int dwc3_gadget_start(struct usb_gadget *g,
                struct usb_gadget_driver *driver)
{
        struct dwc3             *dwc = gadget_to_dwc(g);
...
        irq = dwc->irq_gadget;
        ret = request_threaded_irq(irq, dwc3_interrupt, dwc3_thread_interrupt,
                        IRQF_SHARED, "dwc3", dwc->ev_buf);
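
Both traces come down to the same ordering problem: a callback that may
sleep is invoked while a spinlock is held. A minimal userspace model of
that, assuming nothing about the real core (every name below is
hypothetical, and the "spinlock" is just a counter standing in for
atomic context):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
static int atomic_depth;     /* > 0 models "a spinlock is held" */
static int sleep_violations; /* sleeping calls made while atomic */
static bool pulled_up;       /* models the state of the pullup */

static void model_spin_lock(void)
{
	atomic_depth++;
}

static void model_spin_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Stand-in for a ->pullup() implementation that may sleep. */
static void model_pullup(bool on)
{
	if (atomic_depth > 0)
		sleep_violations++; /* kernel: "scheduling while atomic" */
	pulled_up = on;
}

/* Problematic ordering: the sleeping callback runs with the lock held. */
static void update_pullup_locked(const bool *allow_connect)
{
	model_spin_lock();
	model_pullup(*allow_connect);
	model_spin_unlock();
}

/* Alternative: snapshot the flag under the lock, call after unlocking. */
static void update_pullup_deferred(const bool *allow_connect)
{
	bool want;

	model_spin_lock();
	want = *allow_connect;
	model_spin_unlock();
	model_pullup(want);
}
```

Note that the deferred variant leaves a window in which the flag can
change between the snapshot and the callback, so in the real core some
sleepable lock would presumably still have to serialize the callers.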

Given that commit 1016fc0c096c ("USB: gadget: Fix obscure lockdep
violation for udc_mutex") has been in place for a while and no one has
reported issues so far, perhaps the ->disconnect() callback is no longer
being invoked in atomic context, and it is the documentation that needs
to be updated?

Thanks,
Badhri

On Fri, May 19, 2023 at 10:27 AM Alan Stern <stern@...land.harvard.edu> wrote:
>
> On Fri, May 19, 2023 at 08:44:57AM -0700, Badhri Jagan Sridharan wrote:
> > On Fri, May 19, 2023 at 8:07 AM Alan Stern <stern@...land.harvard.edu> wrote:
> > >
> > > On Fri, May 19, 2023 at 10:49:49AM -0400, Alan Stern wrote:
> > > > On Fri, May 19, 2023 at 04:30:41AM +0000, Badhri Jagan Sridharan wrote:
> > > > > chipidea udc calls usb_udc_vbus_handler from udc_start gadget
> > > > > ops causing a deadlock. Avoid this by offloading usb_udc_vbus_handler
> > > > > processing.
> > > >
> > > > Look, this is way overkill.
> > > >
> > > > usb_udc_vbus_handler() has only two jobs to do: set udc->vbus and call
> > > > usb_udc_connect_control().  Furthermore, it gets called from only two
> > > > drivers: chipidea and max3420.
> > > >
> > > > Why not have the callers set udc->vbus themselves and then call
> > > > usb_gadget_{dis}connect() directly?  Then we could eliminate
> > > > usb_udc_vbus_handler() entirely.  And the unnecessary calls -- the ones
> > > > causing deadlocks -- from within udc_start() and udc_stop() handlers can
> > > > be removed with no further consequence.
> > > >
> > > > This approach simplifies and removes code.  Whereas your approach
> > > > complicates and adds code for no good reason.
> > >
> > > I changed my mind.
> > >
> > > After looking more closely, I found the comment in gadget.h about
> > > ->disconnect() callbacks happening in interrupt context.  This means we
> > > cannot use a mutex to protect the associated state, and therefore the
> > > connect_lock _must_ be a spinlock, not a mutex.
> >
> > Quick observation so that I don't misunderstand.
> > I already see gadget->udc->driver->disconnect(gadget) being called with
> > udc_lock being held.
> >
> >                mutex_lock(&udc_lock);
> >                if (gadget->udc->driver)
> >                        gadget->udc->driver->disconnect(gadget);
> >                mutex_unlock(&udc_lock);
> >
> > The below patch seems to have introduced it:
> > 1016fc0c096c USB: gadget: Fix obscure lockdep violation for udc_mutex
>
> Hmmm...  You're right about this.  A big problem with the USB gadget
> framework is that it does not clearly state which routines have to run
> in process context and which have to run in interrupt/atomic context.
> People therefore don't think about it and frequently get it wrong.
>
> So now the problem is that the UDC or transceiver driver may detect
> (typically in an interrupt handler) that VBUS power has appeared or
> disappeared, and it wants to tell the core to adjust the D+/D- pullup
> signals appropriately.  The core notifies the UDC driver about this, and
> then in the case of a disconnection, it has to notify the gadget driver.
> But notifying the gadget driver requires process context for the
> udc_lock mutex, the ultimate reason being that disconnect notifications
> can race with gadget driver binding and unbinding.
>
> If we could prevent those races in some other way then we wouldn't need
> to hold udc_lock in usb_gadget_disconnect().  This seems like a sensible
> thing to do in any case; the UDC core should never allow a connection to
> occur before a gadget driver is bound or after it is unbound.
>
> The first approach that occurs to me is to add a boolean allow_connect
> flag to struct usb_udc, together with a global spinlock to synchronize
> access to it.  Then usb_gadget_disconnect() could check the flag before
> calling driver->disconnect(), gadget_bind_driver() could set the flag
> before calling usb_udc_connect_control(), and gadget_unbind_driver()
> could clear the flag before calling usb_gadget_disconnect().
>
> (Another possible approach would be to change gadget->deactivated into a
> counter.  It would still need to be synchronized by a spinlock,
> however.)
>
> This will simplify matters considerably.  udc_lock can remain a mutex
> and the deadlock problem should go away.
>
> Do you want to try adding allow_connect as described here or would you
> prefer that I do it?
>
> (And in any case, we should prevent the udc_start and udc_stop callbacks
> in the chipidea and max3420 drivers from trying to update the connection
> status.)
>
> Alan Stern
