[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9ab95a1-f901-6bfe-899b-e4577d14cb52@codeaurora.org>
Date: Mon, 14 Jun 2021 13:07:11 +0530
From: Sandeep Maheswaram <sanm@...eaurora.org>
To: Jack Pham <jackp@...eaurora.org>, Felipe Balbi <balbi@...nel.org>
Cc: Alexandru Elisei <alexandru.elisei@....com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
p.zabel@...gutronix.de, linux-usb@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
arm-mail-list <linux-arm-kernel@...ts.infradead.org>,
linux-arm-msm@...r.kernel.org
Subject: Re: [BUG] usb: dwc3: Kernel NULL pointer dereference in dwc3_remove()
On 6/10/2021 9:03 PM, Jack Pham wrote:
> On Thu, Jun 10, 2021 at 01:11:42PM +0300, Felipe Balbi wrote:
>> Jack Pham <jackp@...eaurora.org> writes:
>>> On Fri, Jun 04, 2021 at 11:20:12AM +0300, Felipe Balbi wrote:
>>>> Jack Pham <jackp@...eaurora.org> writes:
>>>>>>>>>> Alexandru Elisei <alexandru.elisei@....com> writes:
>>>>>>>>>>> I've been able to bisect the panic and the offending commit is 568262bf5492 ("usb:
>>>>>>>>>>> dwc3: core: Add shutdown callback for dwc3"). I can provide more diagnostic
>>>>>>>>>>> information if needed and I can help test the fix.
>>>>>>>>>> if you simply revert that commit in HEAD, does the problem really go
>>>>>>>>>> away?
>>>>>>>>> Kernel built from commit 324c92e5e0ee, which is the kernel tip today, the panic is
>>>>>>>>> there. Reverting the offending commit, 568262bf5492, makes the panic disappear.
>>>>>>>> Want to send a revert so I can take it now?
>>>>>>> I can send a revert, but Felipe was asking Sandeep (the commit author) for a fix,
>>>>>>> so I'll leave it up to Felipe to decide how to proceed.
>>>>>> I'm okay with a revert. Feel free to add my Acked-by: Felipe Balbi
>>>>>> <balbi@...nel.org> or it.
>>>>>>
>>>>>> Sandeep, please send a new version that doesn't encounter the same
>>>>>> issue. Make sure to test by reloading the driver in a tight loop for
>>>>>> several iterations.
>>>>> This would probably be tricky to test on other "glue" drivers as the
>>>>> problem appears to be specific only to dwc3_of_simple. It looks like
>>>>> both dwc3_of_simple and the dwc3 core now (due to 568262bf5492) each
>>>>> implement respective .shutdown callbacks. The latter is simply a wrapper
>>>>> around dwc3_remove(). And from the panic call stack above we see that
>>>>> dwc3_of_simple_shutdown() calls of_platform_depopulate() which will
>>>>> again call dwc3_remove() resulting in the double remove.
>>>>>
>>>>> So would an alternative approach be to protect against dwc3_remove()
>>>>> getting called multiple times? IMO it'd be a bit messy to have to add
>>>> no, I don't think so. That sounds like a workaround. We should be able
>>>> to guarantee that ->remove() doesn't get called twice using the driver
>>>> model properly.
>>> Completely fair. So then having a .shutdown callback that directly calls
>>> dwc3_remove() is probably not the right thing to do as it completely
>>> bypasses the driver model so if and when the driver core does later
>>> release the device from the driver that's how we end up with the double
>>> remove.
>> yeah, I would agree with that.
>>
>>>>> additional checks there to know if it had already been called. So maybe
>>>>> avoid it altogether--should dwc3_of_simple_shutdown() just skip calling
>>>>> of_platform_depopulate()?
>>>> I don't know what the idiomatic is nowadays, but at least early on, we
>>>> had to call depopulate.
>>> So any suggestions on how to fix the original issue Sandeep was trying
>>> to fix with 568262bf5492? Maybe implement .shutdown in dwc3_qcom and have
>>> it follow what dwc3_of_simple does with of_platform_depopulate()? But
>>> then wouldn't other "glues" want/need to follow suit?
>> I think we can implement shutdown in core, but we need to careful with
>> it. Instead of just blindly calling remove, let's extract the common
>> parts to another internal function that both remove and shutdown
>> call. debugfs removal should not be part of that generic method :-)
> Hi Sandeep,
>
> Upon re-reading your description in 568262bf5492 it sounds like the
> original intention of your patch is basically to quiesce the HW so that
> it doesn't continue to run after SMMU/IOMMU is disabled right?
>
> If that is the case, couldn't we simply call only dwc3_core_exit_mode()
> assuming there is no other requirement to do any other cleanup/teardown
> (PHYs, clocks, resets, runtime PM, et al)? This function should do the
> bare minimum of stopping the controller in whatever mode (host or
> peripheral) it is currently operating in.
Yes that was the intention. I will call only dwc3_core_exit_mode()
and check. Is there any way we can do from dwc3 qcom glue driver to
avoid problems for other glue drivers?
>> Anything in that generic method should, probably, be idempotent.
> Yes we'll need to ensure that dwc3_core_exit_mode() can be called
> multiple times without additional side effects. At first glance this
> probably means setting dwc->xhci and dwc->gadget to NULL from
> dwc3_host_exit() and dwc3_gadget_exit(), respectively.
Ok. Is there any way to test this ?
>
> Thanks,
> Jack
Powered by blists - more mailing lists