[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9d309a6f-39b2-43da-96a6-b7c59b98431e@samsung.com>
Date: Thu, 4 Dec 2025 18:45:49 +0530
From: Selvarasu Ganesan <selvarasu.g@...sung.com>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
Cc: Alan Stern <stern@...land.harvard.edu>, "gregkh@...uxfoundation.org"
<gregkh@...uxfoundation.org>, "linux-usb@...r.kernel.org"
<linux-usb@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "jh0801.jung@...sung.com"
<jh0801.jung@...sung.com>, "dh10.jung@...sung.com" <dh10.jung@...sung.com>,
"naushad@...sung.com" <naushad@...sung.com>, "akash.m5@...sung.com"
<akash.m5@...sung.com>, "h10.kim@...sung.com" <h10.kim@...sung.com>,
"eomji.oh@...sung.com" <eomji.oh@...sung.com>, "alim.akhtar@...sung.com"
<alim.akhtar@...sung.com>, "thiagu.r@...sung.com" <thiagu.r@...sung.com>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH v2] usb: dwc3: gadget: Prevent EPs resource conflict
during StartTransfer
On 12/4/2025 7:21 AM, Thinh Nguyen wrote:
> On Wed, Dec 03, 2025, Selvarasu Ganesan wrote:
>> On 11/21/2025 8:38 AM, Alan Stern wrote:
>>> On Fri, Nov 21, 2025 at 02:22:02AM +0000, Thinh Nguyen wrote:
>>>> On Wed, Nov 19, 2025, Alan Stern wrote:
>>>>> ->set_alt() is called by the composite core when a Set-Interface or
>>>>> Set-Config control request arrives from the host. It happens within the
>>>>> composite_setup() handler, which is called by the UDC driver when a
>>>>> control request arrives, which means it happens in the context of the
>>>>> UDC driver's interrupt handler. Therefore ->set_alt() callbacks must
>>>>> not sleep.
>>>> This should be changed. I don't think we can expect set_alt() to
>>>> be in interrupt context only.
>>> Agreed.
>>>
>>>>> To do this right, I can't think of any approach other than to make the
>>>>> composite core use a work queue or other kernel thread for handling
>>>>> Set-Interface and Set-Config calls.
>>>> Sounds like it should've been like this initially.
>>> I guess the nobody thought through the issues very carefully at the time
>>> the composite framework was designed. Maybe the UDCs that existed back
>>> did not require a lot of time to flush endpoints; I can't remember.
>>>
>>>>> Without that ability, we will have to audit every function driver to
>>>>> make sure the ->set_alt() callbacks do ensure that endpoints are flushed
>>>>> before they are re-enabled.
>>>>>
>>>>> There does not seem to be any way to fix the problem just by changing
>>>>> the gadget core.
>>>>>
>>>> We can have a workaround in dwc3 that can temporarily "work" with what
>>>> we have. However, eventually, we will need to properly rework this and
>>>> audit the gadget drivers.
>>> Clearly, the first step is to change the composite core. That can be
>>> done without messing up anything else. But yes, eventually the gadget
>>> drivers will have to be audited.
>>>
>>> Alan Stern
>>
>> Hi Thinh,
>>
>> Do you have any suggestions that might be helpful for us to try on our side?
>> This EP resource‑conflict problem becomes easily observable when the
>> RNDIS network test executing ifconfig rndis0 down/up is run repeatedly
>> on the device side.
>>
>> Thanks,
>> Selva
> At the moment, I can't think of a way to workaround for all cases. Let's
> just leave bulk streams alone for now. Until we have proper fixes to the
> gadget framework, let's just try the below.
>
> Thanks,
> Thinh
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 3830aa2c10a9..974573304441 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -960,11 +960,18 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
> }
>
> /*
> - * Issue StartTransfer here with no-op TRB so we can always rely on No
> - * Response Update Transfer command.
> + * For streams, at start, there maybe a race where the
> + * host primes the endpoint before the function driver
> + * queues a request to initiate a stream. In that case,
> + * the controller will not see the prime to generate the
> + * ERDY and start stream. To workaround this, issue a
> + * no-op TRB as normal, but end it immediately. As a
> + * result, when the function driver queues the request,
> + * the next START_TRANSFER command will cause the
> + * controller to generate an ERDY to initiate the
> + * stream.
> */
> - if (usb_endpoint_xfer_bulk(desc) ||
> - usb_endpoint_xfer_int(desc)) {
> + if (dep->stream_capable) {
> struct dwc3_gadget_ep_cmd_params params;
> struct dwc3_trb *trb;
> dma_addr_t trb_dma;
> @@ -983,35 +990,21 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
> if (ret < 0)
> return ret;
>
> - if (dep->stream_capable) {
> - /*
> - * For streams, at start, there maybe a race where the
> - * host primes the endpoint before the function driver
> - * queues a request to initiate a stream. In that case,
> - * the controller will not see the prime to generate the
> - * ERDY and start stream. To workaround this, issue a
> - * no-op TRB as normal, but end it immediately. As a
> - * result, when the function driver queues the request,
> - * the next START_TRANSFER command will cause the
> - * controller to generate an ERDY to initiate the
> - * stream.
> - */
> - dwc3_stop_active_transfer(dep, true, true);
> + dwc3_stop_active_transfer(dep, true, true);
>
> - /*
> - * All stream eps will reinitiate stream on NoStream
> - * rejection.
> - *
> - * However, if the controller is capable of
> - * TXF_FLUSH_BYPASS, then IN direction endpoints will
> - * automatically restart the stream without the driver
> - * initiation.
> - */
> - if (!dep->direction ||
> - !(dwc->hwparams.hwparams9 &
> - DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS))
> - dep->flags |= DWC3_EP_FORCE_RESTART_STREAM;
> - }
> + /*
> + * All stream eps will reinitiate stream on NoStream
> + * rejection.
> + *
> + * However, if the controller is capable of
> + * TXF_FLUSH_BYPASS, then IN direction endpoints will
> + * automatically restart the stream without the driver
> + * initiation.
> + */
> + if (!dep->direction ||
> + !(dwc->hwparams.hwparams9 &
> + DWC3_GHWPARAMS9_DEV_TXF_FLUSH_BYPASS))
> + dep->flags |= DWC3_EP_FORCE_RESTART_STREAM;
> }
>
> out:
Hi Thinh,
Thanks for the changes. We understand the given fix and have verified
that the original issue is resolved, but a similar below warning appears
again in `dwc3_gadget_ep_queue` when we run a long duration our test.
And we confirmed this is not due to this new given changes.
This warning is caused by a race between `dwc3_gadget_ep_disable` and
`dwc3_gadget_ep_queue` that manipulates `dep->flags`.
Please refer the below sequence for the reference.
The warning originates from a race condition between
dwc3_gadget_ep_disable and dwc3_send_gadget_ep_cmd from
dwc3_gadget_ep_queue that both manipulate dep->flags. Proper
synchronization or a check is needed when masking (dep->flags &= mask)
inside dwc3_gadget_ep_disable.
Thread#1:
usb_ep_queue
->dwc3_gadget_ep_queue
->__dwc3_gadget_kick_transfer
-> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED);
->if(starting)
->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER)
->dep->flags |= DWC3_EP_TRANSFER_STARTED;
Thread#2:
dwc3_gadget_ep_disable
->__dwc3_gadget_ep_disable
->dwc3_remove_requests
->dwc3_stop_active_transfer
->__dwc3_stop_active_transfer
-> dwc3_send_gadget_ep_cmd (cmd =DWC3_DEPCMD_ENDTRANSFER)
->if(!interrupt)dep->flags &= ~DWC3_EP_TRANSFER_STARTED;
...
While Thread#2 is still running, Thread #1 starts again:
Thread#1:
usb_ep_queue
->dwc3_gadget_ep_queue
->__dwc3_gadget_kick_transfer
-> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED);
->if(starting)
->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER)
->dep->flags |= DWC3_EP_TRANSFER_STARTED;
...
###### Thread#2: Continuation (race)
->__dwc3_gadget_ep_disable
->mask = DWC3_EP_TXFIFO_RESIZED | DWC3_EP_RESOURCE_ALLOCATED;
->dep->flags &= mask; --> // Possible of clears
DWC3_EP_TRANSFER_STARTED flag as well without
sending DWC3_DEPCMD_ENDTRANSFER
Thread#1:(Failure due to race)
usb_ep_queue
->dwc3_gadget_ep_queue
->__dwc3_gadget_kick_transfer
-> starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED);
->if(starting)
->dwc3_send_gadget_ep_cmd (cmd = DWC3_DEPCMD_STARTTRANSFER) ->
Results in “No resource” allocation error because the
previous transfer was never end with ENDTRANSFER.
------------[ cut here ]------------
dwc3 13200000.dwc3: No resource for ep1in
WARNING: CPU: 7 PID: 1748 at drivers/usb/dwc3/gadget.c:398
dwc3_send_gadget_ep_cmd+0x2f8/0x76c
pc : dwc3_send_gadget_ep_cmd+0x2f8/0x76c
lr : dwc3_send_gadget_ep_cmd+0x2f8/0x76c
Call trace:
dwc3_send_gadget_ep_cmd+0x2f8/0x76c
__dwc3_gadget_kick_transfer+0x2ec/0x3f4
dwc3_gadget_ep_queue+0x140/0x1f0
usb_ep_queue+0x60/0xec
mp_tx_task+0x100/0x134
mp_tx_timeout+0xd0/0x1e0
__hrtimer_run_queues+0x130/0x318
hrtimer_interrupt+0xe8/0x340
exynos_mct_comp_isr+0x58/0x80
__handle_irq_event_percpu+0xcc/0x25c
handle_irq_event+0x40/0x9c
handle_fasteoi_irq+0x154/0x2c8
generic_handle_domain_irq+0x58/0x80
gic_handle_irq+0x48/0x104
call_on_irq_stack+0x3c/0x50
do_interrupt_handler+0x4c/0x84
el1_interrupt+0x34/0x58
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0xc/0x4c
binder_wakeup_thread_ilocked+0x50/0xb4
binder_proc_transaction+0x308/0x6ec
binder_transaction+0x1aec/0x23b0
binder_ioctl_write_read+0xa28/0x2534
binder_ioctl+0x1fc/0xb3c
__arm64_sys_ioctl+0xa8/0xe4
invoke_syscall+0x58/0x10c
el0_svc_common+0x80/0xdc
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x88
el0t_64_sync_handler+0x70/0xbc
el0t_64_sync+0x1a8/0x1ac
---[ end trace 0000000000000000 ]---
Thanks,
Selva
Powered by blists - more mailing lists