[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240508231950.ifyawl6bfy6bzvk7@synopsys.com>
Date: Wed, 8 May 2024 23:20:03 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Michael Grzeschik <mgr@...gutronix.de>
CC: Wesley Cheng <quic_wcheng@...cinc.com>,
Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"michael.riesch@...fvision.net" <michael.riesch@...fvision.net>,
"kernel@...gutronix.de" <kernel@...gutronix.de>,
"linux-usb@...r.kernel.org" <linux-usb@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] usb: dwc3: gadget: create per ep interrupts
On Wed, May 08, 2024, Michael Grzeschik wrote:
> On Tue, May 07, 2024 at 11:57:36AM -0700, Wesley Cheng wrote:
> > Hi Michael,
> >
> > On 5/6/2024 4:06 PM, Michael Grzeschik wrote:
> > > This patch is splitting up the interrupt event handling from one
> > > interrupt thread to separate per endpoint interrupt threads.
> > >
> >
> > I assume that the incentive from doing this is to improve overall
> > throughput numbers. Would you be able to share some data on the
> > benefits of moving to per EP event management?
>
> The main benefit is to make it possible to use high demanding usb
> endpoints simultaneously. In our special case we saw that streaming
> via uac and streaming via uvc was producing noise in the audio
> stream. This was due to the fact, that the isoc feedback endpoint
> that would adjust the samplerate was not being called fast enough
> when there was heavy a lot of traffic in the uvc endpoint context.
>
> By moving the endpoints into their own thread handlers the short
> feedback requests are at least able to be scheduled in between the bursts
> of the uvc packages. The next step is to have all threads running on
> different cpu cores, without interfering each other. However, as we
> still have not matrix irq allocator for arm, there still is no direct
> benefit from that yet.
>
>
> > > To achieve this we create a new dwc3 interrupt domain in which
> > > we map all claimed interrupts to individual interrupt threads.
> > >
> > > Although the gadget layer is preparing the claimed parameter
> > > of each usb_ep which could be checked if the endpoint is
> > > to used or not, the claimed value was 0 for each ep in gadget_start.
> > > This was tested when describing some composite gadget using configfs.
> > >
> >
> > yeah... the claimed flag is cleared by the USB gadget, ie USB configfs
> > (not sure if you're using this) whenever it adds a USB config. This is
> > to handle multi config situations, so subsequent USB configs can be
> > assigned (resuse) endpoints, since only one config is active at a time
> > for a USB device.
> >
> > This was a struggle for me as well when adding the TXFIFO resizing
> > logic. We won't actually know which EPs are going to be used until the
> > host issues the set configuration packet to select a config, and the
> > set_alt() callback issues usb_ep_enable(). So the implementation
> > (TXFIFO resizing) is currently based on the maximum potential endpoints
> > used by any USB configuration.
> >
> > Not sure if having 31 (potentially) different IRQ entries would be ok,
> > but maybe it would be simpler to just to request IRQ for dwc->num_eps
> > always?
> >
> > Have you tried this on a multi config device?
>
> No, I didn't. I doubt that this will work after your explanation. So
> thanks for the insides!
>
> I tried putting the request_threaded_irq into the ep_enable function
> but this does not work as I see a lot of schedule while atomic
> errors. This is possible as ep_enable is called from an set alt
> coming from ep0 interrupt thread context.
>
> So there is probably now no other option left to have exact endpoint
> interrupt threads. I will rework this back to request a kthread for each
> endpoint even as we will probably would not be using them.
>
Do you have any data on latency here? I don't see how introducing more
soft interrupts would improve on latency, if anything, it should be
worse? This is making the driver way more complicated and potentially
introduce many bugs. I may be wrong here, but I suspect that by
multiplying the interrupt handlings, you _may_ see improvement due to
the a higher chance being selected by the scheduler. However, the
overall latency will probably be worse. (correct me if I'm wrong). This
will affect other applications. Let's not do this.
BR,
Thinh
Powered by blists - more mailing lists