lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <n6grskuq722vnogwp5obiwzv4pxs5bbqddadesffezhvba5cjh@d6shcrvpxujg>
Date: Fri, 23 Aug 2024 12:48:50 +0300
From: Serge Semin <fancer.lancer@...il.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, 
	Jiri Slaby <jirislaby@...nel.org>, Andy Shevchenko <andy@...nel.org>, 
	Viresh Kumar <vireshk@...nel.org>, Andy Shevchenko <andriy.shevchenko@...ux.intel.com>, 
	Vinod Koul <vkoul@...nel.org>
Cc: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>, 
	dmaengine@...r.kernel.org, linux-serial@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] dmaengine: dw: Prevent tx-status calling desc
 callback (Fix UART deadlock!)

Hi folks

Any comments or suggestion about the change? The kernel occasionally
_deadlocks_ without it for the DW UART + DW DMAC hardware setup.

-Serge(y)

On Fri, Aug 02, 2024 at 11:07:51AM +0300, Serge Semin wrote:
> The dmaengine_tx_status() method updating the DMA-descriptors state and
> eventually calling the Tx-descriptors callback may potentially cause
> problems. In particular the deadlock was discovered in DW UART 8250 device
> interacting with DW DMA controller channels. The call-trace causing the
> deadlock is:
> 
> serial8250_handle_irq()
>   uart_port_lock_irqsave(port); ----------------------+
>   handle_rx_dma()                                     |
>     serial8250_rx_dma_flush()                         |
>       __dma_rx_complete()                             |
>         dmaengine_tx_status()                         |
>           dwc_scan_descriptors()                      |
>             dwc_complete_all()                        |
>               dwc_descriptor_complete()               |
>                 dmaengine_desc_callback_invoke()      |
>                   cb->callback(cb->callback_param);   |
>                   ||                                  |
>                   dma_rx_complete();                  |
>                     uart_port_lock_irqsave(port); ----+ <- Deadlock!
> 
> So in case if the DMA-engine finished working at some point before the
> serial8250_rx_dma_flush() invocation and the respective tasklet hasn't
> been executed yet, then calling the dmaengine_tx_status() will cause the
> DMA-descriptors status update and the Tx-descriptor callback invocation.
> Generalizing the case up: if the dmaengine_tx_status() method callee and
> the Tx-descriptor callback refer to the related critical section, then
> calling dmaengine_tx_status() from the Tx-descriptor callback will
> inevitably cause a deadlock around the guarding lock as it happens in the
> Serial 8250 DMA implementation above. (Note the deadlock doesn't happen
> very often, but can be eventually discovered if the being received data
> size is greater than the Rx DMA-buffer size defined in the 8250_dma.c
> driver. In my case reducing the Rx DMA-buffer size increased the deadlock
> probability.)
> 
> The easiest way to fix the deadlock was to just remove the Tx-descriptors
> state update from the DW DMA-engine Tx-descriptor status method
> implementation, as the most of the DMA-engine drivers imply. After this
> fix is applied the Tx-descriptors status will be only updated in the
> framework of the dwc_scan_descriptors() method called from the tasklet
> handling the deferred DMA-controller IRQ.
> 
> Signed-off-by: Serge Semin <fancer.lancer@...il.com
> 
> ---
> 
> Note I have doubts whether it's the best possible solution of the problem
> since the client-driver deadlock is resolved here by fixing the DMA-engine
> provider code. But I failed to find any reasonable solution in the 8250
> DMA implementation.
> 
> Moreover the suggested fix cause a weird outcome - under the high-speed
> and heavy serial transfers the next error is printed to the log sometimes:
> 
> > dma dma0chan0: BUG: XFER bit set, but channel not idle!
> 
> That's why the patch submitted as RFC. Do you have any better idea in mind
> to prevent the nested lock?
> 
> Cc: Viresh Kumar <vireshk@...nel.org>
> Cc: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
> CC: Andy Shevchenko <andy@...nel.org>
> Cc: Vinod Koul <vkoul@...nel.org>
> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Cc: Jiri Slaby <jirislaby@...nel.org>
> Cc: "Ilpo Järvinen" <ilpo.jarvinen@...ux.intel.com>
> Cc: dmaengine@...r.kernel.org
> Cc: linux-serial@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> ---
>  drivers/dma/dw/core.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
> index 5f7d690e3dba..4b3402156eae 100644
> --- a/drivers/dma/dw/core.c
> +++ b/drivers/dma/dw/core.c
> @@ -925,12 +925,6 @@ dwc_tx_status(struct dma_chan *chan,
>  	struct dw_dma_chan	*dwc = to_dw_dma_chan(chan);
>  	enum dma_status		ret;
>  
> -	ret = dma_cookie_status(chan, cookie, txstate);
> -	if (ret == DMA_COMPLETE)
> -		return ret;
> -
> -	dwc_scan_descriptors(to_dw_dma(chan->device), dwc);
> -
>  	ret = dma_cookie_status(chan, cookie, txstate);
>  	if (ret == DMA_COMPLETE)
>  		return ret;
> -- 
> 2.43.0
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ