Message-ID: <2024091200-clubhouse-royal-44f3@gregkh>
Date: Thu, 12 Sep 2024 07:27:09 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Serge Semin <fancer.lancer@...il.com>
Cc: Viresh Kumar <vireshk@...nel.org>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Andy Shevchenko <andy@...nel.org>, Vinod Koul <vkoul@...nel.org>,
Maciej Sosnowski <maciej.sosnowski@...el.com>,
Haavard Skinnemoen <haavard.skinnemoen@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
Jiri Slaby <jirislaby@...nel.org>, dmaengine@...r.kernel.org,
linux-serial@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] dmaengine: dw: Fix XFER bit set, but channel not idle error

On Wed, Sep 11, 2024 at 09:46:10PM +0300, Serge Semin wrote:
> If a client driver uses the DW DMAC engine device more intensively
> than usual, with occasional DMA-transfer terminations and restarts, then
> the following error can randomly show up in the system log:
>
> > dma dma0chan0: BUG: XFER bit set, but channel not idle!
>
> For instance, that happens when the 8250 UART port driver handles
> looped-back high-speed traffic (in my case > 1.5 Mbaud) by means of the
> DMA-engine interface.
>
> The error happens due to the two-stage nature of the DW DMAC IRQ
> handling procedure and due to the critical section being broken in between.
> In particular, if the DMA-transfer is terminated and restarted:
> 1. after the IRQ-handler has scheduled the tasklet but before the tasklet
> starts handling the DMA-descriptors in dwc_scan_descriptors();
> 2. after the XFER completion flag was detected in the
> dwc_scan_descriptors() method, but before the dwc_complete_all() method
> is called
> the error denoted above is printed due to the overlap of the last transfer
> completion and the new transfer execution stages.
>
> Two places need to be altered in order to fix the problem.
> 1. Clear the IRQs in the dwc_chan_disable() method. That will prevent the
> dwc_scan_descriptors() method call if the DMA-transfer is
> restarted in the middle of the two-stage IRQ-handling procedure.
> 2. Move the dwc_complete_all() code so that it is executed inseparably (in
> the same atomic section) from the DMA-descriptors scanning procedure. That
> will prevent DMA-transfer restarts after the DMA-transfer completion
> was spotted but before the actual completion is executed.
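
The two changes listed above can be illustrated with a minimal user-space
sketch. It is not the driver code: struct chan_model and the chan_*() and
hw_finish() helpers are hypothetical names made up for this illustration,
an atomic_flag stands in for dwc->lock, and plain counters stand in for
the descriptor lists. It only shows the idea that a disable also clears
the stale completion status, and that the completion check and the list
hand-over happen within one locked section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of one DMA channel (not struct dw_dma_chan). */
struct chan_model {
	atomic_flag lock;  /* stands in for dwc->lock */
	bool xfer_done;    /* models the channel's XFER status bit */
	bool enabled;      /* models the channel's CH_EN bit */
	int active;        /* descriptors on the active list */
	int queued;        /* descriptors waiting to be started */
	int completed;     /* descriptors reported as complete */
	int bug_reports;   /* "XFER bit set, but channel not idle" hits */
};

static void model_lock(struct chan_model *c)
{
	/* Spin until the flag is acquired, like a simple spinlock. */
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;
}

static void model_unlock(struct chan_model *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

static void chan_init(struct chan_model *c)
{
	atomic_flag_clear(&c->lock);
	c->xfer_done = false;
	c->enabled = false;
	c->active = c->queued = c->completed = c->bug_reports = 0;
}

/* Models the hardware finishing the current transfer. */
static void hw_finish(struct chan_model *c)
{
	model_lock(c);
	c->enabled = false;
	c->xfer_done = true;
	model_unlock(c);
}

/* Models a client terminating the transfer and submitting a new one. */
static void chan_restart(struct chan_model *c, int new_descs)
{
	assert(new_descs >= 0);
	model_lock(c);
	c->enabled = false;
	c->xfer_done = false;  /* change 1: disabling also clears stale status */
	c->active = new_descs; /* the old descriptors are dropped */
	c->enabled = true;
	model_unlock(c);
}

/* Models the tasklet: returns how many descriptors it completed. */
static int chan_scan(struct chan_model *c)
{
	int done = 0;

	model_lock(c);
	if (c->xfer_done) {
		c->xfer_done = false;
		if (c->enabled) {  /* completion seen on a running channel */
			c->bug_reports++;
			c->enabled = false;
		}
		/* change 2: detach the finished work under the same lock */
		done = c->active;
		c->active = 0;
		if (c->queued > 0) {  /* start the next queued descriptor */
			c->queued--;
			c->active = 1;
			c->enabled = true;
		}
	}
	model_unlock(c);

	/* Completion "callbacks" run outside the lock, scan context only. */
	c->completed += done;
	return done;
}
```

In this model a restart that lands between hw_finish() and chan_scan()
leaves nothing for the scan to complete, so the "channel not idle" branch
is never taken for the restarted transfer.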
>
> Fixes: 69cea5a00d31 ("dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback")
> Fixes: 3bfb1d20b547 ("dmaengine: Driver for the Synopsys DesignWare DMA controller")
> Signed-off-by: Serge Semin <fancer.lancer@...il.com>
> ---
> drivers/dma/dw/core.c | 54 ++++++++++++++++++++-----------------------
> 1 file changed, 25 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
> index af1871646eb9..fbc46cbfe259 100644
> --- a/drivers/dma/dw/core.c
> +++ b/drivers/dma/dw/core.c
> @@ -143,6 +143,12 @@ static inline void dwc_chan_disable(struct dw_dma *dw, struct dw_dma_chan *dwc)
> channel_clear_bit(dw, CH_EN, dwc->mask);
> while (dma_readl(dw, CH_EN) & dwc->mask)
> cpu_relax();
> +
> + dma_writel(dw, CLEAR.XFER, dwc->mask);
> + dma_writel(dw, CLEAR.BLOCK, dwc->mask);
> + dma_writel(dw, CLEAR.SRC_TRAN, dwc->mask);
> + dma_writel(dw, CLEAR.DST_TRAN, dwc->mask);
> + dma_writel(dw, CLEAR.ERROR, dwc->mask);
> }
>
> /*----------------------------------------------------------------------*/
> @@ -259,34 +265,6 @@ dwc_descriptor_complete(struct dw_dma_chan *dwc, struct dw_desc *desc,
> dmaengine_desc_callback_invoke(&cb, NULL);
> }
>
> -static void dwc_complete_all(struct dw_dma *dw, struct dw_dma_chan *dwc)
> -{
> - struct dw_desc *desc, *_desc;
> - LIST_HEAD(list);
> - unsigned long flags;
> -
> - spin_lock_irqsave(&dwc->lock, flags);
> - if (dma_readl(dw, CH_EN) & dwc->mask) {
> - dev_err(chan2dev(&dwc->chan),
> - "BUG: XFER bit set, but channel not idle!\n");
> -
> - /* Try to continue after resetting the channel... */
> - dwc_chan_disable(dw, dwc);
> - }
> -
> - /*
> - * Submit queued descriptors ASAP, i.e. before we go through
> - * the completed ones.
> - */
> - list_splice_init(&dwc->active_list, &list);
> - dwc_dostart_first_queued(dwc);
> -
> - spin_unlock_irqrestore(&dwc->lock, flags);
> -
> - list_for_each_entry_safe(desc, _desc, &list, desc_node)
> - dwc_descriptor_complete(dwc, desc, true);
> -}
> -
> /* Returns how many bytes were already received from source */
> static inline u32 dwc_get_sent(struct dw_dma_chan *dwc)
> {
> @@ -303,6 +281,7 @@ static void dwc_scan_descriptors(struct dw_dma *dw, struct dw_dma_chan *dwc)
> struct dw_desc *child;
> u32 status_xfer;
> unsigned long flags;
> + LIST_HEAD(list);
>
> spin_lock_irqsave(&dwc->lock, flags);
> status_xfer = dma_readl(dw, RAW.XFER);
> @@ -341,9 +320,26 @@ static void dwc_scan_descriptors(struct dw_dma *dw, struct dw_dma_chan *dwc)
> clear_bit(DW_DMA_IS_SOFT_LLP, &dwc->flags);
> }
>
> + /*
> + * No more active descriptors left to handle. So submit the
> + * queued descriptors and finish up the already handled ones.
> + */
> + if (dma_readl(dw, CH_EN) & dwc->mask) {
> + dev_err(chan2dev(&dwc->chan),
> + "BUG: XFER bit set, but channel not idle!\n");
> +
> + /* Try to continue after resetting the channel... */
> + dwc_chan_disable(dw, dwc);
> + }
> +
> + list_splice_init(&dwc->active_list, &list);
> + dwc_dostart_first_queued(dwc);
> +
> spin_unlock_irqrestore(&dwc->lock, flags);
>
> - dwc_complete_all(dw, dwc);
> + list_for_each_entry_safe(desc, _desc, &list, desc_node)
> + dwc_descriptor_complete(dwc, desc, true);
> +
> return;
> }
>
> --
> 2.43.0
>
>
Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases. To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.
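
As a rough sketch of what the resulting tag block usually looks like,
assuming the fix is meant for every stable tree that contains the commits
named in the "Fixes:" tags (the exact form to use is the one described in
stable-kernel-rules.rst; the quoted commit subjects are shortened to "..."
here only for readability):

    Fixes: 69cea5a00d31 ("...")
    Fixes: 3bfb1d20b547 ("...")
    Cc: stable@vger.kernel.org
    Signed-off-by: Serge Semin <fancer.lancer@...il.com>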

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot