Message-ID: <16942794-1e03-6da0-b8e5-c82332a217a5@metafoo.de>
Date: Wed, 19 Aug 2020 13:08:29 +0200
From: Lars-Peter Clausen <lars@...afoo.de>
To: Benjamin Bara - SKIDATA <Benjamin.Bara@...data.com>,
Robin Gong <yibin.gong@....com>
Cc: "timur@...nel.org" <timur@...nel.org>,
"nicoleotsuka@...il.com" <nicoleotsuka@...il.com>,
"vkoul@...nel.org" <vkoul@...nel.org>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
dl-linux-imx <linux-imx@....com>,
"shawnguo@...nel.org" <shawnguo@...nel.org>,
"kernel@...gutronix.de" <kernel@...gutronix.de>,
"dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"alsa-devel@...a-project.org" <alsa-devel@...a-project.org>,
Richard Leitner - SKIDATA <Richard.Leitner@...data.com>
Subject: Re: pcm|dmaengine|imx-sdma race condition on i.MX6
On 8/17/20 9:28 AM, Benjamin Bara - SKIDATA wrote:
> We think this is not an i.MX6-specific problem, but a problem of the DMAengine usage from the PCM.
> In case of an XRUN, the DMA channel is never closed; instead, first a SNDRV_PCM_TRIGGER_STOP and then a
> SNDRV_PCM_TRIGGER_START is triggered.
> The SNDRV_PCM_TRIGGER_STOP simply executes a dmaengine_terminate_async() [1]
> but does not await the termination by calling dmaengine_synchronize(),
> which is required as stated by the documentation [2].
> Anyway, we are not able to fix it in the pcm_dmaengine layer, either at the end of
> SNDRV_PCM_TRIGGER_STOP (called from the DMA on-complete interrupt handler)
> or at the beginning of SNDRV_PCM_TRIGGER_START (called from a PCM ioctl),
> since dmaengine_synchronize() requires a non-atomic context.
I think this might be an sdma-specific problem after all.
dmaengine_terminate_async() will issue a request to stop the DMA. But it
is still safe to issue the next transfer, even without calling
dmaengine_synchronize(). The DMA should start the new transfer at its
earliest convenience in that case.
dmaengine_synchronize() is there so that the consumer has a guarantee that the
DMA has finished using the resources (e.g. the memory buffers) associated
with the DMA transfer, so it can safely free them.
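To illustrate the intended consumer-side usage, here is a minimal sketch; the
helper functions and contexts below are made up for illustration and are not
the actual pcm_dmaengine code, only the dmaengine_* calls are the real API:

#include <linux/dmaengine.h>

/* Atomic context (e.g. a trigger callback): stopping and immediately
 * issuing the next transfer, without a synchronize in between, is
 * supposed to be safe. */
static void stream_restart(struct dma_chan *chan,
                           struct dma_async_tx_descriptor *desc)
{
        /* Request termination; does not wait for it to complete. */
        dmaengine_terminate_async(chan);

        /* Issue the next transfer right away; the controller should
         * start it once the terminate has taken effect. */
        dmaengine_submit(desc);
        dma_async_issue_pending(chan);
}

/* Non-atomic context (e.g. hw_free): synchronize is only needed before
 * the buffers backing the transfer are freed. */
static void stream_release(struct dma_chan *chan)
{
        dmaengine_terminate_async(chan);
        dmaengine_synchronize(chan);
        /* Now it is safe to free the DMA buffers. */
}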
>
> Based on my understanding, most of the DMA implementations don't even implement device_synchronize,
> and if they do, it might not really be necessary since the terminate_all operation is synchronous.
There are a lot of buggy DMAengine drivers :) Pretty much all of them
need device_synchronize() to get this right.
>
> With the i.MX6, it looks a bit different:
> Since [4], the terminate_all operation actually schedules a worker which waits the required ~1ms and
> then does the context freeing.
> Now, the ioctl(SNDRV_PCM_IOCTL_PREPARE) and the following ioctl(SNDRV_PCM_IOCTL_READI_FRAMES),
> which are called from userspace to handle/recover from an XRUN, are in a race with the terminate_worker.
> If the terminate_worker finishes earlier, everything is fine.
> Otherwise, sdma_prep_dma_cyclic() is called and sets up everything, and
> as soon as it is scheduled out to wait for data, the terminate_worker is scheduled and kills it.
> In this case, we wait in [5] until the timeout is reached and return with -EIO.
>
> Based on our understanding, there are two different approaches to fix this:
> We thought that the pcm_dmaengine layer should handle this by either synchronizing the DMA on a trigger or
> terminating it synchronously.
> However, as we are in an atomic context, we either have to give up the atomic context of the PCM
> to finish the termination, or we have to design a synchronous terminate variant which is callable
> from an atomic context.
>
> For the first option, which is potentially more performant, we have to leave the atomic PCM context,
> and we are not sure if we are allowed to.
> For the second option, we would have to divide the dma_device terminate_all into an atomic sync and
> an async one, which would align with the dmaengine API, giving it the option to ensure termination
> in an atomic context.
> Based on my understanding, most of them are synchronous anyway; for the currently async ones we
> would have to implement busy waits.
> However, with this approach, we reach the WARN_ON [6] inside an atomic context,
> indicating we might not be doing the right thing.
I don't know how feasible this is to implement in the SDMA dmaengine
driver. But I think what it should do is have some flag to indicate
whether a terminate is in progress. If a new transfer is issued while a
terminate is in progress, the transfer should go on a list. Once the
terminate finishes, it should check the list and start the transfers if
there are any on the list.
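Something along these lines, just as a rough sketch; the struct and function
names are made up here and this is not actual imx-sdma code:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct sketch_desc {
        struct list_head node;
        /* ... descriptor state ... */
};

struct sketch_chan {
        spinlock_t lock;
        bool terminating;          /* terminate worker still pending */
        struct list_head deferred; /* transfers issued while terminating */
};

/* Placeholder for programming the transfer into the hardware. */
static void sketch_start_hw(struct sketch_chan *c, struct sketch_desc *d)
{
        /* ... hand the descriptor to the SDMA engine ... */
}

/* Called when a new transfer is issued on the channel. */
static void sketch_issue(struct sketch_chan *c, struct sketch_desc *d)
{
        unsigned long flags;

        spin_lock_irqsave(&c->lock, flags);
        if (c->terminating) {
                /* Terminate worker still running: park the transfer. */
                list_add_tail(&d->node, &c->deferred);
        } else {
                sketch_start_hw(c, d);
        }
        spin_unlock_irqrestore(&c->lock, flags);
}

/* Runs from the terminate worker after the ~1ms wait and the cleanup. */
static void sketch_terminate_done(struct sketch_chan *c)
{
        struct sketch_desc *d, *tmp;
        unsigned long flags;

        spin_lock_irqsave(&c->lock, flags);
        c->terminating = false;
        /* Start whatever was issued while the terminate was in flight. */
        list_for_each_entry_safe(d, tmp, &c->deferred, node) {
                list_del(&d->node);
                sketch_start_hw(c, d);
        }
        spin_unlock_irqrestore(&c->lock, flags);
}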
- Lars