[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <VE1PR04MB6638AC2A3AE852C3047E7B97895F0@VE1PR04MB6638.eurprd04.prod.outlook.com>
Date: Mon, 17 Aug 2020 09:22:46 +0000
From: Robin Gong <yibin.gong@....com>
To: Benjamin Bara - SKIDATA <Benjamin.Bara@...data.com>
CC: "timur@...nel.org" <timur@...nel.org>,
"nicoleotsuka@...il.com" <nicoleotsuka@...il.com>,
"vkoul@...nel.org" <vkoul@...nel.org>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
dl-linux-imx <linux-imx@....com>,
"shawnguo@...nel.org" <shawnguo@...nel.org>,
"kernel@...gutronix.de" <kernel@...gutronix.de>,
"dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"alsa-devel@...a-project.org" <alsa-devel@...a-project.org>,
Richard Leitner - SKIDATA <Richard.Leitner@...data.com>
Subject: RE: pcm|dmaengine|imx-sdma race condition on i.MX6
On 2020/08/17 15:29 Benjamin Bara - SKIDATA <Benjamin.Bara@...data.com> wrote:
> We think this is not an i.MX6-specific problem, but a problem of the
> DMAengine usage from the PCM.
> In case of a XRUN, the DMA channel is never closed but first a
> SNDRV_PCM_TRIGGER_STOP next a SNDRV_PCM_TRIGGER_START is triggered.
> The SNDRV_PCM_TRIGGER_STOP simply executes a
> dmaengine_terminate_async() [1] but does not await the termination by calling
> dmaengine_synchronize(), which is required as stated by the docu [2].
> Anyways, we are not able to fix it in the pcm_dmaengine layer either at the
> end of SNDRV_PCM_TRIGGER_STOP (called from the DMA on complete
> interrupt handler) or at the beginning of SNDRV_PCM_TRIGGER_START (called
> from a PCM ioctl), since the dmaengine_synchronize() requires a non-atomic
> context.
>
> Based on my understanding, most of the DMA implementations don't even
> implement device_synchronize and if they do, it might not really be necessary
> since the terminate_all operation is synchron.
>
> With the i.MX6, it looks a bit different:
> Since [4], the terminate_all operation really schedules a worker which waits
> the required ~1ms and then does the context freeing.
> Now, the ioctl(SNDRV_PCM_IOCTL_PREPARE) and the following
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES),
> which are called from US to handle/recover from a XRUN, are in a race with the
> terminate_worker.
> If the terminate_worker finishes earlier, everything is fine.
> Otherwise, the sdma_prep_dma_cyclic() is called, sets up everything and as
> soon as it is scheduled out to wait for data, the terminate_worker is scheduled
> and kills it.
> In this case, we wait in [5] until the timeout is reached and return with -EIO.
>
> Based on our understanding, there exist two different fixing approaches:
> We thought that the pcm_dmaengine should handle this by either
> synchronizing the DMA on a trigger or terminating it synchronously.
> However, as we are in an atomic context, we either have to give up the atomic
> context of the PCM to finish the termination or we have to design a
> synchronous terminate variant which is callable from an atomic context.
>
> For the first option, which is potentially more performant, we have to leave the
> atomic PCM context and we are not sure if we are allowed to.
> For the second option, we would have to divide the dma_device terminate_all
> into an atomic sync and an async one, which would align with the dmaengine
> API, giving it the option to ensure termination in an atomic context.
> Based on my understanding, most of them are synchronous anyways, for the
> currently async ones we would have to implement busy waits.
> However, with this approach, we reach the WARN_ON [6] inside of an atomic
> context, indicating we might not do the right thing.
>
> Ad Failure Log (at bottom):
> I haven't added the ioctl syscalls, but this is basically the output with additional
> prints to be able to follow the code execution path.
> A XRUN (buffer size is 480 but 960 available) leads to a
> SNDRV_PCM_TRIGGER_STOP.
> This leads to terminate_async, starting the terminate_worker.
> Next, the XRUN recovery triggers SNDRV_PCM_TRIGGER_START, calling
> sdma_prep_dma_cyclic and then waits for the DMA in wait_for_avail().
> Next we see the two freeings, first the old, then the newly added one; so the
> terminate_worker is back at work.
> Now the DMA is terminated, while we are still waiting on data from it.
>
> What do you think about it? Is any of the provided solutions practicable?
> If you need further information or additional logging, feel free to ask.
busy_wait is not good I think, would you please have a try with the attached patch
which is based on https://lkml.org/lkml/2020/8/11/111? The basic idea is
to keep the freed descriptor into another list for freeing in later terminate_worker
instead of freeing directly all in terminate_worker by vchan_get_all_descriptors
which may break next descriptor coming soon
Download attachment "0001-dmaengine-imx-sdma-add-terminated-list-for-freed-des.patch" of type "application/octet-stream" (2732 bytes)
Powered by blists - more mailing lists