linux-kernel - Re: pcm|dmaengine|imx-sdma race condition on i.MX6

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <16942794-1e03-6da0-b8e5-c82332a217a5@metafoo.de>
Date:   Wed, 19 Aug 2020 13:08:29 +0200
From:   Lars-Peter Clausen <lars@...afoo.de>
To:     Benjamin Bara - SKIDATA <Benjamin.Bara@...data.com>,
        Robin Gong <yibin.gong@....com>
Cc:     "timur@...nel.org" <timur@...nel.org>,
        "nicoleotsuka@...il.com" <nicoleotsuka@...il.com>,
        "vkoul@...nel.org" <vkoul@...nel.org>,
        "dan.j.williams@...el.com" <dan.j.williams@...el.com>,
        dl-linux-imx <linux-imx@....com>,
        "shawnguo@...nel.org" <shawnguo@...nel.org>,
        "kernel@...gutronix.de" <kernel@...gutronix.de>,
        "dmaengine@...r.kernel.org" <dmaengine@...r.kernel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "alsa-devel@...a-project.org" <alsa-devel@...a-project.org>,
        Richard Leitner - SKIDATA <Richard.Leitner@...data.com>
Subject: Re: pcm|dmaengine|imx-sdma race condition on i.MX6

On 8/17/20 9:28 AM, Benjamin Bara - SKIDATA wrote:
> We think this is not an i.MX6-specific problem, but a problem of the DMAengine usage from the PCM.
> In case of a XRUN, the DMA channel is never closed but first a SNDRV_PCM_TRIGGER_STOP next a
> SNDRV_PCM_TRIGGER_START is triggered.
> The SNDRV_PCM_TRIGGER_STOP simply executes a dmaengine_terminate_async() [1]
> but does not await the termination by calling dmaengine_synchronize(),
> which is required as stated by the docu [2].
> Anyways, we are not able to fix it in the pcm_dmaengine layer either at the end of
> SNDRV_PCM_TRIGGER_STOP (called from the DMA on complete interrupt handler)
> or at the beginning of SNDRV_PCM_TRIGGER_START (called from a PCM ioctl),
> since the dmaengine_synchronize() requires a non-atomic context.

I think this might be an sdma specific problem after all. 
dmaengine_terminate_async() will issue a request to stop the DMA. But it 
is still safe to issue the next transfer, even without calling 
dmaengine_synchronize(). The DMA should start the new transfer at its 
earliest convenience in that case.

dmaegine_synchronize() is so that the consumer has a guarantee that the 
DMA is finished using the resources (e.g. the memory buffers) associated 
with the DMA transfer so it can safely free them.

>
> Based on my understanding, most of the DMA implementations don't even implement device_synchronize
> and if they do, it might not really be necessary since the terminate_all operation is synchron.
There are a lot of buggy DMAengine drivers :) Pretty much all of them 
need device_synchronize() to get this right.
>
> With the i.MX6, it looks a bit different:
> Since [4], the terminate_all operation really schedules a worker which waits the required ~1ms and
> then does the context freeing.
> Now, the ioctl(SNDRV_PCM_IOCTL_PREPARE) and the following ioctl(SNDRV_PCM_IOCTL_READI_FRAMES),
> which are called from US to handle/recover from a XRUN, are in a race with the terminate_worker.
> If the terminate_worker finishes earlier, everything is fine.
> Otherwise, the sdma_prep_dma_cyclic() is called, sets up everything and
> as soon as it is scheduled out to wait for data, the terminate_worker is scheduled and kills it.
> In this case, we wait in [5] until the timeout is reached and return with -EIO.
>
> Based on our understanding, there exist two different fixing approaches:
> We thought that the pcm_dmaengine should handle this by either synchronizing the DMA on a trigger or
> terminating it synchronously.
> However, as we are in an atomic context, we either have to give up the atomic context of the PCM
> to finish the termination or we have to design a synchronous terminate variant which is callable
> from an atomic context.
>
> For the first option, which is potentially more performant, we have to leave the atomic PCM context
> and we are not sure if we are allowed to.
> For the second option, we would have to divide the dma_device terminate_all into an atomic sync and
> an async one, which would align with the dmaengine API, giving it the option to ensure termination
> in an atomic context.
> Based on my understanding, most of them are synchronous anyways, for the currently async ones we
> would have to implement busy waits.
> However, with this approach, we reach the WARN_ON [6] inside of an atomic context,
> indicating we might not do the right thing.

I don't know how feasible this is to implement in the SDMA dmaengine 
driver. But I think what is should do is to have some flag to indicate 
if a terminate is in progress. If a new transfer is issued while 
terminate is in progress the transfer should go on a list. Once 
terminate finishes it should check the list and start the transfer if 
there are any on the list.

- Lars