lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 11 Nov 2019 10:50:16 -0700
From:   Logan Gunthorpe <logang@...tatee.com>
To:     Vinod Koul <vkoul@...nel.org>
Cc:     linux-kernel@...r.kernel.org, dmaengine@...r.kernel.org,
        Dan Williams <dan.j.williams@...el.com>
Subject: Re: [PATCH 3/5] dmaengine: plx-dma: Introduce PLX DMA engine PCI
 driver skeleton



On 2019-11-09 10:35 a.m., Vinod Koul wrote:
> On 22-10-19, 15:46, Logan Gunthorpe wrote:
>> +static irqreturn_t plx_dma_isr(int irq, void *devid)
>> +{
>> +	return IRQ_HANDLED;
> 
> ??

Yes, sorry this is more of an artifact of how I chose to split the
patches up. The ISR is filled-in in patch 4.


>> +	 */
>> +	schedule_work(&plxdev->release_work);
>> +}
>> +
>> +static void plx_dma_put(struct plx_dma_dev *plxdev)
>> +{
>> +	kref_put(&plxdev->ref, plx_dma_release);
>> +}
>> +
>> +static int plx_dma_alloc_chan_resources(struct dma_chan *chan)
>> +{
>> +	struct plx_dma_dev *plxdev = chan_to_plx_dma_dev(chan);
>> +
>> +	kref_get(&plxdev->ref);
> 
> why do you need to do this?

This has to do with being able to probably unbind while a channel is in
use. If we don't hold a reference to the struct plx_dma_dev between
alloc_chan_resources() and free_chan_resources() then it will panic if a
call back is called after plx_dma_remove(). The way I've done it, once a
device is removed, subsequent calls to dma_prep_memcpy() will fail (see
ring_active).

struct plx_dma_dev needs to be alive between plx_dma_probe() and
plx_dma_remove(), and between calls to alloc_chan_resources() and
free_chan_resources(). So we use a reference count to ensure this.

>> +}
>> +
>> +static void plx_dma_release_work(struct work_struct *work)
>> +{
>> +	struct plx_dma_dev *plxdev = container_of(work, struct plx_dma_dev,
>> +						  release_work);
>> +
>> +	dma_async_device_unregister(&plxdev->dma_dev);
>> +	put_device(plxdev->dma_dev.dev);
>> +	kfree(plxdev);
>> +}
>> +
>> +static void plx_dma_release(struct kref *ref)
>> +{
>> +	struct plx_dma_dev *plxdev = container_of(ref, struct plx_dma_dev, ref);
>> +
>> +	/*
>> +	 * The dmaengine reference counting and locking is a bit of a
>> +	 * mess so we have to work around it a bit here. We might put
>> +	 * the reference while the dmaengine holds the dma_list_mutex
>> +	 * which means we can't call dma_async_device_unregister() directly
>> +	 * here and it must be delayed.
>
> why is that, i have not heard any complaints about locking, can you
> elaborate on why you need to do this?

Per the above explanation, we need to call plx_dma_put() in
plx_dma_free_chan_resources(); and plx_dma_release() is when we can call
dma_async_device_unregister() (seeing that's when we know there are no
longer any active channels).

However, dma_chan_put() (which calls device_free_chan_resources()) holds
the dma_list_mutex and dma_async_device_unregister() tries to take the
dma_list_mutex so, if we call unregister inside free_chan_resources we
would deadlock.

I suspect that most dmaengine drivers will WARN[1] and then panic if
they are unbound while in use. The locking, reference counting and
structure of the API didn't seem to consider it and required extra
reference counts and workarounds to make it work correctly at all.

This is the mess I'm referring to and would require some significant
restructuring to fix generally.

Logan

[1]
https://elixir.bootlin.com/linux/latest/source/drivers/dma/dmaengine.c#L1119


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ