Message-ID: <alpine.DEB.2.10.1810132256240.4908@lmark-linux.qualcomm.com>
Date:   Sat, 13 Oct 2018 23:01:09 -0700 (PDT)
From:   Liam Mark <lmark@...eaurora.org>
To:     Laura Abbott <labbott@...hat.com>
cc:     John Stultz <john.stultz@...aro.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Beata Michalska <Beata.Michalska@....com>,
        Matt Szczesiak <matt.szczesiak@....com>,
        Anders Pedersen <Anders.Pedersen@....com>,
        John Reitan <John.Reitan@....com>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Todd Kjos <tkjos@...roid.com>,
        Martijn Coenen <maco@...roid.com>,
        dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize
 re-mapping

On Fri, 12 Oct 2018, Laura Abbott wrote:

> On 10/10/2018 04:33 PM, John Stultz wrote:
> > Since 4.12 (much later narrowed down to commit 2a55e7b5e544,
> > "staging: android: ion: Call dma_map_sg for syncing and mapping"),
> > we have seen graphics performance issues on the HiKey960.
> > 
> > This was initially confounded by the fact that the out-of-tree
> > DRM driver was using the HiSi custom ION heap, which broke with the
> > 4.12 ION ABI changes, so there was lots of suspicion that the
> > performance problems were due to switching to a somewhat simple
> > CMA-based DRM driver for HiKey960. Additionally, as no
> > performance regression was seen w/ the original HiKey board
> > (which is SMP, not big.LITTLE as w/ HiKey960), there was some
> > thought that the out-of-tree EAS code wasn't quite optimized.
> > 
> > But after chasing a number of other leads, I found that
> > reverting the ION code to its 4.11-era state got the majority of the
> > graphics performance back (there may yet be further EAS tweaks
> > needed), which led me to the dma_map_sg change.
> > 
> > In talking w/ Laura and Liam, it was suspected that the extra
> > cache operations were causing the trouble. Additionally, I found
> > that part of the reason we didn't see this w/ the original
> > HiKey board is that its (proprietary blob) GL code uses ion_mmap
> > and ion_map_dma_buf is called very rarely, whereas with
> > HiKey960, the (also proprietary blob) GL code calls
> > ion_map_dma_buf much more frequently via the kernel driver.
> > 
> > Anyway, with the cause of the performance regression isolated,
> > I've tried to find a way to improve the performance of the
> > current code.
> > 
> > This approach, which I've mostly copied from the drm_prime
> > implementation, is to try to track the direction we're mapping
> > the buffers so we can avoid calling dma_map/unmap_sg on every
> > ion_map_dma_buf/ion_unmap_dma_buf call, and instead try to do
> > the work in attach/detach paths.
> > 
> > I'm not 100% sure of the correctness here, so close review would
> > be good, but it gets the performance back to being similar to
> > reverting the ION code to its 4.11-era state.
> > 
> > Feedback would be greatly appreciated!
> > 
> > Cc: Beata Michalska <Beata.Michalska@....com>
> > Cc: Matt Szczesiak <matt.szczesiak@....com>
> > Cc: Anders Pedersen <Anders.Pedersen@....com>
> > Cc: John Reitan <John.Reitan@....com>
> > Cc: Liam Mark <lmark@...eaurora.org>
> > Cc: Laura Abbott <labbott@...hat.com>
> > Cc: Sumit Semwal <sumit.semwal@...aro.org>
> > Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> > Cc: Todd Kjos <tkjos@...roid.com>
> > Cc: Martijn Coenen <maco@...roid.com>
> > Cc: dri-devel@...ts.freedesktop.org
> > Signed-off-by: John Stultz <john.stultz@...aro.org>
> > ---
> >  drivers/staging/android/ion/ion.c | 36 +++++++++++++++++++++++++++++++-----
> >  1 file changed, 31 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> > index 9907332..a4d7fca 100644
> > --- a/drivers/staging/android/ion/ion.c
> > +++ b/drivers/staging/android/ion/ion.c
> > @@ -199,6 +199,7 @@ struct ion_dma_buf_attachment {
> >  	struct device *dev;
> >  	struct sg_table *table;
> >  	struct list_head list;
> > +	enum dma_data_direction dir;
> >  };
> >  
> >  static int ion_dma_buf_attach(struct dma_buf *dmabuf,
> > @@ -220,6 +221,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf,
> >  
> >  	a->table = table;
> >  	a->dev = attachment->dev;
> > +	a->dir = DMA_NONE;
> >  	INIT_LIST_HEAD(&a->list);
> >  
> >  	attachment->priv = a;
> > @@ -236,6 +238,18 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
> >  {
> >  	struct ion_dma_buf_attachment *a = attachment->priv;
> >  	struct ion_buffer *buffer = dmabuf->priv;
> > +	struct sg_table *table;
> > +
> > +	if (!a)
> > +		return;
> > +
> > +	table = a->table;
> > +	if (table) {
> > +		if (a->dir != DMA_NONE)
> > +			dma_unmap_sg(attachment->dev, table->sgl, table->nents,
> > +				     a->dir);
> > +		sg_free_table(table);
> > +	}
> >  
> >  	free_duped_table(a->table);
> >  	mutex_lock(&buffer->lock);
> > @@ -243,6 +257,7 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
> >  	mutex_unlock(&buffer->lock);
> >  
> >  	kfree(a);
> > +	attachment->priv = NULL;
> >  }
> >  
> >  static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> > @@ -251,12 +266,24 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> >  	struct ion_dma_buf_attachment *a = attachment->priv;
> >  	struct sg_table *table;
> >  
> > -	table = a->table;
> > +	if (WARN_ON(direction == DMA_NONE || !a))
> > +		return ERR_PTR(-EINVAL);
> >  
> > -	if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > -			direction))
> > -		return ERR_PTR(-ENOMEM);
> > +	if (a->dir == direction)
> > +		return a->table;
> >  
> > +	if (WARN_ON(a->dir != DMA_NONE))
> > +		return ERR_PTR(-EBUSY);
> > +
> > +	table = a->table;
> > +	if (!IS_ERR(table)) {
> > +		if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > +				direction)) {
> > +			table = ERR_PTR(-ENOMEM);
> > +		} else {
> > +			a->dir = direction;
> > +		}
> > +	}
> >  	return table;
> >  }
> >  
> > @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
> >  			      struct sg_table *table,
> >  			      enum dma_data_direction direction)
> >  {
> > -	dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
> 
> This changes the semantics so that the only time a buffer
> gets unmapped is on detach. I don't think we want to restrict
> Ion to that behavior but I also don't know if anyone else
> is relying on that. 

I also have a concern with this patch: wouldn't it run into difficulties 
if multiple devices were attached to the same cached ION buffer and one of 
those devices was IO-coherent but the other one wasn't?
For example, if device A (which is non IO-coherent) wrote to the buffer 
but skipped the cache invalidation on dma_unmap, and the buffer was then 
dma-mapped into device B (which is IO-coherent), device B could end up 
reading stale cache entries when it attempted to read the buffer.

I don't believe there is any requirement for device A to detach (which 
would do the cache invalidation) before having device B dma-map the 
buffer.

I believe there would also be issues if device B wrote to the buffer and 
then device A tried to read or write it.
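
To make the first scenario concrete, here is roughly the call sequence I 
have in mind (dev_a and dev_b are hypothetical struct device pointers, 
and the directions are just illustrative):

	/* Both devices attach to the same cached ION buffer. */
	struct dma_buf_attachment *att_a = dma_buf_attach(dmabuf, dev_a);
	struct dma_buf_attachment *att_b = dma_buf_attach(dmabuf, dev_b);
	struct sg_table *sgt_a, *sgt_b;

	/* Device A (non IO-coherent) maps and DMA-writes the buffer. */
	sgt_a = dma_buf_map_attachment(att_a, DMA_FROM_DEVICE);

	/*
	 * With this patch, dma_unmap_sg() (and the CPU cache
	 * invalidation it performs) is deferred until detach, so stale
	 * lines for this buffer can still sit in the CPU cache here.
	 */
	dma_buf_unmap_attachment(att_a, sgt_a, DMA_FROM_DEVICE);

	/*
	 * Device B (IO-coherent) snoops the CPU cache on reads, so it
	 * can hit those stale lines. Nothing requires
	 * dma_buf_detach(dmabuf, att_a) to happen before this point.
	 */
	sgt_b = dma_buf_map_attachment(att_b, DMA_FROM_DEVICE);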

> I thought there might have been some Qualcomm
> stuff that did that (Liam? Todd?)

Yes, we have a form of "lazy mapping" that clients can opt into; it 
results in the IOMMU page table mappings not being unmapped on dma-unmap. 
Instead, they are unmapped when the buffer is destroyed.

It is important to note that in our "lazy mapping" implementation, cache 
maintenance is still applied on dma-map and dma-unmap.
If you need a full description of this feature, I can provide it.
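
As a rough sketch of the idea (hypothetical code, not what we actually 
ship; the "mapped" flag is made up for illustration), the first dma-map 
creates the IOMMU mapping and later map/unmap calls only redo the cache 
maintenance:

	/* First map: create the IOMMU mapping and sync caches. */
	if (!a->mapped) {
		if (!dma_map_sg(dev, sgt->sgl, sgt->nents, dir))
			return ERR_PTR(-ENOMEM);
		a->mapped = true;
	} else {
		/* Mapping already exists: only do cache maintenance. */
		dma_sync_sg_for_device(dev, sgt->sgl, sgt->nents, dir);
	}

	/* On dma-unmap: sync for the CPU but keep the IOMMU mapping. */
	dma_sync_sg_for_cpu(dev, sgt->sgl, sgt->nents, dir);

	/* The real dma_unmap_sg() happens when the buffer is destroyed. */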

> 
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

We have support for this locally on our 4.14 branch.
We have added a dma_map_attrs field to struct dma_buf_attachment; clients 
can then specify dma-attributes there, such as DMA_ATTR_SKIP_CPU_SYNC, 
before dma-mapping the buffer, and we ensure that these dma attributes 
are passed to dma_map_sg_attrs when ion_map_dma_buf is called (same for 
ion_unmap_dma_buf).
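
In rough terms (a sketch only, not the exact code on our branch):

	/* Client side: opt this attachment out of CPU cache syncs. */
	attachment = dma_buf_attach(dmabuf, dev);
	attachment->dma_map_attrs = DMA_ATTR_SKIP_CPU_SYNC;

	/* In ion_map_dma_buf(): pass the attachment's attrs through. */
	if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
			      direction, attachment->dma_map_attrs))
		return ERR_PTR(-ENOMEM);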

I hope to upstream this at some point.

> 
> Thanks,
> Laura
> 
> >  }
> >  
> >  static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> > 
> 
> 

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
