Message-ID: <7534ca1d-f874-7809-6125-d9fc72f70e39@redhat.com>
Date: Fri, 12 Oct 2018 10:51:41 -0700
From: Laura Abbott <labbott@...hat.com>
To: John Stultz <john.stultz@...aro.org>,
lkml <linux-kernel@...r.kernel.org>
Cc: Beata Michalska <Beata.Michalska@....com>,
Matt Szczesiak <matt.szczesiak@....com>,
Anders Pedersen <Anders.Pedersen@....com>,
John Reitan <John.Reitan@....com>,
Liam Mark <lmark@...eaurora.org>,
Sumit Semwal <sumit.semwal@...aro.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Todd Kjos <tkjos@...roid.com>,
Martijn Coenen <maco@...roid.com>,
dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize
re-mapping
On 10/10/2018 04:33 PM, John Stultz wrote:
> Since 4.12 (much later narrowed down to commit 2a55e7b5e544,
> "staging: android: ion: Call dma_map_sg for syncing and mapping"),
> we have seen graphics performance issues on the HiKey960.
>
> This was initially confounded by the fact that the out-of-tree
> DRM driver was using the HiSi custom ION heap, which broke with
> the 4.12 ION ABI changes, so there was lots of suspicion that the
> performance problems were due to switching to a fairly simple
> CMA-based DRM driver for HiKey960. Additionally, as no
> performance regression was seen w/ the original HiKey board
> (which is SMP, not big.LITTLE as w/ HiKey960), there was some
> thought that the out-of-tree EAS code wasn't fully optimized.
>
> But after chasing a number of other leads, I found that
> reverting the ION code to the 4.11-era version got the majority
> of the graphics performance back (there may yet be further EAS
> tweaks needed), which led me to the dma_map_sg change.
>
> In talking w/ Laura and Liam, it was suspected that the extra
> cache operations were causing the trouble. Additionally, I found
> that part of the reason we didn't see this w/ the original
> HiKey board is that its (proprietary blob) GL code uses ion_mmap,
> and ion_map_dma_buf is called very rarely, whereas with
> HiKey960, the (also proprietary blob) GL code calls
> ion_map_dma_buf much more frequently via the kernel driver.
>
> Anyway, with the cause of the performance regression isolated,
> I've tried to find a way to improve the performance of the
> current code.
>
> This approach, which I've mostly copied from the drm_prime
> implementation, is to track the direction we're mapping
> the buffers so we can avoid calling dma_map/unmap_sg on every
> ion_map_dma_buf/ion_unmap_dma_buf call, and instead do the
> work in the attach/detach paths.
>
> I'm not 100% sure of the correctness here, so close review would
> be good, but it gets performance back to roughly the same level
> as reverting the ION code to the 4.11 era.
>
> Feedback would be greatly appreciated!
>
> Cc: Beata Michalska <Beata.Michalska@....com>
> Cc: Matt Szczesiak <matt.szczesiak@....com>
> Cc: Anders Pedersen <Anders.Pedersen@....com>
> Cc: John Reitan <John.Reitan@....com>
> Cc: Liam Mark <lmark@...eaurora.org>
> Cc: Laura Abbott <labbott@...hat.com>
> Cc: Sumit Semwal <sumit.semwal@...aro.org>
> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Cc: Todd Kjos <tkjos@...roid.com>
> Cc: Martijn Coenen <maco@...roid.com>
> Cc: dri-devel@...ts.freedesktop.org
> Signed-off-by: John Stultz <john.stultz@...aro.org>
> ---
> drivers/staging/android/ion/ion.c | 36 +++++++++++++++++++++++++++++++-----
> 1 file changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> index 9907332..a4d7fca 100644
> --- a/drivers/staging/android/ion/ion.c
> +++ b/drivers/staging/android/ion/ion.c
> @@ -199,6 +199,7 @@ struct ion_dma_buf_attachment {
> struct device *dev;
> struct sg_table *table;
> struct list_head list;
> + enum dma_data_direction dir;
> };
>
> static int ion_dma_buf_attach(struct dma_buf *dmabuf,
> @@ -220,6 +221,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf,
>
> a->table = table;
> a->dev = attachment->dev;
> + a->dir = DMA_NONE;
> INIT_LIST_HEAD(&a->list);
>
> attachment->priv = a;
> @@ -236,6 +238,18 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
> {
> struct ion_dma_buf_attachment *a = attachment->priv;
> struct ion_buffer *buffer = dmabuf->priv;
> + struct sg_table *table;
> +
> + if (!a)
> + return;
> +
> + table = a->table;
> + if (table) {
> + if (a->dir != DMA_NONE)
> + dma_unmap_sg(attachment->dev, table->sgl, table->nents,
> + a->dir);
> + sg_free_table(table);
> + }
>
> free_duped_table(a->table);
> mutex_lock(&buffer->lock);
> @@ -243,6 +257,7 @@ static void ion_dma_buf_detatch(struct dma_buf *dmabuf,
> mutex_unlock(&buffer->lock);
>
> kfree(a);
> + attachment->priv = NULL;
> }
>
> static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> @@ -251,12 +266,24 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> struct ion_dma_buf_attachment *a = attachment->priv;
> struct sg_table *table;
>
> - table = a->table;
> + if (WARN_ON(direction == DMA_NONE || !a))
> + return ERR_PTR(-EINVAL);
>
> - if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> - direction))
> - return ERR_PTR(-ENOMEM);
> + if (a->dir == direction)
> + return a->table;
>
> + if (WARN_ON(a->dir != DMA_NONE))
> + return ERR_PTR(-EBUSY);
> +
> + table = a->table;
> + if (!IS_ERR(table)) {
> + if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> + direction)) {
> + table = ERR_PTR(-ENOMEM);
> + } else {
> + a->dir = direction;
> + }
> + }
> return table;
> }
>
> @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
> struct sg_table *table,
> enum dma_data_direction direction)
> {
> - dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
This changes the semantics so that the only time a buffer
gets unmapped is on detach. I don't think we want to restrict
Ion to that behavior, but I also don't know whether anyone is
relying on the current unmap-on-every-call behavior. I thought
there might have been some Qualcomm stuff that did (Liam? Todd?).
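
For reference, the pattern I'm worried about is an importer that
maps and unmaps around every DMA pass and relies on the unmap doing
the CPU cache maintenance. Roughly (an untested sketch, not taken
from any particular driver):

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>

/* One map/unmap cycle per DMA pass; today the unmap implies
 * dma_unmap_sg() and, with it, any CPU cache invalidation. */
static int example_dma_pass(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;

	sgt = dma_buf_map_attachment(attach, DMA_FROM_DEVICE);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* ... device DMAs into the buffer ... */

	/* With this patch, the unmap no longer syncs, so a CPU read
	 * after this point may see stale cache lines on a
	 * non-coherent system. */
	dma_buf_unmap_attachment(attach, sgt, DMA_FROM_DEVICE);
	return 0;
}

An importer written like that keeps working on cache-coherent
systems but silently loses the sync on non-coherent ones.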
I suspect most of the cost of the dma_map/dma_unmap calls is from
the cache flushing and not the actual mapping operations. If this
is the case, another option might be to figure out how to
incorporate dma_attrs so drivers can pass DMA_ATTR_SKIP_CPU_SYNC
and decide for themselves when they actually want to sync.
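
Very roughly, something like the following (a completely untested
sketch; the want_cpu_sync flag on struct ion_dma_buf_attachment is
made up, and how drivers would actually request it is the open
question):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Map without the implicit cache maintenance, then sync explicitly
 * only when the attaching driver asked for it. */
static struct sg_table *
ion_map_dma_buf_skip_sync(struct dma_buf_attachment *attachment,
			  enum dma_data_direction direction)
{
	struct ion_dma_buf_attachment *a = attachment->priv;
	struct sg_table *table = a->table;

	if (!dma_map_sg_attrs(attachment->dev, table->sgl, table->nents,
			      direction, DMA_ATTR_SKIP_CPU_SYNC))
		return ERR_PTR(-ENOMEM);

	/* Only pay for the cache flush when it was actually requested
	 * (want_cpu_sync is a hypothetical per-attachment flag). */
	if (a->want_cpu_sync)
		dma_sync_sg_for_device(attachment->dev, table->sgl,
				       table->nents, direction);

	return table;
}

That keeps the per-call mapping but would let drivers skip the
flushes that seem to be dominating here.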
Thanks,
Laura
> }
>
> static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
>