Date:   Mon, 15 Oct 2018 09:29:04 -0700
From:   John Stultz <john.stultz@...aro.org>
To:     Laura Abbott <labbott@...hat.com>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        Beata Michalska <Beata.Michalska@....com>,
        Matt Szczesiak <matt.szczesiak@....com>,
        Anders Pedersen <Anders.Pedersen@....com>,
        John Reitan <John.Reitan@....com>,
        Liam Mark <lmark@...eaurora.org>,
        Sumit Semwal <sumit.semwal@...aro.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Todd Kjos <tkjos@...roid.com>,
        Martijn Coenen <maco@...roid.com>,
        dri-devel <dri-devel@...ts.freedesktop.org>
Subject: Re: [PATCH] staging: ion: Rework ion_map_dma_buf() to minimize re-mapping

On Fri, Oct 12, 2018 at 10:51 AM, Laura Abbott <labbott@...hat.com> wrote:
> On 10/10/2018 04:33 PM, John Stultz wrote:
>>
>> Since 4.12, much later narrowed down to commit 2a55e7b5e544
>> ("staging: android: ion: Call dma_map_sg for syncing and mapping"),
>> we have seen graphics performance issues on the HiKey960.
>>
>> This was initially confounded by the fact that the out-of-tree
>> DRM driver was using a HiSi custom ION heap which broke with the
>> 4.12 ION ABI changes, so there was lots of suspicion that the
>> performance problems were due to switching to a somewhat simple
>> CMA-based DRM driver for HiKey960. Additionally, as no
>> performance regression was seen w/ the original HiKey board
>> (which is SMP, not big.LITTLE as w/ HiKey960), there was some
>> thought that the out-of-tree EAS code wasn't quite optimized.
>>
>> But after chasing a number of other leads, I found that
>> reverting the ION code to 4.11-era got the majority of the
>> graphics performance back (there may yet be further EAS tweaks
>> needed), which led me to the dma_map_sg change.
>>
>> In talking w/ Laura and Liam, it was suspected that the extra
>> cache operations were causing the trouble. Additionally, I found
>> that part of the reason we didn't see this w/ the original
>> HiKey board is that its (proprietary blob) GL code uses ion_mmap
>> and ion_map_dma_buf is called very rarely, whereas with
>> HiKey960, the (also proprietary blob) GL code calls
>> ion_map_dma_buf much more frequently via the kernel driver.
>>
>> Anyway, with the cause of the performance regression isolated,
>> I've tried to find a way to improve the performance of the
>> current code.
>>
>> This approach, which I've mostly copied from the drm_prime
>> implementation, is to track the direction we're mapping
>> the buffers so we can avoid calling dma_map/unmap_sg on every
>> ion_map_dma_buf/ion_unmap_dma_buf call, and instead do
>> the work in the attach/detach paths.
>>
>> I'm not 100% sure of the correctness here, so close review would
>> be good, but it gets the performance back to being similar to
>> reverting the ION code to the 4.11-era.
>>
>> Feedback would be greatly appreciated!
>>
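
For readers following along, the direction-tracking idea described above
can be modelled in plain userspace C. This is only a rough sketch and not
the patch itself: every name below (struct attachment, expensive_map,
DIR_*, and so on) is hypothetical and merely mimics the shape of the
dma-buf callbacks. It assumes a mapping is cached per attachment, reused
when the same direction is requested again, refused when a different
direction is requested, and torn down only at detach.

```c
#include <assert.h>

/* Userspace model only: these names mimic, but are not, the kernel API. */
enum dma_dir { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_NONE };

struct attachment {
	enum dma_dir dir;	/* direction of the cached mapping, DIR_NONE if unmapped */
	int map_calls;		/* how often the expensive map really ran */
	int unmap_calls;	/* how often the expensive unmap really ran */
};

/* Stand-ins for the costly map/unmap work (mapping plus cache maintenance). */
static void expensive_map(struct attachment *a, enum dma_dir dir)
{
	a->map_calls++;
	a->dir = dir;
}

static void expensive_unmap(struct attachment *a)
{
	a->unmap_calls++;
	a->dir = DIR_NONE;
}

/* Attach: start with no cached mapping. */
static void attach(struct attachment *a)
{
	a->dir = DIR_NONE;
	a->map_calls = 0;
	a->unmap_calls = 0;
}

/* Map: returns 0 on success, -1 if the request cannot be satisfied. */
static int map_dma_buf(struct attachment *a, enum dma_dir dir)
{
	if (dir == DIR_NONE)
		return -1;		/* invalid request */
	if (a->dir == dir)
		return 0;		/* same direction: reuse the cached mapping */
	if (a->dir != DIR_NONE)
		return -1;		/* already mapped with another direction */
	expensive_map(a, dir);
	return 0;
}

/* Unmap: deliberately a no-op, the real work is deferred to detach. */
static void unmap_dma_buf(struct attachment *a)
{
	(void)a;
}

/* Detach: the only place the cached mapping is torn down. */
static void detach(struct attachment *a)
{
	if (a->dir != DIR_NONE)
		expensive_unmap(a);
	assert(a->dir == DIR_NONE);
}
```

In this model, a client that maps and unmaps the same buffer once per
frame pays for a single expensive map no matter how many frames it
renders. The flip side is that the buffer stays mapped until detach.
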
...
>>   @@ -264,7 +291,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
>>                               struct sg_table *table,
>>                               enum dma_data_direction direction)
>>   {
>> -       dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
>
>
> This changes the semantics so that the only time a buffer
> gets unmapped is on detach. I don't think we want to restrict
> Ion to that behavior but I also don't know if anyone else
> is relying on that. I thought there might have been some Qualcomm
> stuff that did that (Liam? Todd?)
>
> I suspect most of the cost of the dma_map/dma_unmap is from the
> cache flushing and not the actual mapping operations. If this
> is the case, another option might be to figure out how to
> incorporate dma_attrs so drivers can use DMA_ATTR_SKIP_CPU_SYNC
> to decide when they actually want to sync.

Ok. Thanks so much for the feedback and the suggestion. I'll try to
look into dma_attrs here shortly.
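
As a rough illustration of the idea (a userspace model, not kernel code,
and not a proposed implementation): the kernel's dma_map_sg_attrs() takes
an attrs argument, and DMA_ATTR_SKIP_CPU_SYNC asks it to skip the CPU
cache maintenance, leaving the driver to call dma_sync_sg_for_device()
when it really needs one. The model_* names and the "CPU writes every Nth
frame" workload below are assumptions made up for the sketch. It simply
counts cache syncs with and without the skip attribute.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's DMA_ATTR_SKIP_CPU_SYNC flag. */
#define MODEL_ATTR_SKIP_CPU_SYNC (1UL << 0)

struct model_buffer {
	int mapped;		/* 1 while the buffer is mapped for the device */
	int cache_syncs;	/* number of (expensive) cache maintenance ops */
};

static void model_cache_sync(struct model_buffer *b)
{
	b->cache_syncs++;
}

/* Map: syncs the cache unless the caller asked to skip it. */
static void model_map_sg_attrs(struct model_buffer *b, unsigned long attrs)
{
	assert(!b->mapped);
	b->mapped = 1;
	if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
		model_cache_sync(b);
}

/* Unmap: likewise syncs unless told to skip. */
static void model_unmap_sg_attrs(struct model_buffer *b, unsigned long attrs)
{
	assert(b->mapped);
	b->mapped = 0;
	if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
		model_cache_sync(b);
}

/* Explicit sync, issued by the driver only when it knows one is needed. */
static void model_sync_sg_for_device(struct model_buffer *b)
{
	assert(b->mapped);
	model_cache_sync(b);
}

/*
 * Map and unmap one buffer `frames` times. The CPU is assumed to touch the
 * buffer only on every `cpu_write_every`-th frame, so with the skip
 * attribute set the driver syncs explicitly only on those frames.
 * Returns the total number of cache syncs performed.
 */
static int model_run(int frames, int cpu_write_every, unsigned long attrs)
{
	struct model_buffer b = { 0, 0 };

	assert(cpu_write_every > 0);
	for (int i = 0; i < frames; i++) {
		model_map_sg_attrs(&b, attrs);
		if ((attrs & MODEL_ATTR_SKIP_CPU_SYNC) && i % cpu_write_every == 0)
			model_sync_sg_for_device(&b);
		model_unmap_sg_attrs(&b, attrs);
	}
	return b.cache_syncs;
}
```

Unlike the attach/detach caching in the patch, this keeps the map/unmap
semantics unchanged and only moves the decision about when to sync into
the driver.
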

thanks
-john
