Date:   Wed, 8 Aug 2018 13:29:56 -0300
From:   Ezequiel Garcia <ezequiel@...guardiasur.com.ar>
To:     Laurent Pinchart <laurent.pinchart@...asonboard.com>
Cc:     Alan Stern <stern@...land.harvard.edu>,
        Keiichi Watanabe <keiichiw@...omium.org>,
        Tomasz Figa <tfiga@...omium.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Linux Media Mailing List <linux-media@...r.kernel.org>,
        kieran.bingham@...asonboard.com,
        Douglas Anderson <dianders@...omium.org>,
        Ezequiel Garcia <ezequiel@...labora.com>, matwey@....msu.ru
Subject: Re: [RFC PATCH v1] media: uvcvideo: Cache URB header data before processing

On 8 August 2018 at 13:22, Laurent Pinchart
<laurent.pinchart@...asonboard.com> wrote:
> Hello,
>
> On Wednesday, 8 August 2018 17:20:21 EEST Alan Stern wrote:
>> On Wed, 8 Aug 2018, Keiichi Watanabe wrote:
>> > Hi Laurent, Kieran, Tomasz,
>> >
>> > Thank you for reviews and suggestions.
>> > I want to do additional measurements to improve the performance.
>> >
>> > Let me clarify my understanding:
>> > Currently, if the platform doesn't support coherent DMA (e.g. ARM),
>> > urb_buffer is allocated by usb_alloc_coherent with the
>> > URB_NO_TRANSFER_DMA_MAP flag instead of using kmalloc.
>>
>> Not exactly.  You are mixing up allocation with mapping.  The speed of
>> the allocation doesn't matter; all that matters is whether the memory
>> is cached and when it gets mapped/unmapped.
>>
>> > This is because we want to avoid frequent DMA mappings, which are
>> > generally expensive. However, memory allocated this way is not
>> > cached.
>> >
>> > So, we wonder whether using usb_alloc_coherent is really faster.
>> > In other words, we want to know which is better:
>> > "No DMA mapping/Uncached memory" vs. "Frequent DMA mapping/Cached
>> > memory".
>
> The second option should also be split in two:
>
> - cached memory with DMA mapping/unmapping around each transfer
> - cached memory with DMA mapping/unmapping at allocation/free time, and DMA
> sync around each transfer
>

I agree with this, the second one should be better.
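To make the three-way split concrete, here is a small userspace C table recording, for each strategy, which DMA work happens once at allocation and which happens on every transfer. It is only a sketch, not kernel code: the enum, struct, field and lookup() names are hypothetical, and only the kernel function names inside the strings are real. The per-transfer sync pair in the last entry follows the usual streaming DMA rule (sync for the CPU before reading, sync for the device before handing the buffer back); the thread itself only says "DMA sync around each transfer".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three buffer strategies discussed above. */
enum buf_strategy_kind {
	BUF_COHERENT,		/* usb_alloc_coherent() + URB_NO_TRANSFER_DMA_MAP */
	BUF_MAP_PER_URB,	/* cached buffer, mapped/unmapped around each transfer */
	BUF_MAP_ONCE_SYNC,	/* cached buffer, mapped once, synced around each transfer */
};

struct buf_strategy {
	const char *at_alloc;		/* DMA work done once, when the buffer is created */
	const char *per_transfer;	/* DMA work done for every URB */
	int per_transfer_calls;		/* number of DMA API calls per URB */
	int cpu_reads_cached;		/* 1 if the CPU reads the data through the cache
					 * on a platform without coherent DMA (e.g. ARM) */
};

static const struct buf_strategy strategies[] = {
	[BUF_COHERENT] = {
		.at_alloc = "usb_alloc_coherent()",
		.per_transfer = "none",
		.per_transfer_calls = 0,
		.cpu_reads_cached = 0,
	},
	[BUF_MAP_PER_URB] = {
		.at_alloc = "kmalloc()",
		/* done by the USB core when URB_NO_TRANSFER_DMA_MAP is not set */
		.per_transfer = "dma_map_single() on submit, "
				"dma_unmap_single() on completion",
		.per_transfer_calls = 2,
		.cpu_reads_cached = 1,
	},
	[BUF_MAP_ONCE_SYNC] = {
		.at_alloc = "kmalloc() + dma_map_single() "
			    "(dma_unmap_single() at free time)",
		.per_transfer = "dma_sync_single_for_device() on submit, "
				"dma_sync_single_for_cpu() on completion",
		.per_transfer_calls = 2,
		.cpu_reads_cached = 1,
	},
};

/* Look up a strategy, checking that the enum value is in range. */
static const struct buf_strategy *lookup(enum buf_strategy_kind k)
{
	assert((size_t)k < sizeof(strategies) / sizeof(strategies[0]));
	return &strategies[k];
}
```

Both cached strategies make two DMA calls per transfer, so on paper the difference between them lies in the cost of each call rather than in how many are made.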

I still wonder if there is any way we can create a helper for this,
as I am under the impression that most USB video4linux drivers
will want to implement the same thing.

> The second option should in theory lead to at least slightly better
> performance, but tests with the pwc driver have reported contradictory
> results. I'd like to know whether that's also the case with the uvcvideo
> driver, and if so, why.
>

I believe that is no longer the case. Matwey measured again, and the results
are what we expected: a single mapping plus a sync in the interrupt handler
is a little faster. See https://lkml.org/lkml/2018/8/4/44

2) dma_unmap and dma_map in the handler:
2A) dma_unmap_single call: 28.8 +- 1.5 usec
2B) memcpy and the rest: 58 +- 6 usec
2C) dma_map_single call: 22 +- 2 usec
Total: 110 +- 7 usec

3) dma_sync_single_for_cpu in the handler:
3A) dma_sync_single_for_cpu call: 29.4 +- 1.7 usec
3B) memcpy and the rest: 59 +- 6 usec
3C) noop (trace events overhead): 5 +- 2 usec
Total: 93 +- 7 usec
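The totals can be cross-checked from the component figures. The sketch below is plain userspace C; the struct and function names are made up for illustration. It sums the means and, assuming the +- values are independent and combine in quadrature (the message does not say how they were combined), sums their squares. That gives 108.8 vs. 93.4 usec for the means, a difference of about 15 usec per completion (roughly 14% of the unmap/map path), and combined uncertainties of sqrt(42.25) = 6.5 and sqrt(42.89), about 6.55 usec, both consistent with the quoted +- 7.

```c
#include <assert.h>
#include <stddef.h>

#define N_STEPS(a) (sizeof(a) / sizeof((a)[0]))

/* One timed step: mean and +- uncertainty, in microseconds. */
struct step {
	double mean_us;
	double err_us;
};

/* Sum of the step means. */
static double total_mean(const struct step *s, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += s[i].mean_us;
	return sum;
}

/*
 * Combined variance, ASSUMING independent errors added in quadrature.
 * The combined +- is the square root of this value.
 */
static double total_variance(const struct step *s, size_t n)
{
	double var = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		var += s[i].err_us * s[i].err_us;
	return var;
}

/* Figures copied from the measurements above. */
static const struct step unmap_map[] = {
	{ 28.8, 1.5 },	/* 2A dma_unmap_single call */
	{ 58.0, 6.0 },	/* 2B memcpy and the rest */
	{ 22.0, 2.0 },	/* 2C dma_map_single call */
};

static const struct step sync_cpu[] = {
	{ 29.4, 1.7 },	/* 3A dma_sync_single_for_cpu call */
	{ 59.0, 6.0 },	/* 3B memcpy and the rest */
	{  5.0, 2.0 },	/* 3C noop (trace events overhead) */
};
```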

-- 
Ezequiel García, VanguardiaSur
www.vanguardiasur.com.ar
