Date:   Wed, 8 Aug 2018 13:04:45 -0400 (EDT)
From:   Alan Stern <stern@...land.harvard.edu>
To:     Laurent Pinchart <laurent.pinchart@...asonboard.com>
cc:     Keiichi Watanabe <keiichiw@...omium.org>,
        Tomasz Figa <tfiga@...omium.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Linux Media Mailing List <linux-media@...r.kernel.org>,
        <kieran.bingham@...asonboard.com>,
        Douglas Anderson <dianders@...omium.org>,
        <ezequiel@...labora.com>, <matwey@....msu.ru>
Subject: Re: [RFC PATCH v1] media: uvcvideo: Cache URB header data before
 processing

On Wed, 8 Aug 2018, Laurent Pinchart wrote:

> Hello,
> 
> On Wednesday, 8 August 2018 17:20:21 EEST Alan Stern wrote:
> > On Wed, 8 Aug 2018, Keiichi Watanabe wrote:
> > > Hi Laurent, Kieran, Tomasz,
> > > 
> > > Thank you for reviews and suggestions.
> > > I want to do additional measurements for improving the performance.
> > > 
> > > Let me clarify my understanding:
> > > Currently, if the platform doesn't support coherent-DMA (e.g. ARM),
> > > > urb_buffer is allocated by usb_alloc_coherent with the
> > > > URB_NO_TRANSFER_DMA_MAP flag instead of using kmalloc.
> > 
> > Not exactly.  You are mixing up allocation with mapping.  The speed of
> > the allocation doesn't matter; all that matters is whether the memory
> > is cached and when it gets mapped/unmapped.
> > 
> > > This is because we want to avoid frequent DMA mappings, which are
> > > > generally expensive. However, memory allocated in this way is not
> > > > cached.
> > > 
> > > > So, we wonder whether using usb_alloc_coherent is really fast.
> > > > In other words, we want to know which is better:
> > > > "No DMA mapping/Uncached memory" vs. "Frequent DMA mapping/Cached
> > > > memory".
> 
> The second option should also be split in two:
> 
> - cached memory with DMA mapping/unmapping around each transfer
> - cached memory with DMA mapping/unmapping at allocation/free time, and DMA 
> sync around each transfer
> 
> The second option should in theory lead to at least slightly better
> performance, but tests with the pwc driver have reported contradictory
> results. I'd like to know whether that's also the case with the uvcvideo
> driver, and if so, why.
> 
> > There is no need to wonder.  "Frequent DMA mapping/Cached memory" is
> > always faster than "No DMA mapping/Uncached memory".
> 
> Is it really? Doesn't it depend on the CPU access pattern?

Well, if your access pattern involves transferring data in from the
device and then throwing it away without reading it, you might get a
different result.  :-)  But assuming you plan to read the data after
transferring it, using uncached memory slows things down so much that
the overhead of DMA mapping/unmapping is negligible by comparison.

The only exception might be if you were talking about very small
amounts of data.  I don't know exactly where the crossover occurs, but
bear in mind that Matwey's tests required ~50 us for mapping/unmapping
and 3000 us for accessing uncached memory.  He didn't say how large the
transfers were, but that's still a pretty big difference.

Alan Stern

> > The only issue is that on some platforms (such as x86) but not others,
> > there is a third option: "No DMA mapping/Cached memory".  On platforms
> > which support it, this is the fastest option.
