lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190701092932.GA3939@localhost.localdomain>
Date:   Mon, 1 Jul 2019 18:29:34 +0900
From:   Suwan Kim <suwan.kim027@...il.com>
To:     shuah <shuah@...nel.org>
Cc:     stern@...land.harvard.edu, valentina.manea.m@...il.com,
        gregkh@...uxfoundation.org, linux-usb@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] usbip: Skip DMA mapping and unmapping for urb at vhci

On Fri, Jun 28, 2019 at 06:11:54PM -0600, shuah wrote:
> Hi Suwan,
> 
> On 6/21/19 11:45 AM, Suwan Kim wrote:
> > vhci doesn’t do dma for remote device. Actually, the real dma
> > operation is done by network card driver. So, vhci doesn’t use and
> > need dma address of transfer buffer of urb.
> > 
> > When vhci supports SG, it is useful to use native SG list instead
> > of mapped SG list because dma mapping fnuction can adjust the
> > number of SG list that is urb->num_mapped_sgs.
> > 
> > But hcd provides dma mapping and unmapping function by defualt.
> 
> Typo "defualt"
> 
> > Moreover, it causes unnecessary dma mapping and unmapping which
> > will be done again at the NIC driver and it wastes CPU cycles.
> > So, implement map_urb_for_dma and unmap_urb_for_dma function for
> > vhci in order to skip the dma mapping and unmapping procedure.
> > 
> 
> How did you verify that unnecessary dma map/unmap are happening?
> How many CPU cycles did you manage to reduce with this change?

Dma mapping/unmapping is not required for vhci because vhci passes
the virtual address of the buffer to the network stack without
passing the dma address of the buffer. Network stack receive the
virtual address of the buffer from vhci and later, network card
driver performs dma mapping for the buffer. So, as far as I know,
dma address of the buffer is not used for vhci and virtual address
is only used by vhci.

I used ftrace to measure a duration of usb_hcd_map_urb_for_dma().
As a result, usb_hcd_map_urb_for_dma() took a duration of about
0.14us out of about 10us which is the duration of usb_hcd_submit_urb().
However, this figure is the dma mapping measurement value for
physically contiguous buffers when vhci does not support SG, and
if vhci supports SG, more CPU cycles will be consumed for SG dma
mapping.

I think that the important point is dma mapping/unmapping is
unnecessary at vhci. So we can skip dma mapping/unmapping and save
the CPU cycles (even if it is small). This is an opportunity to
reduce the end-to-end latency of usbip and improve the performance.

Regards

Suwan Kim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ