Message-ID: <063D6719AE5E284EB5DD2968C1650D6D1726FB4D@AcuExch.aculab.com>
Date: Thu, 10 Jul 2014 08:50:22 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Alexey Orishko' <alexey.orishko@...il.com>,
Bjørn Mork <bjorn@...k.no>
CC: joey ming <joey.zming@...il.com>,
"jim_baxter@...tor.com" <jim_baxter@...tor.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"zhao.ming9@....com.cn" <zhao.ming9@....com.cn>
Subject: RE: the side effect of using copy skb instead of skb_clone in cdc
ncm/mbim driver
From: Alexey Orishko
> On Wed, Jul 9, 2014 at 6:01 PM, Bjørn Mork <bjorn@...k.no> wrote:
> > Alexey's results were on modem hardware, and I am guessing the OS wasn't
> > Linux.
> NCM driver was deployed on Unix-like realtime OS and on various
> embedded application CPU running Android.
>
> > So NCM is probably a great win for the modems,
> > and you are right: That is likely why this aggregating protocol was
> > invented.
>
> One comment on this and earlier passages about USB-IF: during
> specification development there was no intention "to make life easier"
> on the host side; the goal was to make a mobile device with a weak CPU
> more efficient (since 2007 the situation has changed significantly for
> mobile computing power).
>
> One important comment though about the main idea of the aggregation protocol:
> - in order to get a *real* benefit from the protocol, the sender shall
> initially send only a table containing pointers in a separate 512
> byte packet and only then send the rest of the NTB.
> - the receiving side shall allocate a DMA job for receiving 512 bytes
> first and, after parsing it, set up a job for receiving all IP packets
> into separate data buffers. After that the DMA engine will handle the
> data without involving the CPU.
> As a result: minimum CPU usage and all IP packets are placed into
> separate skb-s.
>
> However, I don't believe the usbnet infrastructure is capable of doing
> that, but it can be done in proprietary code in a usb modem. I don't
> have any info on whether someone actually managed to build such a
> system, which would require a lot of effort; it is much easier to do
> data copying with an Intel Quad Core 3+GHz CPU with 8+GB RAM - you
> hardly notice any difference compared to embedded systems...
If the subsequent data is packed into a single USB bulk data transfer,
then only the xHCI controller has the capability to perform the required
DMA transfers - since it needs arbitrary scatter-gather support.
The usbnet infrastructure would also need changing.

In practice I suspect that a data copy in the host is unlikely to
be significant for anything running at USB2 speeds or 100M Ethernet.
There is probably more scope for reducing CPU usage by optimising
the USB stack itself.
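
For reference, the first stage of the two-stage receive described in the
quoted text amounts to parsing the pointer table out of the first 512 bytes
and turning it into a list of (offset, length) pairs that a scatter-gather
DMA job could then be built from. A minimal sketch in C, assuming the 16-bit
NTB format from the NCM 1.0 spec (NTH16 followed by one NDP16 without CRC)
and assuming the whole table fits in that first packet. The names
ntb16_parse_table and struct dgram are made up for illustration; this is not
code from usbnet or cdc_ncm.

```c
#include <stddef.h>
#include <stdint.h>

/* Signatures of the 16-bit NCM headers, read as little-endian 32-bit values. */
#define NTH16_SIGN 0x484D434EU /* "NCMH" */
#define NDP16_SIGN 0x304D434EU /* "NCM0", NDP16 without CRC */

/* Hypothetical scatter-list entry: where one datagram sits in the NTB. */
struct dgram {
    uint16_t offset; /* byte offset from the start of the NTB */
    uint16_t len;    /* datagram length in bytes */
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the NTH16 and the first NDP16 from hdr[0..hdr_len) and fill out[]
 * with at most max datagram entries.
 * Returns the number of datagrams found, or -1 if the headers are malformed
 * or there are more than max datagrams.
 */
int ntb16_parse_table(const uint8_t *hdr, size_t hdr_len,
                      struct dgram *out, int max)
{
    /* NTH16: dwSignature, wHeaderLength (12), wSequence, wBlockLength, wNdpIndex */
    if (hdr_len < 12 || le32(hdr) != NTH16_SIGN || le16(hdr + 4) != 12)
        return -1;
    uint16_t block_len = le16(hdr + 8); /* 0 = block ends with a short packet */
    uint16_t ndp = le16(hdr + 10);

    /* NDP16: dwSignature, wLength, wNextNdpIndex, then 4-byte pointer entries */
    if (ndp < 12 || (ndp & 3) || (size_t)ndp + 8 > hdr_len)
        return -1;
    if (le32(hdr + ndp) != NDP16_SIGN)
        return -1;
    uint16_t ndp_len = le16(hdr + ndp + 4);
    if (ndp_len < 16 || (ndp_len & 3) || (size_t)ndp + ndp_len > hdr_len)
        return -1;

    int n = 0;
    for (size_t e = (size_t)ndp + 8; e + 4 <= (size_t)ndp + ndp_len; e += 4) {
        uint16_t off = le16(hdr + e);
        uint16_t len = le16(hdr + e + 2);

        if (off == 0 || len == 0)
            break; /* a zero entry terminates the table */
        if (block_len != 0 && (uint32_t)off + len > block_len)
            return -1; /* datagram would run past the end of the NTB */
        if (n == max)
            return -1;
        out[n].offset = off;
        out[n].len = len;
        n++;
    }
    return n;
}
```

Each entry returned would map to one receive buffer (one skb); whether the
host controller can then scatter a single bulk transfer across those buffers
is the xHCI question above.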
David