Date:	Tue, 22 Jan 2013 10:54:20 +0100
From:	Bjørn Mork <bjorn@...k.no>
To:	Alexey ORISHKO <alexey.orishko@...ricsson.com>
Cc:	"netdev\@vger.kernel.org" <netdev@...r.kernel.org>,
	"linux-usb\@vger.kernel.org" <linux-usb@...r.kernel.org>,
	Greg Suarez <gsuarez@...thmicro.com>,
	Oliver Neukum <oneukum@...e.de>,
	Alexey Orishko <alexey.orishko@...il.com>
Subject: Re: [PATCH net 2/3] net: cdc_mbim: send ZLP after max sized NTBs

Alexey ORISHKO <alexey.orishko@...ricsson.com> writes:

> If you add ZLP for NTBs of dwNtbOutMaxSize, you are heavily affecting CPU
> load, increasing interrupt load by a factor of 2 in a high load traffic
> scenario and possibly decreasing throughput for all other devices
> which behave correctly. 

Hello Alexey,

Still trying to understand the mechanisms involved here. I must
apologize for my lack of knowledge of the hardware restrictions
involved.

The current cdc_ncm/cdc_mbim drivers will pad an NTB to the full
dwNtbOutMaxSize whenever it reaches at least 512 bytes.  The reason is
that this allows more efficient device DMA operation.  This is something
we do to adapt to device hardware restrictions even though there is no
such recommendation in the NCM/MBIM specs.  The penalty on the host and
bus should be obvious: even with a quite small dwNtbOutMaxSize of 4096,
we end up sending 8 x 512-byte data packets instead of the 2 we could
have managed with.
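
Just to make sure we are talking about the same behaviour, this is
roughly the policy I mean (a simplified sketch only, not the actual
cdc_ncm code; the helper name and parameters are made up):

static size_t padded_ntb_len(size_t ntb_len, size_t tx_max)
{
	/* tx_max is dwNtbOutMaxSize.  Anything that has reached at
	 * least 512 bytes is zero-padded up to the full transfer
	 * size before it is handed to the USB stack.
	 */
	if (ntb_len >= 512 && ntb_len < tx_max)
		return tx_max;
	return ntb_len;
}

So with a 4096-byte dwNtbOutMaxSize, a 1024-byte NTB goes out on the
bus as a full 4096-byte transfer.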

Now you claim that sending 9 packets, where the last one is a zero
length packet, increases the interrupt load by a factor of 2?  How is that?

This is where my lack of hardware knowledge shows up, but I assumed,
based on the device DMA argument, that the device USB engine would do
the actual packet reception and assembly and DMA the complete
4096-byte data transfer to the device memory.  Won't the device USB
engine also handle the ZLP?  Won't it just collect 9 packets instead
of 8 and assemble them into the same 4096-byte data transfer, which is
DMAed away exactly as it would be if there were just 8 packets in the
transfer?  The only differences are that the ZLP variant works even if
the device stack for some reason requests more than dwNtbOutMaxSize
bytes (which could of course be called stupid, but the spec allows
it), and that we waste even more USB bus bandwidth.  But if USB bus
bandwidth is a factor here, then I am going to question the padding to
dwNtbOutMaxSize, which we do only because some devices have DMA
restrictions that make it more efficient.
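
For comparison, the ZLP looks close to free on the host side.  As far
as I can see, all the host driver has to do is something like this (a
sketch assuming a usbnet-style transmit path; "maxpacket" stands for
the bulk OUT endpoint's wMaxPacketSize):

	/* If the padded NTB is an exact multiple of wMaxPacketSize,
	 * ask the host controller to append a zero length packet so
	 * the device can tell where the transfer ends.
	 */
	if (urb->transfer_buffer_length % maxpacket == 0)
		urb->transfer_flags |= URB_ZERO_PACKET;

As I understand it, the host controller then generates the extra
packet by itself, so the host CPU never notices it.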

Why are the device DMA restrictions not a fault, while the device ZLP
requirement is?  Both seem like reasonable restrictions imposed by the
device hardware/firmware implementation to me.  Something we'll just
have to accept.

Note that the problem with the ZLP on the Sierra devices is made much
worse than necessary by the padding.  Without it we would rarely have
hit the dwNtbOutMaxSize limit, and we would have been able to work
around the ZLP requirement by simply using tx_max = dwNtbOutMaxSize - 1
if (dwNtbOutMaxSize % wMaxPacketSize == 0).
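
I.e. something along these lines (just to illustrate the idea; the
function name is made up):

	/* If a full NTB would be an exact multiple of wMaxPacketSize,
	 * shrink tx_max by one byte so the last USB packet is always
	 * a short packet and no ZLP is ever needed.
	 */
	static unsigned int tx_max_without_zlp(unsigned int ntb_out_max_size,
					       unsigned int max_packet_size)
	{
		if (ntb_out_max_size % max_packet_size == 0)
			return ntb_out_max_size - 1;
		return ntb_out_max_size;
	}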

Maybe that is what Windows does?

But we cannot do that, due to other devices having issues with
transfers shorter than dwNtbOutMaxSize.


Bjørn
