Date:	Tue, 22 Jan 2013 10:54:20 +0100
From:	Bjørn Mork <bjorn@...k.no>
To:	Alexey ORISHKO <alexey.orishko@...ricsson.com>
Cc:	"netdev\@vger.kernel.org" <netdev@...r.kernel.org>,
	"linux-usb\@vger.kernel.org" <linux-usb@...r.kernel.org>,
	Greg Suarez <gsuarez@...thmicro.com>,
	Oliver Neukum <oneukum@...e.de>,
	Alexey Orishko <alexey.orishko@...il.com>
Subject: Re: [PATCH net 2/3] net: cdc_mbim: send ZLP after max sized NTBs

Alexey ORISHKO <alexey.orishko@...ricsson.com> writes:

> If you add ZLP for NTBs of dwNtbOutMaxSize, you are heavily affecting CPU
> load, increasing interrupt load by a factor of 2 in a high load traffic
> scenario and possibly decreasing throughput for all other devices
> which behave correctly. 

Hello Alexey,

Still trying to understand the mechanisms involved here. I must
apologize for my lack of knowledge of the hardware restrictions
involved.

The current cdc_ncm/cdc_mbim drivers will pad an NTB to the full
dwNtbOutMaxSize whenever it reaches at least 512 bytes.  The reason is
that this allows more efficient device DMA operation.  This is something
we do to adapt to device hardware restrictions even though there is no
such recommendation in the NCM/MBIM specs.  The penalty on the host and
bus should be obvious: even with a quite small dwNtbOutMaxSize of 4096,
we end up sending 8 x 512-byte data packets instead of the 2 we could
have managed with.
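
Just to make sure we are talking about the same behaviour, this is
roughly the policy I mean (a simplified sketch only, not the actual
cdc_ncm code; the helper name and parameters are made up):

static size_t padded_ntb_len(size_t ntb_len, size_t tx_max)
{
	/* tx_max is dwNtbOutMaxSize.  Anything that has reached at
	 * least 512 bytes is zero-padded up to the full transfer
	 * size before it is handed to the USB stack.
	 */
	if (ntb_len >= 512 && ntb_len < tx_max)
		return tx_max;
	return ntb_len;
}

So with a 4096-byte dwNtbOutMaxSize, a 1024-byte NTB goes out on the
bus as a full 4096-byte transfer.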

Now you claim that sending 9 packets, where the last one is a zero
length packet, increases the interrupt load by a factor of 2?  How is that?

This is where my lack of hardware knowledge shows up, but I assumed,
based on the device DMA argument, that the device USB engine would do
the actual packet reception and assembly and DMA the complete
4096-byte data transfer to the device memory.  Won't the device USB
engine also handle the ZLP?  Won't it just collect 9 packets instead
of 8 and assemble them into the same 4096-byte data transfer, which is
DMAed away exactly as it would be if there were just 8 packets in the
transfer?  The only differences are that the ZLP variant works even if
the device stack for some reason requests more than dwNtbOutMaxSize
bytes (which could of course be called stupid, but the spec allows
it), and that we waste even more USB bus bandwidth.  But if USB bus
bandwidth is a factor here, then I am going to question the padding to
dwNtbOutMaxSize, which we do only because some devices have DMA
restrictions that make it more efficient.
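
For comparison, the ZLP looks close to free on the host side.  As far
as I can see, all the host driver has to do is something like this (a
sketch assuming a usbnet-style transmit path; "maxpacket" stands for
the bulk OUT endpoint's wMaxPacketSize):

	/* If the padded NTB is an exact multiple of wMaxPacketSize,
	 * ask the host controller to append a zero length packet so
	 * the device can tell where the transfer ends.
	 */
	if (urb->transfer_buffer_length % maxpacket == 0)
		urb->transfer_flags |= URB_ZERO_PACKET;

As I understand it, the host controller then generates the extra
packet by itself, so the host CPU never notices it.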

Why are the device DMA restrictions not a fault, while the device ZLP
requirement is?  Both seem like reasonable restrictions imposed by the
device hardware/firmware implementation to me.  Something we'll just
have to accept.

Note that the problem with the ZLP on the Sierra devices is made much
worse than necessary by the padding.  Without it we would rarely have
hit the dwNtbOutMaxSize limit, and we would have been able to work
around the ZLP requirement by simply using tx_max = dwNtbOutMaxSize - 1
if (dwNtbOutMaxSize % wMaxPacketSize == 0).
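
I.e. something along these lines (just to illustrate the idea; the
function name is made up):

	/* If a full NTB would be an exact multiple of wMaxPacketSize,
	 * shrink tx_max by one byte so the last USB packet is always
	 * a short packet and no ZLP is ever needed.
	 */
	static unsigned int tx_max_without_zlp(unsigned int ntb_out_max_size,
					       unsigned int max_packet_size)
	{
		if (ntb_out_max_size % max_packet_size == 0)
			return ntb_out_max_size - 1;
		return ntb_out_max_size;
	}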

Maybe that is what Windows does?

But we cannot do that, due to other devices having issues with
transfers shorter than dwNtbOutMaxSize.


Bjørn
