lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <50044F1D.6000703@hp.com>
Date: Mon, 16 Jul 2012 10:27:57 -0700
From: Rick Jones <rick.jones2@...com>
To: Thadeu Lima de Souza Cascardo <cascardo@...ux.vnet.ibm.com>
CC: "davem@...emloft.net" <davem@...emloft.net>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "yevgenyp@...lanox.co.il" <yevgenyp@...lanox.co.il>, "ogerlitz@...lanox.com" <ogerlitz@...lanox.com>, "amirv@...lanox.com" <amirv@...lanox.com>, "brking@...ux.vnet.ibm.com" <brking@...ux.vnet.ibm.com>, "leitao@...ux.vnet.ibm.com" <leitao@...ux.vnet.ibm.com>, "klebers@...ux.vnet.ibm.com" <klebers@...ux.vnet.ibm.com>
Subject: Re: [PATCH] mlx4_en: map entire pages to increase throughput

On 07/16/2012 10:01 AM, Thadeu Lima de Souza Cascardo wrote:

> In its receive path, the mlx4_en driver maps each page chunk that it pushes
> to the hardware and unmaps it when pushing it up the stack. This limits
> throughput to about 3 Gbps on a Power7 8-core machine.

That seems rather extraordinarily low - Power7 is supposed to be a rather high-performance CPU. The last time I noticed O(3Gbit/s) on 10G for bulk transfer was before the advent of LRO/GRO - that was in the x86 space though. Is mapping really that expensive with Power7?

> One solution is to map the entire allocated page at once. However, this
> requires that we keep track of every page fragment we give to a
> descriptor. We also need to work with the discipline that all fragments
> will be released (in the sense that they will not be reused by the
> driver anymore) in the order they were allocated to the driver.
>
> This requires that we don't reuse any fragments; every single one of
> them must be reallocated. We do that by releasing all the fragments
> that are processed, and only after we have finished processing the
> descriptors do we start the refill.
>
> We also must somehow guarantee that we either refill all fragments in a
> descriptor or none at all, without resorting to giving up a page
> fragment that we would have already given. Otherwise, we would break
> the discipline of only releasing the fragments in the order they were
> allocated.
>
> This has passed page allocation fault injection (restricted to the
> driver by using required-start and required-end) and device hotplug,
> while 16 TCP streams were able to deliver more than 9 Gbps.

What is the effect on packet-per-second performance? (e.g. aggregate, burst-mode netperf TCP_RR with TCP_NODELAY set, or perhaps UDP_RR)

rick jones