netdev - Re: [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool

Message-ID: <20141127130057.5403429c@redhat.com>
Date:	Thu, 27 Nov 2014 13:00:57 +0100
From:	Jesper Dangaard Brouer <brouer@...hat.com>
To:	Alexander Duyck <alexander.h.duyck@...hat.com>
Cc:	netdev@...r.kernel.org, davem@...emloft.net,
	jeffrey.t.kirsher@...el.com, eric.dumazet@...il.com,
	ast@...mgrid.com, brouer@...hat.com
Subject: Re: [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool

On Wed, 26 Nov 2014 16:05:50 -0800
Alexander Duyck <alexander.h.duyck@...hat.com> wrote:

> This patch series implements a means of allocating page fragments without
> the need for the local_irq_save/restore in __netdev_alloc_frag.  By doing
> this I am able to decrease packet processing time by 11ns per packet in my
> test environment.

This is really good work!

I've tested the patchset (details below) with two packet sizes,
64 bytes and 272 bytes, because of the "copy-break" point in the driver.

Note that these tests use a single flow, so only a single CPU is
activated on the receiver.

If I drop packets very early, in the iptables "raw" table, I see an
improvement of 10.51 ns to 13.22 ns for 64 bytes (9.64 ns to 11.97 ns
for 272 bytes), which corresponds with Alex's observations.

A little surprisingly, when doing full forwarding (IP routing), I see a
much larger per-packet improvement: between 47.64 ns and 58.15 ns for
64 bytes (between 29.08 ns and 30.14 ns for 272 bytes).  This
improvement is larger than I expected.  One pitfall is that with full
forwarding we can only forward approx. 1 Mpps (single CPU), and the
results vary more between test runs.

Setup
-----
Generator: ixgbe, pktgen (3x CPUs), sending 10G wirespeed
 - Single flow pktgen, resulting in single CPU activation on target
 - pkt@...ytes:  tx:14900856 pps (wirespeed)
 - pkt@...bytes: tx: 4228696 pps (wirespeed)

Ethernet wirespeed:
 * (1/((64+20)*8))*(10*10^9)  = 14880952
 * (1/((272+20)*8))*(10*10^9) =  4280822
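
As a rough sketch, the wirespeed figures above can be reproduced in
Python.  The function name and its defaults are made up for
illustration.  The 20 bytes are assumed to be the per-frame overhead on
the wire (8 bytes preamble/SFD plus 12 bytes inter-frame gap), which the
formula above implies but does not spell out.

```python
def wirespeed_pps(frame_bytes, link_bps=10 * 10**9, overhead_bytes=20):
    """Theoretical max packets/sec for a given Ethernet frame size.

    overhead_bytes=20 is an assumption: preamble + SFD (8 bytes)
    plus the inter-frame gap (12 bytes).
    """
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_on_wire


# The two packet sizes used in these tests
for size in (64, 272):
    print(f"{size:>3} bytes: {wirespeed_pps(size):.0f} pps")
```

The measured pktgen rates listed under "Generator" can then be compared
against these theoretical limits.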

Receiver CPU E5-2695 running state-c0@...GHz

Baseline
--------

Baseline: Full forwarding (no-netfilter):

 * pkt@...ytes: tx:977414 pps
 * pkt@...ytes: tx:974404 pps
 * test-variation@...ytes: 3010pps (1/977414*10^9)-(1/974404*10^9) = -3.16ns

 * pkt@...bytes: tx:911657 pps
 * pkt@...bytes: tx:906229 pps
 * test-variation@...bytes: 5428pps -6.57ns
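
The nanosecond figures are derived by converting each pps rate into a
per-packet time (10^9/pps) and subtracting.  A minimal Python sketch of
that conversion, with helper names that are only illustrative:

```python
def ns_per_packet(pps):
    """Time spent per packet in nanoseconds at a given packet rate."""
    return 1e9 / pps


def delta_ns(pps_a, pps_b):
    """Per-packet time difference between two runs (run A minus run B)."""
    return ns_per_packet(pps_a) - ns_per_packet(pps_b)


# Run-to-run variation of the two full-forwarding baseline pairs above
print(f"{delta_ns(977414, 974404):.2f} ns")
print(f"{delta_ns(911657, 906229):.2f} ns")
```

A negative value simply means the second run was slower, i.e. it spent
more time per packet.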

Baseline: Drop in iptables RAW:

 * pkt@...ytes: rx:2801058 pps
 * pkt@...ytes: rx:2785579 pps
 * test-variation@...ytes: 15479pps -1.98 ns

 * pkt@...bytes: rx:2559718 pps
 * pkt@...bytes: rx:2544577 pps
 * test-variation@...bytes: 15141pps -2.32ns

With patch: Alex's napi_alloc_skb
---------------------------------

Full forwarding (no-netfilter) (pkt@...ytes):

 * pkt@...ytes: tx:1025150 pps
 * pkt@...ytes: tx:1032930 pps
 * test-variation@...ytes: -7780pps 7.34ns
 * Patchset improvements@...fwd:
  - 977414 -> 1025150 = 47736pps -> 47.64ns
  - 974404 -> 1032930 = 58526pps -> 58.15ns

 * pkt@...bytes: tx:937416 pps
 * pkt@...bytes: tx:930761 pps
 * test-variation@...bytes: 6655pps -7.62ns
 * Patchset improvements@...-fwd:
  - 911657 -> 937416 = 25759pps -> 30.14ns
  - 906229 -> 930761 = 24532pps -> 29.08ns

Drop in iptables RAW (pkt@...ytes):

 * pkt@...ytes: rx:2885820 pps
 * pkt@...ytes: rx:2892050 pps
 * test-variation@...ytes diff: 6230pps 0.746ns
 * Patchset improvements@...drop:
  - 2800896 -> 2885820 =  84924pps -> 10.51 ns
  - 2785579 -> 2892050 = 106471pps -> 13.22 ns

 * pkt@...bytes: rx:2624484 pps
 * pkt@...bytes: rx:2624492 pps
 * test-variation: pkt@...bytes diff: 8pps 0ns
 * Patchset improvements@...-drop:
  - 2559718 -> 2624484 = 64766 pps ->  9.64 ns
  - 2544577 -> 2624492 = 79915 pps -> 11.97 ns


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer