Message-ID: <1323438427.20077.57.camel@zakaz.uk.xensource.com>
Date: Fri, 9 Dec 2011 13:47:07 +0000
From: Ian Campbell <Ian.Campbell@...rix.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: David Miller <davem@...emloft.net>,
	Jesse Brandeburg <jesse.brandeburg@...el.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH 0/4] skb paged fragment destructors

On Wed, 2011-12-07 at 13:35 +0000, Ian Campbell wrote:
> On Tue, 2011-12-06 at 13:24 +0000, Eric Dumazet wrote:
> > On Tuesday, 6 December 2011 at 11:57 +0000, Ian Campbell wrote:
> > > On Wed, 2011-11-09 at 15:01 +0000, Ian Campbell wrote:
> > > >       * split linear data allocation and shinfo allocation into two. I
> > > >         suspect this will have its own performance implications? On the
> > > >         positive side skb_shared_info could come from its own fixed size
> > > >         pool/cache which might have some benefits
> > >
> > > I played with this to see how it would look. Illustrative patch below.
> > >
> > > I figure that lots of small frames is the interesting workload for a
> > > change such as this but I don't know if iperf is necessarily the best
> > > benchmark for measuring that.
> > > Before changing things I got:
> > >         iperf -c qarun -m -t 60 -u -b 10000M -l 64
> > >         ------------------------------------------------------------
> > >         Client connecting to qarun, UDP port 5001
> > >         Sending 64 byte datagrams
> > >         UDP buffer size:  224 KByte (default)
> > >         ------------------------------------------------------------
> > >         [  3] local 10.80.225.63 port 45857 connected with 10.80.224.22 port 5001
> > >         [ ID] Interval       Transfer     Bandwidth
> > >         [  3]  0.0-60.0 sec    844 MBytes    118 Mbits/sec
> > >         [  3] Sent 13820376 datagrams
> > >         [  3] Server Report:
> > >         [  3]  0.0-60.0 sec    844 MBytes    118 Mbits/sec   0.005 ms    0/13820375 (0%)
> > >         [  3]  0.0-60.0 sec  1 datagrams received out-of-order
> > > whereas with the patch:
> > >         # iperf -c qarun -m -t 60 -u -b 10000M -l 64
> > >         ------------------------------------------------------------
> > >         Client connecting to qarun, UDP port 5001
> > >         Sending 64 byte datagrams
> > >         UDP buffer size:  224 KByte (default)
> > >         ------------------------------------------------------------
> > >         [  3] local 10.80.225.63 port 42504 connected with 10.80.224.22 port 5001
> > >         [ ID] Interval       Transfer     Bandwidth
> > >         [  3]  0.0-60.0 sec    833 MBytes    116 Mbits/sec
> > >         [  3] Sent 13645857 datagrams
> > >         [  3] Server Report:
> > >         [  3]  0.0-60.0 sec    833 MBytes    116 Mbits/sec   0.005 ms    0/13645856 (0%)
> > >         [  3]  0.0-60.0 sec  1 datagrams received out-of-order
> > >
> > > With 1200 byte datagrams I get basically identical throughput.
> > >
> > > (nb: none of the skb destructor stuff was present in either case)
> >
> > Sorry, but the real problem is that if skb producer and consumer are not
> > on same CPU, each skb will now hit SLUB slowpath three times instead of
> > two.
> >
> > Some workloads are: One cpu fully handling IRQ from device, dispatching
> > skbs to consumers on other cpus.
>
> So something like a multithreaded apache benchmark would be interesting?
>
> > Plus skb->truesize is wrong after your patch.
> > Not sure if cloning is correct either...
>
> Me neither, the patch was just one which happened to run well enough to
> run some numbers on. I'll be sure to double-check/correct that stuff if
> I persist with the approach.
>
> > Anyway, do we _really_ need 16 frags per skb, I don't know....
>
> MAX_SKB_FRAGS is:
>         /* To allow 64K frame to be packed as single skb without frag_list. Since
>          * GRO uses frags we allocate at least 16 regardless of page size.
>          */
>         #if (65536/PAGE_SIZE + 2) < 16
>         #define MAX_SKB_FRAGS 16UL
>         #else
>         #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
>         #endif
>
> So I think actually it turns out to be 18 on systems with 4k pages. If
> we reduced that to be 16 that would save 48 bytes on amd64 which pulls
> the shinfo size down to 440 and then
>
>         NET_SKB_PAD (64) + 1500 + 14 + 440 = 2018
>
> which does fit in half a page.
>
> Using 16 still means that a 64k frame fits precisely in frags on a 4096
> page size system. I don't know what the extra 2 was for (it predates
> git), perhaps just to allow for some slop due to misalignment in the
> first page of data?

Tracked it down in TGLX's historic tree as:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=80223d5186f73bf42a7e260c66c9cb9f7d8ec9cf

Having found the title I did a bit of searching around for discussion
but all I could find was other folks not having any idea where the + 2
comes from either...

http://kerneltrap.org/mailarchive/linux-netdev/2008/4/21/1529914 sort
of wonders if the + 2 might be including a head which crosses a page
boundary. I'm not sure it is sane/useful to account for that in
MAX_SKB_FRAGS. If we had a concept like MAX_SKB_PAGES then it would
perhaps make sense to have + 2 there, but AFAICT drivers etc. are
already accounting for this appropriately by adding a further + 2 (or
sometimes + 1) to MAX_SKB_FRAGS.

Ian.
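As a quick sanity check of the arithmetic above, a minimal user-space sketch
(it assumes 4 KiB pages, and the 440-byte skb_shared_info size is simply the
figure quoted in the mail, not derived from the structure layout):

#include <stdio.h>

int main(void)
{
        unsigned long page_size = 4096;        /* assumed 4 KiB pages */

        /* same expression as the MAX_SKB_FRAGS macro quoted above */
        unsigned long max_skb_frags = (65536 / page_size + 2) < 16
                                        ? 16 : 65536 / page_size + 2;

        printf("MAX_SKB_FRAGS          = %lu\n", max_skb_frags);        /* 18 */
        printf("frags for a 64k frame  = %lu\n", 65536 / page_size);    /* 16 */
        /* NET_SKB_PAD + 1500 byte MTU + 14 byte ethernet header + shinfo (440) */
        printf("linear skb allocation  = %lu\n", 64UL + 1500 + 14 + 440); /* 2018 <= 2048 */
        return 0;
}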
> > This gives problems when/if skb must be linearized and we hit
> > PAGE_ALLOC_COSTLY_ORDER
> >
> > Alternatively, we could use order-1 or order-2 pages on x86 to get
> > 8192/16384 bytes frags. (fallback to order-0 pages in case of
> > allocation failures)
>
> Or fallback to separate allocation of shinfo?
>
> Ian.
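For illustration only, a rough sketch of the higher-order-page fallback Eric
describes (order-2 pages for 16384-byte frags, then order-1 for 8192, then a
plain order-0 page); the helper name and structure are invented here and are
not taken from any posted patch:

#include <linux/gfp.h>
#include <linux/mm_types.h>

/* Try an order-2 page first, then order-1, and finally fall back to an
 * order-0 page if the higher-order allocations fail. */
static struct page *frag_alloc_pages(gfp_t gfp, unsigned int *order)
{
        unsigned int o;

        for (o = 2; ; o--) {
                gfp_t mask = gfp;
                struct page *page;

                if (o)
                        /* don't retry hard or warn for the opportunistic
                         * high-order attempts */
                        mask |= __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN;

                page = alloc_pages(mask, o);
                if (page) {
                        *order = o;
                        return page;
                }
                if (!o)
                        return NULL;
        }
}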