Message-ID: <1323438427.20077.57.camel@zakaz.uk.xensource.com>
Date: Fri, 9 Dec 2011 13:47:07 +0000
From: Ian Campbell <Ian.Campbell@...rix.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: David Miller <davem@...emloft.net>,
Jesse Brandeburg <jesse.brandeburg@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH 0/4] skb paged fragment destructors
On Wed, 2011-12-07 at 13:35 +0000, Ian Campbell wrote:
> On Tue, 2011-12-06 at 13:24 +0000, Eric Dumazet wrote:
> > On Tue, 2011-12-06 at 11:57 +0000, Ian Campbell wrote:
> > > On Wed, 2011-11-09 at 15:01 +0000, Ian Campbell wrote:
> > > > * split linear data allocation and shinfo allocation into two. I
> > > > suspect this will have its own performance implications? On the
> > > > positive side skb_shared_info could come from its own fixed size
> > > > pool/cache which might have some benefits
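(For illustration, a minimal sketch of what such a dedicated cache could look
like; skb_shinfo_cache and the init function are made-up names, not code from
the patch under discussion:)

#include <linux/skbuff.h>
#include <linux/slab.h>

/* Sketch only: give struct skb_shared_info its own fixed-size slab cache
 * instead of tacking it onto the end of the linear data allocation. */
static struct kmem_cache *skb_shinfo_cache;

void skb_shinfo_cache_init(void)
{
	skb_shinfo_cache = kmem_cache_create("skb_shared_info",
					     sizeof(struct skb_shared_info),
					     0,
					     SLAB_HWCACHE_ALIGN | SLAB_PANIC,
					     NULL);
}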
> > >
> > > I played with this to see how it would look. Illustrative patch below.
> > >
> > > I figure that lots of small frames is the interesting workload for a
> > > change like this, but I don't know whether iperf is necessarily the
> > > best benchmark for measuring that.
> > > Before changing things I got:
> > > iperf -c qarun -m -t 60 -u -b 10000M -l 64
> > > ------------------------------------------------------------
> > > Client connecting to qarun, UDP port 5001
> > > Sending 64 byte datagrams
> > > UDP buffer size: 224 KByte (default)
> > > ------------------------------------------------------------
> > > [ 3] local 10.80.225.63 port 45857 connected with 10.80.224.22 port 5001
> > > [ ID] Interval Transfer Bandwidth
> > > [ 3] 0.0-60.0 sec 844 MBytes 118 Mbits/sec
> > > [ 3] Sent 13820376 datagrams
> > > [ 3] Server Report:
> > > [ 3] 0.0-60.0 sec 844 MBytes 118 Mbits/sec 0.005 ms 0/13820375 (0%)
> > > [ 3] 0.0-60.0 sec 1 datagrams received out-of-order
> > > whereas with the patch:
> > > # iperf -c qarun -m -t 60 -u -b 10000M -l 64
> > > ------------------------------------------------------------
> > > Client connecting to qarun, UDP port 5001
> > > Sending 64 byte datagrams
> > > UDP buffer size: 224 KByte (default)
> > > ------------------------------------------------------------
> > > [ 3] local 10.80.225.63 port 42504 connected with 10.80.224.22 port 5001
> > > [ ID] Interval Transfer Bandwidth
> > > [ 3] 0.0-60.0 sec 833 MBytes 116 Mbits/sec
> > > [ 3] Sent 13645857 datagrams
> > > [ 3] Server Report:
> > > [ 3] 0.0-60.0 sec 833 MBytes 116 Mbits/sec 0.005 ms 0/13645856 (0%)
> > > [ 3] 0.0-60.0 sec 1 datagrams received out-of-order
> > >
> > > With 1200-byte datagrams I get basically identical throughput.
> > >
> > > (nb: none of the skb destructor stuff was present in either case)
> >
> > Sorry, but the real problem is that if the skb producer and consumer are
> > not on the same CPU, each skb will now hit the SLUB slowpath three times
> > instead of two.
> >
> > Some workloads are: one CPU fully handling IRQs from the device,
> > dispatching skbs to consumers on other CPUs.
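(To make the two-versus-three count concrete, a rough sketch of the slab
operations per skb on the free side; this is not the real kfree_skb() path,
and skb_shinfo_cache is the hypothetical extra cache from above:)

#include <linux/skbuff.h>
#include <linux/slab.h>

/* In reality skbuff_head_cache is static to net/core/skbuff.c; it is
 * declared here only so the sketch is self-contained. */
extern struct kmem_cache *skbuff_head_cache;

/* Each of these frees misses SLUB's per-cpu fast path when the consumer
 * runs on a different CPU than the allocator.  Today there are two per
 * skb; a separately allocated shinfo would add a third. */
static void sketch_free_skb(struct sk_buff *skb)
{
	kfree(skb->head);                            /* 1: linear data */
	/* kmem_cache_free(skb_shinfo_cache, shinfo);   2: the proposed extra free */
	kmem_cache_free(skbuff_head_cache, skb);     /* 3: struct sk_buff itself */
}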
>
> So something like a multithreaded Apache benchmark would be interesting?
>
> > Plus skb->truesize is wrong after your patch.
> > Not sure if cloning is correct either...
>
> Me neither; the patch was just one which happened to run well enough to
> get some numbers out of. I'll be sure to double-check/correct that stuff
> if I persist with the approach.
>
> > Anyway, do we _really_ need 16 frags per skb? I don't know...
>
> MAX_SKB_FRAGS is:
> /* To allow 64K frame to be packed as single skb without frag_list. Since
> * GRO uses frags we allocate at least 16 regardless of page size.
> */
> #if (65536/PAGE_SIZE + 2) < 16
> #define MAX_SKB_FRAGS 16UL
> #else
> #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2)
> #endif
>
> So I think it actually turns out to be 18 on systems with 4k pages. If
> we reduced that to 16 it would save 48 bytes on amd64, which pulls the
> shinfo size down to 440, and then
>
> NET_SKB_PAD (64) + 1500 + 14 + 440 = 2018
>
> which does fit in half a page.
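(Spelling that arithmetic out as a compile-time check, using the sizes assumed
above -- amd64, 4k pages, shinfo at 440 bytes with 16 frags -- rather than
anything measured:)

/* Sketch of the half-page fit argument; all of the SKETCH_* values are the
 * figures quoted above, not values pulled from a running kernel. */
#define SKETCH_NET_SKB_PAD   64
#define SKETCH_ETH_HLEN      14
#define SKETCH_MTU           1500
#define SKETCH_SHINFO_SIZE   440      /* shinfo with 16 frags */
#define SKETCH_PAGE_SIZE     4096

/* 64 + 1500 + 14 + 440 = 2018 <= 2048, so two such buffers fit per page. */
_Static_assert(SKETCH_NET_SKB_PAD + SKETCH_MTU + SKETCH_ETH_HLEN +
               SKETCH_SHINFO_SIZE <= SKETCH_PAGE_SIZE / 2,
               "head + shinfo no longer fits in half a page");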
>
> Using 16 still means that a 64k frame fits precisely in frags on a system
> with 4096-byte pages. I don't know what the extra 2 was for (it predates
> git); perhaps it was just to allow for some slop due to misalignment in
> the first page of data?
Tracked it down in TGLX's historic tree as:
http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commitdiff;h=80223d5186f73bf42a7e260c66c9cb9f7d8ec9cf
Having found the title, I did a bit of searching around for discussion,
but all I could find was other folks with no idea where the + 2 comes
from either...
http://kerneltrap.org/mailarchive/linux-netdev/2008/4/21/1529914 sort of
wonders whether the + 2 might be accounting for a head which crosses a
page boundary. I'm not sure it is sane/useful to account for that in
MAX_SKB_FRAGS. If we had a concept like MAX_SKB_PAGES then it would
perhaps make sense to have + 2 there, but AFAICT drivers etc. are already
accounting for this appropriately by adding a further + 2 (or sometimes
+ 1) to MAX_SKB_FRAGS.
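(As a sketch of the driver-side accounting I mean; the names are made up and
the exact amount of slop varies from driver to driver:)

#include <linux/skbuff.h>

/* Illustrative pattern, not taken from any particular driver: size the TX
 * ring for the linear head (which may itself cross a page boundary) plus
 * every possible frag, i.e. keep the "+ 1"/"+ 2" slop in the driver rather
 * than in MAX_SKB_FRAGS. */
#define SKETCH_DESC_PER_SKB  (MAX_SKB_FRAGS + 2)

static bool sketch_tx_ring_has_room(unsigned int free_descs)
{
	return free_descs >= SKETCH_DESC_PER_SKB;
}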
Ian.
>
> > This gives problems when/if the skb must be linearized and we hit
> > PAGE_ALLOC_COSTLY_ORDER.
> >
> > Alternatively, we could use order-1 or order-2 pages on x86 to get
> > 8192/16384-byte frags (falling back to order-0 pages in case of
> > allocation failures).
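(A minimal sketch of that fallback scheme, purely for illustration;
alloc_frag_pages() is a made-up helper, not existing code:)

#include <linux/gfp.h>
#include <linux/mm.h>

/* Try order-2 (16K) then order-1 (8K) frags, falling back to an order-0
 * page if the higher orders cannot be allocated cheaply. */
static struct page *alloc_frag_pages(gfp_t gfp, unsigned int *order)
{
	struct page *page;
	unsigned int o;

	for (o = 2; o > 0; o--) {
		page = alloc_pages(gfp | __GFP_NOWARN | __GFP_NORETRY, o);
		if (page) {
			*order = o;
			return page;
		}
	}
	*order = 0;
	return alloc_pages(gfp, 0);
}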
>
> Or fall back to separate allocation of shinfo?
>
> Ian.
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html