Message-ID: <20090921222355.631467e0@nehalam>
Date: Mon, 21 Sep 2009 22:23:55 -0700
From: Stephen Hemminger <shemminger@...tta.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Jesse Brandeburg <jesse.brandeburg@...il.com>,
Jesper Dangaard Brouer <hawk@...u.dk>, netdev@...r.kernel.org
Subject: Re: [RFC] skb align patch
On Tue, 22 Sep 2009 05:20:53 +0200
Eric Dumazet <eric.dumazet@...il.com> wrote:
> Stephen Hemminger wrote:
> > On Mon, 21 Sep 2009 08:13:20 +0200
> > Eric Dumazet <eric.dumazet@...il.com> wrote:
> >
> >> Stephen Hemminger wrote:
> >>> Based on the Intel suggestion that PCI-express overhead is
> >>> a significant cost.
> >>>
> >>> Would people doing performance work please measure the impact of
> >>> changing SKB alignment (64-bit only).
> >> I had this idea some time ago when I hit a limit on a bnx2 adapter
> >> (Gigabit link, BCM5708S) with small packets. pktgen was able
> >> to send 'only' ~500 Mbps, or 700 kpps if I remember correctly.
> >> So I tried to align the packet built by pktgen to a cache line;
> >> it made no difference at all, but that was on a 32-bit kernel.
> >> (So my patch was for pktgen only, not a generic one like yours.)
> >>
> >> Could you elaborate on why this change could be useful on 64-bit?
> >>
> >
> > It is useful on all architectures where unaligned CPU access is
> > relatively cheap.
> >
> > The issue is that an unaligned DMA requires a read/modify/write
> > cache line access versus just a write access. I am not a bus
> > expert, but writes are probably more pipelined as well.
> >
>
> Oh I see, you want to optimize rx (the NIC has to do a DMA
> to write the packet into host memory, and this DMA could be a
> read/modify/write instead of a pure write if the address is not
> aligned), while I tried to align the skb to optimize pktgen tx
> (the NIC has to do a DMA to read the packet from host memory), and
> aligning the skb had no effect.
>
> Maybe we should separate the rx/tx, and try your idea only
> for skb allocated for rx.
>
> Also/Or we might try
> __builtin_prefetch (addr, 0, 0);
> to instruct cpu to commit to memory cache lines that are
> going to be modified by NIC.
I don't think it matters whether the RX buffer read/modify/write is
done from CPU cache or from memory on modern cache-snooping
architectures. The cost is the PCI traffic.
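
To make the read/modify/write point concrete, here is a rough sketch
(not taken from the patch) that counts how many cache lines a DMA
write covers only partially, since those are the lines that cannot be
handled as a pure full-line write. The 64-byte line size and the
helper name partial_lines() are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u	/* assumed cache line size in bytes */

/*
 * Count the cache lines that a write of len bytes starting at addr
 * covers only partially. Hypothetical helper, illustration only.
 */
static unsigned int partial_lines(uintptr_t addr, size_t len)
{
	uintptr_t end, first, last;
	unsigned int head, tail;

	if (len == 0)
		return 0;

	assert(len <= UINTPTR_MAX - addr);	/* guard against wraparound */

	end = addr + len;			/* one past the last byte */
	first = addr / CACHE_LINE;		/* index of first line touched */
	last = (end - 1) / CACHE_LINE;		/* index of last line touched */
	head = (addr % CACHE_LINE) != 0;	/* write starts mid-line */
	tail = (end % CACHE_LINE) != 0;		/* write ends mid-line */

	if (first == last)			/* whole write within one line */
		return head || tail;

	return head + tail;
}
```

Under these assumptions, a 1514-byte frame written at offset 0 gives
partial_lines(0, 1514) == 1 (only the tail line is partial), while
the same frame at a 2-byte offset gives partial_lines(2, 1514) == 2
(both head and tail lines are partial).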