Message-ID: <20090817095302.0c41ef68@jbarnes-g45>
Date: Mon, 17 Aug 2009 09:53:02 -0700
From: Jesse Barnes <jbarnes@...tuousgeek.org>
To: Bill Fink <billfink@...dspring.com>
Cc: "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
Neil Horman <nhorman@...driver.com>,
Andrew Gallatin <gallatin@...i.com>,
Brice Goglin <Brice.Goglin@...ia.fr>,
Linux Network Developers <netdev@...r.kernel.org>,
Yinghai Lu <yhlu.kernel@...il.com>
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA
On Fri, 14 Aug 2009 16:31:55 -0400
Bill Fink <billfink@...dspring.com> wrote:
> On Wed, 12 Aug 2009, Bill Fink wrote:
>
> > On Tue, 11 Aug 2009, Brandeburg, Jesse wrote:
> >
> > > Bill Fink wrote:
> > > > On Sat, 8 Aug 2009, Neil Horman wrote:
> > > >
> > > >> On Sat, Aug 08, 2009 at 02:21:36PM -0400, Andrew Gallatin
> > > >> wrote:
> > > >>> Neil Horman wrote:
> > > >>>> On Sat, Aug 08, 2009 at 07:08:20AM -0400, Andrew Gallatin
> > > >>>> wrote:
> > > >>>>> Bill Fink wrote:
> > > >>>>>> On Fri, 07 Aug 2009, Andrew Gallatin wrote:
> > > >>>>>>
> > > >>>>>>> Bill Fink wrote:
> > > >>>>>>>
> > > >>>>>>>> All sysfs local_cpus values are the same
> > > >>>>>>>> (00000000,000000ff), so yes they are also wrong.
> > >
> > > Bill, I recently helped Jesse Barnes push a patch that addresses
> > > this kind of issue on Core i7. The root cause was that the
> > > numa_node variable was initialized based on the slot on AMD
> > > systems, but needed to default to -1 on systems with a uniform
> > > IOH-to-slot architecture.
> > >
> > > here is the commit ID:
> > > http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=3c38d674be519109696746192943a6d524019f7f
> > >
> > > I'm not sure it's in Linus' tree yet; this link is to linux-next.
> > >
> > > Maybe see if it helps?
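
For reference, here's a hedged sketch of the idea described above
(not the actual commit; both helpers below are hypothetical
stand-ins, not kernel APIs). Deriving a node from the slot number
only makes sense on platforms with per-node I/O hubs; on a
uniform-IOH platform the node should just default to -1, i.e. "no
specific affinity":

/*
 * Hypothetical illustration only: platform_has_per_node_iohs() and
 * slot_to_node_map() are made-up stand-ins for whatever the real
 * platform checks are.
 */
static int pci_slot_to_node(int slot)
{
        if (!platform_has_per_node_iohs())
                return -1;      /* uniform IOH: no node affinity */

        return slot_to_node_map(slot);  /* AMD-style per-slot lookup */
}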
> >
> > It's worth a shot.
> >
> > Hopefully I can get a chance to build a new kernel tomorrow to check
> > out some of the suggestions, like this one, the setting of
> > ACPI_DEBUG, and the new ftrace module for checking NUMA affinity of
> > skbs.
>
> I applied this patch to my 2.6.29.6 kernel (from Fedora 11).
>
> Now when I do:
>
> find /sys -name numa_node -exec grep . {} /dev/null \;
>
> the numa_node for _all_ PCI devices is -1.
Yeah, that sounds right (indicates they're not really tied to a
specific node).
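
If you want to script that check, a minimal reader works too (hedged
example; the PCI address in the path is just illustrative, substitute
the NIC's actual address):

#include <stdio.h>

int main(void)
{
        /* Illustrative device path; replace with your NIC's PCI address. */
        const char *path = "/sys/bus/pci/devices/0000:05:00.0/numa_node";
        FILE *f = fopen(path, "r");
        int node;

        if (!f) {
                perror("fopen");
                return 1;
        }
        if (fscanf(f, "%d", &node) == 1)
                printf("numa_node = %d%s\n", node,
                       node == -1 ? " (no specific node)" : "");
        fclose(f);
        return 0;
}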
> When I do:
>
> find /sys -name local_cpus -exec grep . {} /dev/null \;
>
> I find that local_cpus is always 00000000,00000000.
>
> Is that OK or should it be 00000000,000000ff (for my dual quad-core
> Xeon 5580 system with no hyperthreading)?
Hm, yeah it probably should have the full CPU mask...
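
The intent (going by the generic cpumask_of_pcibus() in
include/asm-generic/topology.h; simplified sketch below, and the x86
path may differ) is to fall back to all online CPUs when the bus has
no node affinity, so an all-zero mask looks like a bug rather than
intended behavior:

/*
 * Simplified sketch of the intended fallback; not the exact kernel
 * code on any one arch.
 */
static const struct cpumask *bus_local_cpumask(struct pci_bus *bus)
{
        int node = pcibus_to_node(bus);

        if (node == -1)         /* no affinity: report all online CPUs */
                return cpu_online_mask;

        return cpumask_of_node(node);
}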
> Also, is it just not possible on this type of Intel Xeon system to
> properly associate the PCI devices with the nearest NUMA node?
All the PCI devices hang off the root complex, which is the same
distance to each node of memory (at least that's my understanding for
current platforms).
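
Tangentially, if you want to see what the firmware actually reports
for node-to-node distances (memory topology, not device placement),
libnuma makes dumping the SLIT-derived table a quick check:

#include <numa.h>       /* link with -lnuma */
#include <stdio.h>

int main(void)
{
        if (numa_available() < 0) {
                fprintf(stderr, "NUMA not available\n");
                return 1;
        }

        int max = numa_max_node();
        for (int i = 0; i <= max; i++)
                for (int j = 0; j <= max; j++)
                        printf("distance(%d,%d) = %d\n",
                               i, j, numa_distance(i, j));
        return 0;
}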
> In any event, the patch didn't help (or hurt). The transmit
> performance remained at ~100 Gbps while the receive performance
> remained at 55 Gbps.
Maybe the other Jesse has some ideas here.
--
Jesse Barnes, Intel Open Source Technology Center