Message-ID: <20090609054704.GC12431@alberich.amd.com>
Date: Tue, 9 Jun 2009 07:47:04 +0200
From: Andreas Herrmann <andreas.herrmann3@....com>
To: Jesse Barnes <jbarnes@...tuousgeek.org>
CC: Yinghai Lu <yhlu.kernel@...il.com>, Ingo Molnar <mingo@...e.hu>,
"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] pci: derive nearby CPUs from device's instead of bus' NUMA information

On Mon, May 11, 2009 at 02:54:23PM -0700, Jesse Barnes wrote:
> On Thu, 7 May 2009 10:51:36 +0200
> Andreas Herrmann <andreas.herrmann3@....com> wrote:
> > On Mon, Apr 20, 2009 at 01:03:41PM -0700, Jesse Barnes wrote:
> > > On Mon, 20 Apr 2009 10:47:47 +0200
> > > Andreas Herrmann <andreas.herrmann3@....com> wrote:
> > > > On Fri, Apr 17, 2009 at 12:26:54PM -0700, Yinghai Lu wrote:
> > > > > On Fri, Apr 17, 2009 at 9:21 AM, Ingo Molnar <mingo@...e.hu>
> > > > > wrote:
> > > > > > const struct cpumask * cpumask_of_pcidev(struct pci_dev *dev)
> > > > > > {
> > > > > > 	if (dev->numa_node == -1)
> > > > > > 		return cpumask_of_pcibus(to_pci_dev(dev)->bus);
> > > > > >
> > > > > > 	return cpumask_of_node(dev_to_node(dev));
> > > > > > }
> > > > > >
> > > > > > ? This would work fine in all cases.
> > > >
> > > > Yes, I think so. That's the general solution w/o additional
> > > > "ifdefing".
> > > >
> > > > > you are right, dev_to_node(dev) could return -1 on 64bit, if
> > > > > there is no memory on that node.
> > > >
> > > > Hmm, I thought just in the CONFIG_NUMA=n case -1 is returned.
> > > >
> > > > During initialization the struct device's numa_node is set to -1
> > > > and later on the information is inherited from the parent
> > > > numa_node.
> > > >
> > > > So what do I miss?
> > >
> > > I like the idea of cpumask_of_pcidev(), but it seems like
> > > cpumask_of_pcibus should return the same value. So if the node is
> > > unassigned or "equidistant" (there's code that treats -1 as both I
> > > think), cpumask_of_pcibus should figure out what the nearest CPUs
> > > are and return that, right?
> >
> > Usually this is true.
> >
> > But there is one special case.
> >
> > Northbridge functions of AMD CPUs appear to be on bus 0 device 24-31
> > (each having 4 or 5 functions depending on the CPU family).
> >
> > Requests to those devices (e.g. reading config space) are handled by
> > the processor(s) themselves and aren't routed to the PCI bus.
> > At most such requests are routed to another processor (node) if the
> > request is for a northbridge function of a different processor.
> >
> > See 9b94b3a19b13e094c10f65f24bc358f6ffe4eacd for some additional info.
> >
> > That is why I think that using cpumask_of_pcidev should have
> > precedence over cpumask_of_pcibus. (numa_node information of a PCI
> > device can be fixed up and then differ from node information of the
> > PCI bus .)
>
> So we're making the generic code more confusing to handle an AMD
> special case?

Yes.

> Are the functions you mention likely to have drivers
> that allocate memory or need cpumask_of_pcibus info?

Rarely, or rather, not at the moment.

> I guess there are no nice solutions given the above split of the
> device across busses (in a logical sense), so the cleanups Ingo
> suggested may be the best we can do.

Yes, I think so.

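
To illustrate the precedence discussed in this thread (use the device's
own NUMA node if it has one, otherwise fall back to the bus), here is a
minimal userspace sketch. It is a model under stated assumptions and
not kernel code: every model_* name, the 64-bit mask type and the
two-node CPU layout are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: none of these types are the kernel's own. */
#define MODEL_NO_NODE   (-1)	/* mirrors the "-1 means unassigned" convention */
#define MODEL_MAX_NODES 2	/* assumed two-node system */

typedef uint64_t model_cpumask;	/* hypothetical: one bit per CPU */

struct model_bus {
	int numa_node;		/* node of the bus, or MODEL_NO_NODE */
};

struct model_pci_dev {
	struct model_bus *bus;
	int numa_node;		/* may be fixed up to differ from the bus */
};

/* Assumed layout: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const model_cpumask model_node_cpus[MODEL_MAX_NODES] = {
	0x0fULL, 0xf0ULL,
};
static const model_cpumask model_all_cpus = 0xffULL;

/* An unknown or unassigned node maps to all CPUs in this model. */
static model_cpumask model_cpumask_of_node(int node)
{
	if (node < 0 || node >= MODEL_MAX_NODES)
		return model_all_cpus;
	return model_node_cpus[node];
}

static model_cpumask model_cpumask_of_bus(const struct model_bus *bus)
{
	assert(bus != NULL);
	return model_cpumask_of_node(bus->numa_node);
}

/* Device information takes precedence; the bus is only the fallback. */
static model_cpumask model_cpumask_of_dev(const struct model_pci_dev *dev)
{
	assert(dev != NULL);
	if (dev->numa_node == MODEL_NO_NODE)
		return model_cpumask_of_bus(dev->bus);
	return model_cpumask_of_node(dev->numa_node);
}
```

In this model a device on a node-0 bus whose numa_node was fixed up to
1 gets the node-1 CPUs from model_cpumask_of_dev(), while
model_cpumask_of_bus() still reports the node-0 CPUs. That is the
difference the northbridge functions would see.
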
Regards,
Andreas