[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20060731214328.5770f1a5.akpm@osdl.org>
Date: Mon, 31 Jul 2006 21:43:28 -0700
From: Andrew Morton <akpm@...l.org>
To: Greg Banks <gnb@...bourne.sgi.com>
Cc: neilb@...e.de, nfs@...ts.sourceforge.net,
linux-kernel@...r.kernel.org
Subject: Re: [NFS] [PATCH 010 of 11] knfsd: make rpc threads pools numa
aware
On Mon, 31 Jul 2006 15:54:57 +1000
Greg Banks <gnb@...bourne.sgi.com> wrote:
> On Mon, 2006-07-31 at 14:42, Greg Banks wrote:
> > On Mon, 2006-07-31 at 14:36, Neil Brown wrote:
> > > On Sunday July 30, akpm@...l.org wrote:
> > > > On Mon, 31 Jul 2006 10:42:34 +1000
> > > > NeilBrown <neilb@...e.de> wrote:
> > > >
> > > > > +static int
> > > > > +svc_pool_map_init_percpu(struct svc_pool_map *m)
> > > > > +{
> > > > > + unsigned int maxpools = num_possible_cpus();
> > > > > + unsigned int pidx = 0;
> > > > > + unsigned int cpu;
> > > > > + int err;
> > > > > +
> > > > > +
> > > >
> > > > That isn't right - it assumes that cpu_possible_map is not sparse. If it
> > > > is sparse, we allocate undersized pools and then overindex them.
> >
> > Umm, I think Andrew's right, num_possible_cpus() should be NR_CPUS.
>
> How about this version of the patch? It replaces num_possible_cpus()
> with highest_possible_processor_id()+1 and similarly for nodes.
> --
>
> knfsd: Actually implement multiple pools. On NUMA machines, allocate
> a svc_pool per NUMA node; on SMP a svc_pool per CPU; otherwise a single
> global pool. Enqueue sockets on the svc_pool corresponding to the CPU
> on which the socket bh is run (i.e. the NIC interrupt CPU). Threads
> have their cpu mask set to limit them to the CPUs in the svc_pool that
> owns them.
>
> This is the patch that allows an Altix to scale NFS traffic linearly
> beyond 4 CPUs and 4 NICs.
>
> Incorporates changes and feedback from Neil Brown, Trond Myklebust,
> Christoph Hellwig and Andrew Morton.
>
Something has gone rather wrong here.
> - serv = __svc_create(prog, bufsize, shutdown, /*npools*/1);
> + serv = __svc_create(prog, bufsize, shutdown, npools);
__svc_create() is:
__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
void (*shutdown)(struct svc_serv *serv))
so heaven knows what tree you're patching.
Incremental patches really are preferred. So we can see what people are
monkeying with ;)
After fixing the rejects and cleaning a few things up, your proposed change
amounts to:
--- a/net/sunrpc/svc.c~knfsd-make-rpc-threads-pools-numa-aware-fix
+++ a/net/sunrpc/svc.c
@@ -116,7 +116,7 @@ fail:
static int
svc_pool_map_init_percpu(struct svc_pool_map *m)
{
- unsigned int maxpools = num_possible_cpus();
+ unsigned int maxpools = highest_possible_processor_id() + 1;
unsigned int pidx = 0;
unsigned int cpu;
int err;
@@ -136,6 +136,18 @@ svc_pool_map_init_percpu(struct svc_pool
return pidx;
};
+static int
+highest_possible_node_id(void)
+{
+ unsigned int node;
+ unsigned int highest = 0;
+
+ for_each_node(node)
+ highest = node;
+
+ return highest;
+}
+
/*
* Initialise the pool map for SVC_POOL_PERNODE mode.
@@ -144,7 +156,7 @@ svc_pool_map_init_percpu(struct svc_pool
static int
svc_pool_map_init_pernode(struct svc_pool_map *m)
{
- unsigned int maxpools = num_possible_nodes();
+ unsigned int maxpools = highest_possible_node_id() + 1;
unsigned int pidx = 0;
unsigned int node;
int err;
_
Which shouldn't have compiled, due to the missing forward declaration. And
I'd be surprised if it worked very well with CONFIG_NUMA=n. And it's
naughty to be sneaking general library functions into the sunrpc code
anyway.
Please,
- Write a standalone patch which adds highest_possible_node_id() to
lib/cpumask.c(?)
Make sure it's inside #ifdef CONFIG_NUMA
Remember to export it to modules.
Provide a !CONFIG_NUMA version in include/linux/nodemask.h which just
returns constant zero.
Consider doing something more efficient than the for_each_node() loop.
Although I'm not sure what that would be, given that we don't have
find_last_bit().
- Provide an incremental patch against
knfsd-make-rpc-threads-pools-numa-aware.patch which utilises
highest_possible_node_id().
A replacement patch will be grudgingly accepted, but I'll only go and
turn it into an incremental one, so you can't hide ;)
- Test it real good. Modular, non-modular, NUMA, non-NUMA, !SMP.
Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists