Message-ID: <4896FFFE.7080400@sgi.com>
Date: Mon, 04 Aug 2008 06:11:26 -0700
From: Stephen Champion <schamp@....com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
CC: Robin Holt <holt@....com>, linux-kernel@...r.kernel.org,
Pavel Emelyanov <xemul@...nvz.org>,
Oleg Nesterov <oleg@...sign.ru>,
Sukadev Bhattiprolu <sukadev@...ibm.com>,
Paul Menage <menage@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [Patch] Scale pidhash_shift/pidhash_size up based on num_possible_cpus().
Eric W. Biederman wrote:
> Robin Holt <holt@....com> writes:
>> Oops, confusing details. That was a different problem we had been
>> tracking.
>
> Which leads back to the original question. What were you measuring
> that showed improvement with a larger pid hash size?
>
> Almost by definition a larger hash table will perform better. However
> my intuition is that we are talking about something that should be in
> the noise for most workloads.
Robin asked me to chime in on this, as I did the early "look at that"
work and suggested it to Robin.
I noticed the potential for increasing pidhash_shift while chasing down a
patch to our kernel (2.6.16-stable based) which had proc_pid_readdir()
calling find_pid() for every pid from init_task's up through the highest
pid number. This patch caused a rather serious problem on a 2048-core
Altix. Before identifying the culprit, I increased pidhash_shift. This
made a *huge* difference: enough to get the box marginally functional
while I tracked down the origins of the problem.
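To make that failure mode concrete, here is a minimal C sketch of a chained pid hash plus a readdir-style scan that does one find_pid()-like lookup for every pid from 1 to the highest pid, counting how many chain entries get compared. It is a simplified, hypothetical model, not the kernel's struct pid/hlist code; the names (pid_table, pid_table_find(), scan_all_pids()) and the Fibonacci hash standing in for hash_long() are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified chained pid hash; not the kernel's struct pid. */
struct pid_node {
    int nr;
    struct pid_node *next;
};

struct pid_table {
    unsigned int shift;            /* table has (1 << shift) buckets, 1..31 */
    struct pid_node **buckets;
};

/* Multiplicative (Fibonacci) hash, a stand-in for the kernel's hash_long(). */
static unsigned int pid_bucket(const struct pid_table *t, int nr)
{
    return (uint32_t)((uint32_t)nr * 2654435761u) >> (32 - t->shift);
}

static int pid_table_init(struct pid_table *t, unsigned int shift)
{
    t->shift = shift;
    t->buckets = calloc((size_t)1 << shift, sizeof(*t->buckets));
    return t->buckets ? 0 : -1;
}

static int pid_table_add(struct pid_table *t, int nr)
{
    struct pid_node *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    unsigned int b = pid_bucket(t, nr);
    n->nr = nr;
    n->next = t->buckets[b];       /* push onto the front of the chain */
    t->buckets[b] = n;
    return 0;
}

/* find_pid()-like lookup; *steps accumulates chain entries compared. */
static int pid_table_find(const struct pid_table *t, int nr, unsigned long *steps)
{
    for (const struct pid_node *n = t->buckets[pid_bucket(t, nr)]; n; n = n->next) {
        (*steps)++;
        if (n->nr == nr)
            return 1;
    }
    return 0;
}

/* Models the problematic readdir: one lookup for every pid 1..max_nr. */
static unsigned long scan_all_pids(const struct pid_table *t, int max_nr)
{
    unsigned long steps = 0;
    for (int nr = 1; nr <= max_nr; nr++)
        pid_table_find(t, nr, &steps);
    return steps;
}

static void pid_table_free(struct pid_table *t)
{
    for (size_t b = 0; b < ((size_t)1 << t->shift); b++) {
        struct pid_node *n = t->buckets[b];
        while (n) {
            struct pid_node *next = n->next;
            free(n);
            n = next;
        }
    }
    free(t->buckets);
    t->buckets = NULL;
}
```

Filling a 2^12-bucket and a 2^16-bucket table with pids 1..28000 and running scan_all_pids() on each shows the smaller table doing far more chain comparisons for the same scan, since each of its buckets holds about seven pids instead of well under one on average.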
After backing out the problematic patch, I took a look at pidhash_shift
under normal circumstances. With pidhash_shift == 12, running only a few
common services and monitoring tools (sendmail, nagios, etc.) with ~28k
active processes, mostly of the kernel variety, the 20-cpu boot cpuset
we use on that system to confine normal system processes and interactive
logins was spending >1% of its time in find_pid(), and an 'ls /proc >
/dev/null' took >0.4s. With pidhash_shift == 16, the timing dropped to
<0.2s, and the total time spent in find_pid() was reduced to noise level.
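Rough arithmetic for the load factor alpha (average chain length) at ~28k pids, assuming the hash spreads pids evenly across buckets. Under that assumption, a successful chained-hash lookup compares about 1 + alpha/2 entries on average:

```latex
\begin{aligned}
\alpha_{12} &= \frac{28\,000}{2^{12}} = \frac{28\,000}{4096} \approx 6.8,
  & 1 + \tfrac{\alpha_{12}}{2} &\approx 4.4 \\
\alpha_{16} &= \frac{28\,000}{2^{16}} = \frac{28\,000}{65\,536} \approx 0.43,
  & 1 + \tfrac{\alpha_{16}}{2} &\approx 1.2
\end{aligned}
```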
In addition to raising the limit on larger systems, it looked reasonable
to scale the pid hash with the number of processors instead of memory.
While I observed high and variable process:cpu ratios on small systems
(2c - 32c), they also have relatively few processes. The 192c - 2048c
systems I was able to look at were all hovering at 13 +/- 2 processes
per cpu, even with wildly varying memory sizes.
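A hypothetical sketch of sizing the table from the cpu count rather than from memory, assuming the ~13 processes per cpu observed above. pidhash_shift_for_cpus() is a made-up name, and the floor of 12 and ceiling of 16 are illustrative choices, not values taken from the patch.

```c
/* Assumed from the ~13 +/- 2 processes per cpu seen on 192c-2048c boxes. */
#define PROCS_PER_CPU     13UL
#define MIN_PIDHASH_SHIFT 12U   /* hypothetical floor */
#define MAX_PIDHASH_SHIFT 16U   /* hypothetical ceiling */

/*
 * Pick the smallest shift whose bucket count covers the expected number
 * of processes (ncpus * PROCS_PER_CPU), clamped to [MIN, MAX].
 */
static unsigned int pidhash_shift_for_cpus(unsigned long ncpus)
{
    unsigned long target = ncpus * PROCS_PER_CPU;
    unsigned int shift = 0;

    while (shift < MAX_PIDHASH_SHIFT && (1UL << shift) < target)
        shift++;
    if (shift < MIN_PIDHASH_SHIFT)
        shift = MIN_PIDHASH_SHIFT;
    return shift;
}
```

With these assumptions, a 20-cpu box stays at the floor of 12, 512 cpus gives 13, 2048 cpus gives 15, and anything past about 5000 cpus is capped at 16.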
Despite more recent changes in proc_pid_readdir(), my results should
apply to current source. It looks like both the old 2.6.16
implementation and the current version call find_pid() (or equivalent)
once for each successive getdents() call on /proc, except when the
cursor is on the first entry. A quick look shows 88 getdents64() calls
for each of 'ps' and 'ls /proc' with 29k processes running, which
appears to be the primary source of find_pid() calls.
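A rough model of the getdents() behaviour described above, assuming each call returns up to `batch` pid entries (a stand-in for however many fit in the caller's buffer) and ignoring /proc's non-pid entries. simulate_proc_readdir() and struct readdir_stats are hypothetical; the model only counts how many calls have to re-locate the cursor with a pid lookup.

```c
/* Hypothetical counters for one full listing of /proc's pid entries. */
struct readdir_stats {
    unsigned long calls;    /* getdents()-style calls, including the EOF call */
    unsigned long lookups;  /* find_pid()-style lookups to resume the cursor */
};

static struct readdir_stats simulate_proc_readdir(unsigned long nr_pids,
                                                  unsigned long batch)
{
    struct readdir_stats s = { 0, 0 };
    unsigned long pos = 0;

    for (;;) {
        s.calls++;
        /* Only a cursor sitting on the first entry skips the lookup. */
        if (pos != 0)
            s.lookups++;
        unsigned long n = nr_pids - pos;
        if (n > batch)
            n = batch;
        pos += n;
        if (n == 0)     /* EOF: this call returned no entries */
            break;
    }
    return s;
}
```

For 10 pids and a batch of 4, this reports 4 calls (the last returning nothing at EOF) and 3 lookups.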
It's not giganormous, although I could probably come up with a pointless
microbenchmark to show it's 300% better. Importantly, it noticeably
improves normal interactive tools like 'ps' and 'top', and a
performance visualization tool developed by a customer (nodemon)
refreshes faster. For a 512k init allocation, that seems like a very
good deal.
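Where the 512k figure comes from, assuming one 8-byte pointer (struct hlist_head) per bucket on a 64-bit kernel, compared with the 2^12 table:

```latex
\begin{aligned}
2^{16}\ \text{buckets} \times 8\ \text{bytes} &= 524\,288\ \text{bytes} = 512\ \text{KiB} \\
2^{12}\ \text{buckets} \times 8\ \text{bytes} &= 32\,768\ \text{bytes} = 32\ \text{KiB}
\end{aligned}
```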
I'd like to lose 20,000 kernel processes in addition to growing the pid
hash!