Message-Id: <1205455050.19551.16.camel@bobble.smo.corp.google.com>
Date: Thu, 13 Mar 2008 17:37:30 -0700
From: Frank Mayhar <fmayhar@...gle.com>
To: Roland McGrath <roland@...hat.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: posix-cpu-timers revamp
After the recent conversation with Roland and after more testing, I have
another patch for review (although _not_ for submission, as again it's
against 2.6.18.5). This patch breaks the shared utime/stime/sched_time
fields out into their own structure which is allocated as needed via
alloc_percpu(). This avoids cache thrashing when running lots of
threads on lots of CPUs.
Please take a look and let me know what you think. In the meantime I'll
be working on a similar patch to 2.6-head that has optimizations for
uniprocessor and two-CPU operation, to avoid the overhead of the percpu
functions when they are unneeded.
This patch:
Replaces the utime, stime and sched_time fields in signal_struct with
the shared_times structure, which is cacheline aligned and allocated
when needed using the alloc_percpu() mechanism. While the structure is
in use, there is one copy of it per running CPU.
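A minimal user-space sketch of the layout described above, assuming a 64-byte cacheline and a fixed count of four CPUs. The struct name and its three fields come from the description. The macros SKETCH_CACHELINE and SKETCH_NR_CPUS and the helpers shared_times_alloc() and shared_times_cpu() are hypothetical stand-ins for alloc_percpu() and per_cpu_ptr(). This is not the patch's code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CACHELINE 64	/* assumed cacheline size in bytes */
#define SKETCH_NR_CPUS 4	/* hypothetical number of CPUs */

/*
 * One CPU's share of the process-wide times.  Aligning the first member
 * to a cacheline makes the whole struct cacheline aligned and pads its
 * size to a multiple of the line, so array elements never share a line.
 */
struct shared_times {
	alignas(SKETCH_CACHELINE) unsigned long long utime;
	unsigned long long stime;
	unsigned long long sched_time;
};

/* Stand-in for alloc_percpu(): one zeroed, aligned copy per CPU. */
static struct shared_times *shared_times_alloc(void)
{
	size_t size = SKETCH_NR_CPUS * sizeof(struct shared_times);
	struct shared_times *st = aligned_alloc(SKETCH_CACHELINE, size);

	if (st)
		memset(st, 0, size);
	return st;
}

/* Stand-in for per_cpu_ptr(): the copy owned by one CPU. */
static struct shared_times *shared_times_cpu(struct shared_times *st, int cpu)
{
	assert(st != NULL && cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return &st[cpu];
}
```

Padding each element to a full line is what keeps one CPU's writes from invalidating another CPU's copy. The cost is space: here 40 of every 64 bytes are padding.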
Each place that loops through all threads in a thread group to sum
task->utime and/or task->stime now uses the shared_*_sum() inline
functions defined in sched.h, which sum the per-CPU structures. This
includes compat_sys_times(), do_task_stat(), do_getitimer(),
sys_times() and k_getrusage().
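The summing step can be sketched as follows, assuming the per-CPU copies are reachable as a plain array. shared_times_sum() is a hypothetical name that covers all three shared_*_sum() helpers at once, and the cacheline alignment of the real structure is omitted. Note that the loop bound is the number of CPUs, not the number of threads, so the cost stays flat as a thread group grows.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical number of CPUs */

/* Per-CPU slot; the cacheline alignment of the real structure is omitted. */
struct shared_times {
	unsigned long long utime;
	unsigned long long stime;
	unsigned long long sched_time;
};

/*
 * Hypothetical stand-in for the shared_*_sum() helpers: add up every
 * CPU's copy.  The loop runs over CPUs rather than over the threads of
 * the group, so callers no longer walk the thread list.
 */
static void shared_times_sum(const struct shared_times *percpu,
			     struct shared_times *total)
{
	assert(percpu != NULL && total != NULL);
	total->utime = 0;
	total->stime = 0;
	total->sched_time = 0;
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		total->utime += percpu[cpu].utime;
		total->stime += percpu[cpu].stime;
		total->sched_time += percpu[cpu].sched_time;
	}
}
```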
Certain routines that used task->signal->[us]time now use the
shared_*_sum() functions instead, a change that may (but hopefully will
not) alter their semantics slightly. These include fill_prstatus() (in
fs/binfmt_elf.c), do_task_stat() (in fs/proc/array.c),
wait_task_zombie() and do_notify_parent().
At each tick, update_cpu_clock(), account_user_time() and
account_system_time() update the relevant field of the shared_times
structure through a pointer obtained with per_cpu_ptr(), so that
updates running concurrently on different CPUs do not compete with one
another for the same cacheline. Each of these functions updates the
task-private field first, then the shared_times copy if one is present.
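A sketch of the per-tick update order, assuming trimmed-down task and signal structures. sketch_task, sketch_signal, the shared field and sketch_account_user_time() are all hypothetical; the real account_user_time() has a different signature and updates other statistics as well. Passing the CPU number explicitly stands in for applying per_cpu_ptr() to the current CPU.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical number of CPUs */

/* Per-CPU slot; the patch makes each slot cacheline aligned, omitted here. */
struct shared_times {
	unsigned long long utime;
	unsigned long long stime;
	unsigned long long sched_time;
};

/* Hypothetical, heavily trimmed signal_struct and task_struct. */
struct sketch_signal {
	struct shared_times *shared;	/* NULL until allocated */
};

struct sketch_task {
	unsigned long long utime;	/* task-private counter */
	struct sketch_signal *signal;
};

/*
 * Per-tick update in the order described: the task-private field first,
 * then this CPU's slot of the shared structure if it exists.  A thread
 * running on another CPU writes a different slot.
 */
static void sketch_account_user_time(struct sketch_task *p, int this_cpu,
				     unsigned long long delta)
{
	struct shared_times *shared = p->signal->shared;

	assert(this_cpu >= 0 && this_cpu < SKETCH_NR_CPUS);
	p->utime += delta;
	if (shared)
		shared[this_cpu].utime += delta;	/* like per_cpu_ptr(shared, cpu) */
}
```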
Finally, kernel/posix-cpu-timers.c has changed quite dramatically.
First, run_posix_cpu_timers() decides whether a timer has expired by
consulting the it_*_expires fields in the task struct of the running
thread and the shared_*_sum() functions that cover the entire process.
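The expiry check can be modeled as below, assuming the usual kernel conventions that an expiry of 0 means no timer is armed and that the profiling clock is utime + stime, the virtual clock is utime, and the scheduler clock is sched_time. All sketch_* names are hypothetical. The process-wide sample passed in plays the role of the shared_*_sum() results.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cached expiries; 0 means "not armed". */
struct sketch_expires {
	unsigned long long prof;	/* against utime + stime */
	unsigned long long virt;	/* against utime */
	unsigned long long sched;	/* against sched_time */
};

/* Hypothetical snapshot of the three clocks. */
struct sketch_sample {
	unsigned long long utime, stime, sched_time;
};

/* True if the expiry is armed and the clock has reached it. */
static bool sketch_fired(unsigned long long now, unsigned long long expires)
{
	return expires != 0 && now >= expires;
}

static bool sketch_any_fired(const struct sketch_sample *s,
			     const struct sketch_expires *e)
{
	return sketch_fired(s->utime + s->stime, e->prof) ||
	       sketch_fired(s->utime, e->virt) ||
	       sketch_fired(s->sched_time, e->sched);
}

/*
 * Does any timer need attention?  The running thread's own counters are
 * checked against its task expiries, and the process-wide totals (what
 * shared_*_sum() would return) against the signal_struct expiries.
 */
static bool sketch_timers_need_work(const struct sketch_sample *thread,
				    const struct sketch_expires *thread_exp,
				    const struct sketch_sample *process,
				    const struct sketch_expires *process_exp)
{
	assert(thread && thread_exp && process && process_exp);
	return sketch_any_fired(thread, thread_exp) ||
	       sketch_any_fired(process, process_exp);
}
```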
The check_process_timers() routine bases its computations on the
shared structure, removing two loops through the threads. "Rebalancing"
is no longer required: the process_timer_rebalance() routine has
disappeared entirely, and the arm_timer() routine merely fills
p->signal->it_*_expires from timer->it.cpu.expires.*. The
cpu_clock_sample_group_locked() routine loses its summing loops, using
the shared structure instead. Lastly, set_process_cpu_timer() sets
tsk->signal->it_*_expires directly rather than calling the deleted
rebalance routine.
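One way to picture the simplified arming path, assuming each cached it_*_expires field holds the earliest pending expiry for its clock and that 0 means none is armed. The sketch_* names are hypothetical, and the real arm_timer() works with lists of timers rather than a single value.

```c
#include <assert.h>

/* Hypothetical trimmed-down signal_struct: one cached expiry per clock,
 * 0 meaning nothing is armed. */
struct sketch_signal {
	unsigned long long it_prof_expires;
	unsigned long long it_virt_expires;
	unsigned long long it_sched_expires;
};

enum sketch_clock { SKETCH_PROF, SKETCH_VIRT, SKETCH_SCHED };

/*
 * Record a newly armed process timer directly in the cached field,
 * keeping whichever expiry is earlier.  No step divides the remaining
 * time among threads, because in this model the expiry is compared
 * against the process-wide total rather than each thread's share.
 */
static void sketch_arm_process_timer(struct sketch_signal *sig,
				     enum sketch_clock which,
				     unsigned long long expires)
{
	unsigned long long *slot;

	assert(expires != 0);
	switch (which) {
	case SKETCH_PROF:
		slot = &sig->it_prof_expires;
		break;
	case SKETCH_VIRT:
		slot = &sig->it_virt_expires;
		break;
	default:
		slot = &sig->it_sched_expires;
		break;
	}
	if (*slot == 0 || expires < *slot)
		*slot = expires;
}
```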
The only remaining open question is whether these changes break the
semantics of the status-returning routines fill_prstatus(),
do_task_stat(), wait_task_zombie() and do_notify_parent().
--
Frank Mayhar <fmayhar@...gle.com>
Google, Inc.
Attachment: "posix-timers.patch" (text/x-patch, 26325 bytes)