Message-Id: <20080321071846.1B22B26F9A7@magilla.localdomain>
Date: Fri, 21 Mar 2008 00:18:46 -0700 (PDT)
From: Roland McGrath <roland@...hat.com>
To: Frank Mayhar <fmayhar@...gle.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: posix-cpu-timers revamp
Sorry for the delay.
> Please take a look and let me know what you think. In the meantime I'll
> be working on a similar patch to 2.6-head that has optimizations for
> uniprocessor and two-CPU operation, to avoid the overhead of the percpu
> functions when they are unneeded.
My mention of a 2-CPU special case was just an off-hand idea. I don't
really have any idea if that would be optimal given the tradeoff of
increasing signal_struct size. The performance needs to be analyzed.
> disappeared entirely and the arm_timer() routine merely fills
> p->signal->it_*_expires from timer->it.cpu.expires.*. The
> cpu_clock_sample_group_locked() loses its summing loops, using the
> shared structure instead. Finally, set_process_cpu_timer() sets
> tsk->signal->it_*_expires directly rather than calling the deleted
> rebalance routine.
I think I misled you about the use of the it_*_expires fields, sorry.
The task_struct.it_*_expires fields are used solely as a cache of the
head of cpu_timers[]. Despite the poor choice of the same name, the
signal_struct.it_*_expires fields serve a different purpose. For an
analogous cache of the soonest timer to expire, you need to add new
fields. The signal_struct.it_{prof,virt}_{expires,incr} fields hold
the setitimer settings for ITIMER_{PROF,VTALRM}. You can't change
those in arm_timer. For a quick cache you need a new field that holds
the sooner of it_foo_expires and the expiry time at the head of
cpu_timers[foo].
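I.e., roughly (untested, and sooner_expiry is just an illustrative
name for the comparison, not an existing helper):

/*
 * Untested sketch: pick the sooner of the itimer setting and the
 * expiry time at the head of cpu_timers[foo].  A zero itimer
 * setting means "not set".
 */
static inline cputime_t sooner_expiry(cputime_t itimer, cputime_t head)
{
	if (cputime_eq(itimer, cputime_zero))
		return head;
	return cputime_lt(head, itimer) ? head : itimer;
}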
The shared_utime_sum et al names are somewhat oblique to anyone who
hasn't just been hacking on exactly this thing like you and I have.
Things like thread_group_*time make more sense.
There are now several places where you call both shared_utime_sum and
shared_stime_sum. It looks simple because they're nicely encapsulated.
But now you have two loops through all CPUs, and three loops in
check_process_timers.
I think what we want instead is this:
struct task_cputime
{
	cputime_t utime;
	cputime_t stime;
	unsigned long long schedtime;
};
Use one in task_struct to replace the utime, stime, and sum_sched_runtime
fields, and another to replace it_*_expires. Use a single inline function
thread_group_cputime() that fills in a summed struct task_cputime using
a single loop. For the places where only one or two of the sums are
actually used, the compiler should optimize away the extra summing from
the loop.
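Roughly, as an untested sketch (the percpu "cputime" pointer in
signal_struct is hypothetical, following your alloc_percpu approach):

/*
 * Untested sketch: sum the per-CPU copies in one pass.  Assumes
 * signal_struct has gained a percpu struct task_cputime pointer
 * named "cputime" (hypothetical), allocated with alloc_percpu.
 */
static inline void thread_group_cputime(struct signal_struct *sig,
					struct task_cputime *times)
{
	int i;

	times->utime = cputime_zero;
	times->stime = cputime_zero;
	times->schedtime = 0;

	for_each_possible_cpu(i) {
		struct task_cputime *tot = per_cpu_ptr(sig->cputime, i);

		times->utime = cputime_add(times->utime, tot->utime);
		times->stime = cputime_add(times->stime, tot->stime);
		times->schedtime += tot->schedtime;
	}
}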
Don't use __cacheline_aligned on this struct type itself, because most of
the uses don't need that. When using alloc_percpu, you can rely on it to
take care of those needs--that's what it's for. If you implement a
variant that uses a flat array, you can use a wrapper struct with
__cacheline_aligned for that.
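E.g. (illustrative only; the wrapper name is made up):

/* Illustrative wrapper for a flat-array variant: */
struct task_cputime_aligned {
	struct task_cputime cputime;
} __cacheline_aligned;

/* ...then something like: struct task_cputime_aligned totals[NR_CPUS]; */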
Thanks,
Roland