Message-Id: <200808191956.59898.nickpiggin@yahoo.com.au>
Date: Tue, 19 Aug 2008 19:56:59 +1000
From: Nick Piggin <nickpiggin@...oo.com.au>
To: Ingo Molnar <mingo@...e.hu>
Cc: Jeremy Fitzhardinge <jeremy@...p.org>,
LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
Andi Kleen <andi@...stfloor.org>,
Jens Axboe <jens.axboe@...cle.com>
Subject: Re: [PATCH 0 of 9] x86/smp function calls: convert x86 tlb flushes to use function calls [POST 2]
On Tuesday 19 August 2008 19:31, Ingo Molnar wrote:
> * Jeremy Fitzhardinge <jeremy@...p.org> wrote:
> > Ingo Molnar wrote:
> > > nice stuff!
> > >
> > > I suspect the extra cost might be worth it for two reasons: 1) we could
> > > optimize the cross-call implementation further
> >
> > Unfortunately, I think the kmalloc fix for the RCU issue is going to
> > hurt quite a lot.
>
> yeah :-(
>
> Nick, is there any way to get rid of that kmalloc() in the async
> function call case? The whole premise of the smp_function_call() rewrite
> was that it's faster - and now it's measurably slower.
Not quite. smp_call_function_single is much faster: it is now scalable,
and it is queueing (only a single IPI is needed to submit multiple requests
if the target CPU isn't keeping up). The rewrite is primarily meant to speed
up smp_call_function_single (for really interesting things like block
request completion migration). Before that, it was totally useless for
anything remotely performance critical.
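The queueing behaviour described above can be illustrated with a small single-process sketch, assuming one request queue per target CPU. Everything in it is hypothetical: `struct call_queue`, `call_single_queue()` and `call_single_interrupt()` are made-up names, an `atomic_flag` spin lock stands in for a kernel spinlock, and the IPI is only counted, not sent. The point is that only an empty-to-non-empty transition needs an IPI, so a burst of requests to a busy CPU costs one interrupt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-CPU call queue; the names are illustrative and do not
 * match the kernel's real data structures. */
struct call_single_data {
    struct call_single_data *next;   /* list link */
    void (*func)(void *info);        /* function to run on the target */
    void *info;                      /* its argument */
};

struct call_queue {
    atomic_flag lock;                /* stands in for a spinlock */
    struct call_single_data *head;
    struct call_single_data *tail;
    int ipis_sent;                   /* simulated IPIs, for illustration */
};

static void queue_lock(struct call_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ; /* spin */
}

static void queue_unlock(struct call_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Sender: append the request. Only an empty -> non-empty transition needs
 * an IPI; otherwise an IPI is already pending and the target will find
 * this request when it drains the queue. */
static void call_single_queue(struct call_queue *q, struct call_single_data *csd)
{
    csd->next = NULL;
    queue_lock(q);
    if (q->head == NULL) {
        q->head = csd;
        q->ipis_sent++;   /* the IPI itself would be sent after unlocking */
    } else {
        q->tail->next = csd;
    }
    q->tail = csd;
    queue_unlock(q);
}

/* Target CPU's IPI handler: detach the whole queue and run every request,
 * however many piled up behind a single IPI. Returns how many ran. */
static int call_single_interrupt(struct call_queue *q)
{
    struct call_single_data *list;
    int ran = 0;

    queue_lock(q);
    list = q->head;
    q->head = q->tail = NULL;
    queue_unlock(q);

    while (list != NULL) {
        struct call_single_data *next = list->next; /* read before func runs */
        list->func(list->info);
        list = next;
        ran++;
    }
    return ran;
}

/* Example request function: count invocations. */
static void count_call(void *info)
{
    (*(int *)info)++;
}
```

For example, queueing three requests before the target runs its handler records a single IPI, and the handler then runs all three.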
A secondary goal was to make smp_call_function_mask at least somewhat
scalable. smp_call_function_mask used to execute the entire call, and wait
for an ack from every target CPU, under a single global lock (shared with
smp_call_function_single, mind you). You can't get a whole lot more
serialised than that. I wanted to fix this to improve vmalloc flushing
scalability; there wasn't much other performance-critical code that used it.

For smp_call_function_mask, as I said, we could possibly look at returning
to a non-queueing implementation to avoid the kmalloc... but I'm not very
hopeful that it would bring TLB flushing to parity, and scalability would
probably suffer somewhat.
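For comparison, a non-queueing scheme can avoid the allocation entirely by reusing one static call descriptor, but then the sender must hold a global lock until every target has acknowledged, which is the serialisation described above. The sketch below is a single-process illustration under those assumptions: `call_mask_begin()`, `call_ipi_handler()` and `call_mask_end()` are hypothetical names, CPUs are bits of an `unsigned long`, and the "IPI" is delivered by calling the handler directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical non-queueing cross-call: one static descriptor, no kmalloc.
 * CPUs are modelled as bits of an unsigned long mask. */
struct call_data {
    void (*func)(void *info);
    void *info;
    atomic_ulong pending;   /* bit N set: CPU N has not acked yet */
};

static struct call_data call_slot;              /* the single shared slot */
static atomic_flag call_lock = ATOMIC_FLAG_INIT;

/* Try to claim the global slot; returns false if another caller holds it.
 * A real implementation would spin here instead of failing. */
static bool call_mask_begin(unsigned long mask, void (*func)(void *), void *info)
{
    if (atomic_flag_test_and_set_explicit(&call_lock, memory_order_acquire))
        return false;
    call_slot.func = func;
    call_slot.info = info;
    atomic_store(&call_slot.pending, mask);
    /* ...an IPI would now be sent to every CPU in mask... */
    return true;
}

/* Runs on CPU 'cpu' when the IPI arrives: call the function, then ack. */
static void call_ipi_handler(int cpu)
{
    call_slot.func(call_slot.info);
    atomic_fetch_and(&call_slot.pending, ~(1UL << cpu));
}

/* The slot cannot be reused until every target has acked, so the lock is
 * held for the whole round trip and all other senders wait behind it. */
static void call_mask_end(void)
{
    while (atomic_load(&call_slot.pending) != 0)
        ; /* spin */
    atomic_flag_clear_explicit(&call_lock, memory_order_release);
}

/* Example callback: count invocations. */
static void count_call(void *info)
{
    (*(int *)info)++;
}
```

While one caller is between `call_mask_begin()` and `call_mask_end()`, any other caller is locked out, however few CPUs it targets.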
> At least we could/should perhaps standardize/generalize all the
> 'specific' IPI handlers into the smp_function_call() framework: if
> the function address equals a pre-cooked IPI entry point we could call
> that function without a kmalloc. As these are all hardwired,
> __builtin_constant_p() could come to the help as well. Hm?
No, it's not just the function call: there is also the payload, the list
entry for the queue, the scoreboard of which CPUs have processed it, a
lock, etc. smp_call_function is *always* going to be heavier than a
hardwired special case, no matter how it is implemented. For such
low-level, performance-critical functionality, I miss the days when people
were rabid about saving every cycle rather than every line of code ;)
I'm especially sore about mmap because I have a customer with a database
that uses a lot of mmap/munmap, and those calls have slowed down by
something like 50% (!!) from early 2.6 to late 2.6 kernels.

Put another way: if TLB flushing were currently using smp_call_function,
I would be very happy to submit a patch to make it use a hardwired call
scheme, even if that only gained 1% (in a realistic case).

Just let me reiterate that I would love for anybody to make
smp_call_function go faster, or to unify the special-case TLB flushing
*if it no longer makes sense to have it*.