Message-ID: <499AEA72.9090702@redhat.com>
Date: Tue, 17 Feb 2009 11:48:50 -0500
From: Masami Hiramatsu <mhiramat@...hat.com>
To: Nick Piggin <npiggin@...e.de>, Steven Rostedt <rostedt@...dmis.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
Peter Zijlstra <peterz@...radead.org>,
akpm <akpm@...ux-foundation.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>,
Ananth N Mavinakayanahalli <ananth@...ibm.com>,
Jim Keniston <jkenisto@...ibm.com>
Subject: Re: irq-disabled vs vmap vs text_poke

Nick Piggin wrote:
> On Mon, Feb 16, 2009 at 09:00:35PM -0500, Masami Hiramatsu wrote:
>> Mathieu Desnoyers wrote:
>>> * Nick Piggin (npiggin@...e.de) wrote:
>>>> On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
>>>>>>>>>> BTW, what about using map_vm_area() in text_poke() instead of
>>>>>>>>>> vmap()?
>>>>>>>>>> Since text_poke() just maps text pages to alias pages temporarily,
>>>>>>>>>> I think we don't need to use delayed vunmap().
>>>> [...]
>>>>
>>>>> Here is the patch which replace v(un)map with (un)map_vm_area.
>>>> I don't quite understand the point of this... delayed vunmap() is
>>>> just an implementation detail of vmap subsystem. Callers should not
>>>> have to care.
>>>>
>>> AFAIK, map_vm_area/unmap_vm_area is faster than vmap/vunmap. This is
>>> the point of this patch. Masami, could you provide a quick benchmark of
>>> text_poke()/seconds before and after this optimization is applied to
>>> confirm this ?
>> Sure, here is the result of calling text_poke() 2^14 times.
>>
>> <Without this patch>
>> Total: 3634133356(cycles), 221809(cycles/text_poke)
>> Total: 3699532690(cycles), 225801(cycles/text_poke)
>> Total: 3249855588(cycles), 198355(cycles/text_poke)
>>
>> <With this patch>
>> Total: 483467579(cycles), 29508(cycles/text_poke)
>> Total: 497441301(cycles), 30361(cycles/text_poke)
>> Total: 497604548(cycles), 30371(cycles/text_poke)
>
> Hmm, on bigger SMP systems, I think the global TLB flush required
> for unmap_kernel_range will reverse these numbers.

Sure, that's possible. Unfortunately, I don't have a machine that big...
These are just the results on a 4-core SMP machine.

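As a sanity check on the figures quoted above, the per-call numbers are
simply the totals divided by the 2^14 calls. A minimal user-space C sketch
of that arithmetic follows; the function and macro names are made up for
illustration and are not taken from the patch or from the benchmark code.

```c
#include <assert.h>
#include <stdint.h>

/* Number of text_poke() calls per benchmark run (2^14, as stated above). */
#define BENCH_CALLS (1u << 14)

/* Average cycles per call, truncated toward zero like the reported figures. */
static uint64_t cycles_per_call(uint64_t total_cycles, uint32_t calls)
{
    assert(calls != 0);
    return total_cycles / calls;
}

/*
 * Ratio of two totals, scaled by 100 so the computation stays in integer
 * arithmetic (a return value of 751 means a 7.51x ratio).
 */
static uint64_t speedup_x100(uint64_t before_total, uint64_t after_total)
{
    assert(after_total != 0);
    return before_total * 100 / after_total;
}
```

For example, cycles_per_call(3634133356ULL, BENCH_CALLS) gives 221809, and
speedup_x100(3634133356ULL, 483467579ULL) gives 751, i.e. roughly 7.5x for
the first run of each set on this machine.
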
>> BTW, this is not only for performance, but also simplicity and its need.
>> Vmap may allocate new vm_area. However, since text_poke() just needs to
>> map pages temporarily (yeah, very short time), we don't want to call
>> kmalloc or any other memory allocators.
>> And since text_poke() makes WRITABLE aliases of READ-ONLY pages, we
>> want to purge these pages ASAP.
>> So, I think just reserving a small vm_area for text_poke() and
>> reusing it is enough.
>
> It is not a bad idea, but I don't think it quite goes far enough.
> IMO we should reserve 2 pages of virtual memory for each CPU, and
> then do the mapping/unmapping without locking, and with another
> variant of unmap_kernel_range that does not do the global TLB
> flush.
>
> Unless performance doesn't really matter much, in which case, I
> guess your patch is nice because it avoids doing the allocations.

Thanks. I think text_poke() doesn't need high performance currently,
because it's not called very frequently, and not during normal operation.
However, would dynamic ftrace need the performance?

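For reference, the per-CPU reservation Nick suggests above could be laid
out roughly as below. This is only a user-space model of the address
arithmetic, not kernel code: the names, the 4096-byte page size, and the
idea that the second page is there to cover a write straddling a page
boundary are all my assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POKE_PAGE_SIZE     4096u /* assumed page size */
#define POKE_PAGES_PER_CPU 2u    /* two pages of virtual memory per CPU */

/* Start of the alias window reserved for the given CPU (hypothetical layout). */
static uintptr_t poke_slot_base(uintptr_t area_base, unsigned int cpu)
{
    return area_base + (uintptr_t)cpu * POKE_PAGES_PER_CPU * POKE_PAGE_SIZE;
}

/* Alias address for text_addr: same in-page offset, inside this CPU's window. */
static uintptr_t poke_alias_addr(uintptr_t area_base, unsigned int cpu,
                                 uintptr_t text_addr)
{
    return poke_slot_base(area_base, cpu) + (text_addr & (POKE_PAGE_SIZE - 1));
}

/* Nonzero if a write of len bytes at text_addr runs into the following page. */
static int poke_needs_second_page(uintptr_t text_addr, size_t len)
{
    assert(len > 0 && len <= POKE_PAGE_SIZE);
    return (text_addr & (POKE_PAGE_SIZE - 1)) + len > POKE_PAGE_SIZE;
}
```

Since each CPU only ever touches its own window in this layout, the
mapping step itself would need no lock; whether the TLB flush can really
be kept local is the part that needs the new unmap variant Nick mentions.
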
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division
e-mail: mhiramat@...hat.com