Message-ID: <20080428221122.GC16153@elte.hu>
Date: Tue, 29 Apr 2008 00:11:22 +0200
From: Ingo Molnar <mingo@...e.hu>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 0/2] Immediate Values - jump patching update
* H. Peter Anvin <hpa@...or.com> wrote:
>>> I still think this is the completely wrong approach.
>>
>> hm, can it result in a broken kernel? If yes, how? Or are your
>> objections more higher level?
>
> My objections are higher level, I believe the current code is (a)
> painfully complex, and I'd rather not see it in the kernel, and (b)
> the wrong thing anyway.
>
> Put a 5-byte nop in as the marker, and patch it with a call
> instruction, out of line, to a collector function.
the counter-argument was that, going by a specific analysis of sched.o,
this results in slower code. The reason is that the "function call
parameter preparation" halo around that 5-byte patch site costs more
than a single conditional branch to an offline area of the current
function does.
i.e. the current optimized marker approach does roughly this:
  [ .... fastpath head .......... ]
  [ immediate value instruction   ] --->
  [ branch instruction            ] ---> these two get NOP-ed out
  [ .... fastpath tail .......... ]
  [ ............................. ]
  [ ... offline area ............ ]
  [ ... parameter preparation ... ]
  [ ... marker call ............. ]
your proposed 5-byte call NOP approach (which btw. was what i proposed
multiple times in the past 2 years) would do this:
  [ .... fastpath head .......... ]
  [ ... parameter preparation ... ]
  [ .... 5-byte CALL ............ ] ---> NOP-ed out
  [ .... fastpath tail .......... ]
  [ ............................. ]
in the first case we have little "marker parameter/value preparation"
cost: it all happens in the 'offline area' _by GCC_. I.e. the fastpath
is relatively undisturbed.
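As a rough source-level illustration of this first layout (a sketch only, not the actual immediate-values code): the marker site reduces to one test-and-branch in the fastpath, and the parameter setup lives in a cold block. The names `marker_enabled`, `prepare_args()`, `marker_call()` and `switch_task()` are hypothetical; a plain variable stands in for the patched immediate value, and `__builtin_expect` is the GCC hint that the kernel's `unlikely()` wraps.

```c
/* Hypothetical stand-in for the patched immediate value (0 = marker off). */
int marker_enabled;

/* Bookkeeping for the illustration only. */
long prep_runs;   /* how often parameter preparation ran */
long last_seen;   /* last value handed to the marker */

/* Hypothetical parameter preparation: only needed when the marker fires. */
long prepare_args(long prev, long next)
{
    prep_runs++;
    return prev * 1000 + next;
}

/* Hypothetical marker handler. */
void marker_call(long packed)
{
    last_seen = packed;
}

/* Fastpath with a conditional-branch marker site. */
long switch_task(long prev, long next)
{
    long work = prev + next;                 /* fastpath head */

    if (__builtin_expect(marker_enabled, 0)) {
        /* cold block: the compiler is free to place this out of line */
        marker_call(prepare_args(prev, next));
    }

    return work * 2;                         /* fastpath tail */
}
```

With `marker_enabled` left at 0, calling `switch_task()` never runs `prepare_args()`; the only fastpath cost in this model is the test and the untaken branch.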
in the latter case, all the 'parameter preparation' phase has to happen
at around the 5-byte CALL site, in the fastpath. This, in the specific,
assembly-level analysis of sched.o, was shown by Mathieu to be a
pessimisation. We are better off inserting that conditional and
letting gcc generate the call than forcing it in the middle of the
fastpath - even if we end up NOP-ing out the call.
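A rough model of the call-site layout, under loudly stated assumptions: a function pointer (`marker_site`, hypothetical) stands in for the patchable 5-byte site, with `marker_nop()` playing the NOP-ed-out state and `marker_collect()` the patched-in CALL. The real mechanism rewrites instruction bytes, and the real cost is register setup and spills rather than a helper call, so this exaggerates the effect; it only shows that the preparation runs on every pass whether or not the site is live.

```c
/* Bookkeeping for the illustration only. */
long prep_runs;   /* how often parameter preparation ran */
long last_seen;   /* last value handed to the collector */

/* Hypothetical parameter preparation for the marker call. */
long prepare_args(long prev, long next)
{
    prep_runs++;
    return prev * 1000 + next;
}

/* Models the NOP-ed-out site: the arguments are ready but nothing uses them. */
void marker_nop(long packed)
{
    (void)packed;
}

/* Models the patched-in CALL to a collector function. */
void marker_collect(long packed)
{
    last_seen = packed;
}

/* Hypothetical stand-in for the patchable 5-byte site (starts disabled). */
void (*marker_site)(long) = marker_nop;

/* Fastpath with the call site in line. */
long switch_task_call_site(long prev, long next)
{
    long work = prev + next;                 /* fastpath head */

    long packed = prepare_args(prev, next);  /* runs on every pass */
    marker_site(packed);                     /* NOP or CALL, depending on patching */

    return work * 2;                         /* fastpath tail */
}
```

Even while `marker_site` still points at `marker_nop`, each call to `switch_task_call_site()` increments `prep_runs`, which is the fastpath disturbance the sched.o analysis objected to.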
wrt. complexity i agree with you - if the current optimization cannot be
made correctly we have to fall back to a simpler variant, even if it's
slower.
Ingo