[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45FF4043.4000805@vmware.com>
Date: Mon, 19 Mar 2007 18:00:35 -0800
From: Zachary Amsden <zach@...are.com>
To: Rusty Russell <rusty@...tcorp.com.au>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Andi Kleen <ak@....de>, David Miller <davem@...emloft.net>,
jeremy@...p.org, mingo@...e.hu, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, virtualization@...ts.osdl.org,
xen-devel@...ts.xensource.com, chrisw@...s-sol.org,
anthony@...emonkey.ws, netdev@...r.kernel.org
Subject: Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops
callsites to make them patchable
Rusty Russell wrote:
> On Mon, 2007-03-19 at 11:38 -0700, Linus Torvalds wrote:
>
>> On Mon, 19 Mar 2007, Eric W. Biederman wrote:
>>
>>> True. You can use all of the call clobbered registers.
>>>
>> Quite often, the biggest single win of inlining is not so much the code
>> size (although if done right, that will be smaller too), but the fact that
>> inlining DOES NOT CLOBBER AS MANY REGISTERS!
>>
For VMI, the default clobber was "cc", and you need a way to allow at
least that, because saving and restoring flags is too expensive on x86.
> Thanks Linus.
>
> *This* was the reason that the current hand-coded calls only clobber %
> eax. It was a compromise between native (no clobbers) and others (might
> need a reg).
>
I still don't think this was a good trade. The primary motivation for
clobbering %eax was that Xen wanted a free register to use for computing
the offset into the shared data in the case of SMP preemptible kernels.
Xen no longer needs such a register, they can use the PDA offset
instead. And it does hurt native performance by unconditionally
stealing a register in the four most commonly invoked paravirt-ops code
sequences.
> Now, since we decided to allow paravirt_ops operations to be normal C
> (ie. the patching is optional and done late), we actually push and pop %
> ecx and %edx. This makes the call site 10 bytes long, which is a nice
> size for patching anyway (enough for a movl $0, <addr>, a-la lguest's
> cli, or movw $0, %gs:<addr> if we supported SMP).
>
You can do it in 11 bytes with no clobbers and normal C semantics by
linking to a direct address instead of calling to an indirect, but then
you need some gross fixup technology in paravirt_patch:
if (call_addr == (void*)native_sti) {
...
}
I think we should probably try to do it in 12 bytes. Freeing eax to the
inline caller is likely to make up the 2 bytes of space more we have to nop.
One thing I always tried to get in VMI was to encapsulate the actual
code which went through the business of computing arguments that were
not even used in the native case. Unfortunately, that seems impossible
in the current design, but I don't think it is an issue because I don't
think there is actually a way to express:
SWITCHABLE_CODE_BLOCK_BEGIN {
/* arbitrary C code for native */
} SWITCHABLE_CODE_BLOCK_ALTERNATIVE {
/* arbitrary C code for something else */
}
Dave's linker suggestion is probably the best for things like that.
Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists