linux-kernel - Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops callsites to make them patchable

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <45FAED2B.8070403@goop.org>
Date:	Fri, 16 Mar 2007 12:16:59 -0700
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Ingo Molnar <mingo@...e.hu>
CC:	David Miller <davem@...emloft.net>, ak@....de,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	virtualization@...ts.osdl.org, xen-devel@...ts.xensource.com,
	chrisw@...s-sol.org, zach@...are.com, rusty@...tcorp.com.au,
	anthony@...emonkey.ws, torvalds@...ux-foundation.org
Subject: Re: [patch 13/26] Xen-paravirt_ops: Consistently wrap paravirt ops
 callsites to make them patchable

Ingo Molnar wrote:

> * David Miller <davem@...emloft.net> wrote:

> > Perhaps the problem can be dealt with using ELF relocations.
> > 
> > There is another case, discussed yesterday on netdev, where run-time 
> > resolution of ELF relocations would be useful (for 
> > very-very-very-read-only variables) so if it can solve this problem 
> > too it would be nice to have a generic infrastructure for it.
>   

> yeah, and i really think this is very fundamental: [...]

I think what Dave is suggesting is that we use the reloc information the
compiler generates to find the patchable callsites rather than have
special wrappers.  This is an interesting idea.

> Limited, instruction-level patching like alternatives.h is fine because 
> that makes it easier to support multiple, incompatible CPU 
> architectures, without having to do a hugely intrusive split at the 
> kernel RPM level.
>
> but the level of 'binary patching' done by the paravirt and Xen goes way 
> beyond that,

Not really.  There are only three cases:

   1. replace an indirect call with a direct call
   2. nop out a callsite
   3. patch in a short inline sequence

And as I pointed out, this is used by all pv_op backends, using a common
piece of code to implement at least 1 and 2.  3 could be implemented
semi-generically by using rules like "if (func == native_sti) {
patch("sti"); }", which would cover many cases where a hypervisor
doesn't need any special handling for a particular operation.

The goal is to eliminate the cost of the indirect calls with nice
predictable indirect calls.  There's a 1 byte/callsite overhead, but I
don't think that's a horrible overhead.

And, at worst, its only a little more complex than the kinds of
transformations.

Ideally, its a mechanism which could be used elsewhere.  It applies with
you have some kind of ops_vector table which is updated once (or perhaps
very rarely), and you don't want to wear the overhead of indirect calls
everywhere.

>  and the changes here really underscore that we:
>
>   _should not emulate the closed source world_
>
> There the only solution is to binary-patch - because they have no source 
> code. But here, we've got all the source code.
>   

I don't think this is a relevant comparison.  This is purely a matter of
optimising out unnecessary indirect calls.

> nobody wants to boot a xen-paravirt kernel from a floppy, so image size 
> is not an issue. In-RAM overhead would in fact be /reduced/, because 
> currently all the paravirt overhead hits both the native and the 
> paravirt kernel. Nor would /all/ of the vmlinuz have to be replicated in 
> the images - it's enough to replicate only those functions that truly 
> differ between the two build methods.

One of the explicit goals of pv_ops was to allow a single kernel to
either boot on native hardware or under any one of the supported
hypervisors, explicitly to avoid having to manage multiple kernel
images.  Compiling the kernel N+1 times for N hypervisors, and then
bundling them up in some kind of multi-image format doesn't seem like a
particularly good tradeoff.  The kernel RPM on my machine here is
already ~50Mbytes; expanding that to 250Mbytes to support native, Xen,
vmi, lguest and kvm doesn't seem reasonable.

    J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/