linux-kernel - Re: A proposal

Message-ID: <69304d110608041146t44077033j9a10ae6aee19a16d@mail.gmail.com>
Date:	Fri, 4 Aug 2006 20:46:05 +0200
From:	"Antonio Vargas" <windenntw@...il.com>
To:	"David Lang" <dlang@...italinsight.com>,
	"Rusty Russell" <rusty@...tcorp.com.au>,
	"Andrew Morton" <akpm@...l.org>, jeremy@...source.com,
	greg@...ah.com, zach@...are.com, linux-kernel@...r.kernel.org,
	torvalds@...l.org, hch@...radead.org, jlo@...are.com,
	xen-devel@...ts.xensource.com, simon@...source.com,
	ian.pratt@...source.com, jeremy@...p.org
Subject: Re: A proposal - binary

On 8/4/06, David Lang <dlang@...italinsight.com> wrote:
> On Fri, 4 Aug 2006, Rusty Russell wrote:
>
> > On Thu, 2006-08-03 at 22:53 -0700, Andrew Morton wrote:
> >> On Fri, 04 Aug 2006 15:04:35 +1000
> >> Rusty Russell <rusty@...tcorp.com.au> wrote:
> >>
> >>> On Thu, 2006-08-03 at 21:18 -0700, Andrew Morton wrote:
> >>> Everywhere in the kernel where we have multiple implementations we want
> >>> to select at runtime, we use an ops struct.  Why should the choice of
> >>> Xen/VMI/native/other be any different?
> >>
> >> VMI is being proposed as an appropriate way to connect Linux to Xen.  If
> >> that is true then no other glue is needed.
> >
> > Sorry, this is wrong.  VMI was proposed as the appropriate way to
> > connect Linux to Xen, *and* native, *and* VMWare's hypervisors (and
> > others).  This way one Linux binary can boot on all three, using
> > different VMI blobs.
> >
> >>> Yes, we could force native and Xen to work via VMI, but the result would
> >>> be less clear, less maintainable, and gratuitously different from
> >>> elsewhere in the kernel.
> >>
> >> I suspect others would disagree with that.  We're at the stage of needing
> >> to see code to settle this.
> >
> > Wrong again.  We've *seen* the code for VMI, and fairly hairy.  Seeing
> > the native-implementation and Xen-implementation VMI blobs will not make
> > it less hairy!
> >
> >>>  And, of course, unlike paravirt_ops where we
> >>> can change and add ops at any time, we can't similarly change the VMI
> >>> interface because it's an ABI (that's the point: the hypervisor can
> >>> provide the implementation).
> >>
> >> hm.  Dunno.  ABIs can be uprevved.  Perhaps.
> >
> > Certainly VMI can be.  But I'd prefer to leave the excellent hackers at
> > VMWare with the task of maintaining their ABI, and let Linux hackers
> > (most of whom will run native) manipulate paravirt_ops freely.
> >
> > We're not good at maintaining ABIs.  We're going to be especially bad at
> > maintaining an ABI when the 99% of us running native will never notice
> > the breakage.
>
> some questions from a user. please point out where I am misunderstanding things.

asking is the smart way :)

> one of the big uses of virtualization will be to run things in sandboxes. when
> people do this they typically migrate the sandbox from system to system over time
> (working with chroot sandboxes I've seen some HUGE skews between what's running
> in the sandbox and what's running in the host). If the interface between the
> guest kernel and the hypervisor isn't fixed how could someone run a 2.6.19 guest
> and a 2.6.30 guest at the same time?
>
> if it's only a source-level API this implies that when you move your host kernel
> from 2.6.19 to 2.6.25 you would need to recompile your 2.6.19 guest kernel to
> support the modifications. where are the patches going to come from to do this?
>
> It seems to me from reading this thread that the PowerPC and S390 have an ABI
> defined, specifically defined by the hardware in the case of PowerPC and by the
> externally maintained, Linux-independent hypervisor (which is effectively the
> hardware) in the case of the s390.

the trick with ppc, s390, m68k... is that they were designed from day
zero (*simplifying the 68000/68010 history here*) so that when code
running as a non-privileged task tries to execute a privileged
instruction, it traps and the OS gets control. x86 wasn't, since it
has some instructions that let non-privileged code detect that it is
not privileged, which keeps a hypervisor from staying invisible.
this is solved on x86 and x86_64 with the new virtualization extensions.

> If there's going to be long-term compatibility between different hosts and
> guests there need to be some limits to what can change.
>
> needing to uprev the host when you uprev a guest is acceptable
>
> needing to uprev a guest when you uprev a host is not.

Now, this kind of transparent trapping is great since you can run your
normal kernel as-is as a guest. But to get close to 100% speed, what
you do is rewrite parts of the OS to be aware of the hypervisor,
and establish a common way for the two to talk.

That is where the paravirt-ops work comes in. Just like you can use any
filesystem under linux because they have a well-defined interface to
the rest of the kernel, the paravirt-ops are the way we are working to
define an interface so that the rest of the kernel does not need to
know whether it's running on the bare metal or as a guest.
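
A minimal sketch of the ops-struct idea, in plain user-space C. Every
name here (pv_ops_sketch, native_ops, guest_ops, pv_select) is made up
for illustration and is not the real paravirt_ops interface; the
"backends" only flip flags instead of touching hardware or a
hypervisor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified ops table (NOT the real paravirt_ops). */
struct pv_ops_sketch {
    const char *name;            /* backend label, for debugging only */
    void (*irq_disable)(void);
    void (*irq_enable)(void);
    int  (*irqs_enabled)(void);
};

/* "Native" backend: stands in for running cli/sti on the bare metal. */
static int native_if = 1;
static void native_irq_disable(void) { native_if = 0; }
static void native_irq_enable(void)  { native_if = 1; }
static int  native_irqs_enabled(void) { return native_if; }

/* "Guest" backend: stands in for setting a mask the hypervisor reads. */
static int guest_event_mask = 0;
static void guest_irq_disable(void) { guest_event_mask = 1; }
static void guest_irq_enable(void)  { guest_event_mask = 0; }
static int  guest_irqs_enabled(void) { return !guest_event_mask; }

static const struct pv_ops_sketch native_ops = {
    "native", native_irq_disable, native_irq_enable, native_irqs_enabled
};
static const struct pv_ops_sketch guest_ops = {
    "guest", guest_irq_disable, guest_irq_enable, guest_irqs_enabled
};

/* Chosen once at "boot"; generic code only ever goes through this pointer. */
static const struct pv_ops_sketch *pv_ops = &native_ops;

static void pv_select(int running_as_guest)
{
    pv_ops = running_as_guest ? &guest_ops : &native_ops;
}

/* Generic code: no idea which backend is active.
 * Returns what irqs_enabled() reported inside the protected region. */
static int critical_section(void)
{
    int enabled_inside;

    assert(pv_ops != NULL);
    pv_ops->irq_disable();
    enabled_inside = pv_ops->irqs_enabled();
    pv_ops->irq_enable();
    return enabled_inside;
}
```

critical_section() is the stand-in for "the rest of the kernel": it
behaves the same after pv_select(0) and pv_select(1), and only the
table behind pv_ops differs.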

Then, if you needed to run say 2.6.19 with hypervisor A-1.0, you just
need to write paravirt-ops which talk and translate between 2.6.19 and
A-1.0. If 5 years later you are still running A-1.0 and want to run a
2.6.28 guest, then you would just need to write the paravirt-ops
between 2.6.28 and A-1.0, with no need to modify the rest of the code
or the hypervisor.
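
To make that concrete, here is a rough sketch under invented
assumptions: the hypervisor "A-1.0" exposes two fixed call numbers, the
"2.6.19" kernel wants one combined hook, and the "2.6.28" kernel wants
two separate hooks. None of these interfaces are real; the point is
only that each kernel version gets its own small translation layer
while a10_hypercall() never changes.

```c
/* Hypothetical fixed ABI of hypervisor "A-1.0": these numbers never change. */
enum { A10_SET_PTE = 1, A10_FLUSH_TLB = 2 };

/* Fake hypervisor entry point: just counts calls per call number. */
static unsigned long a10_call_count[3];

static int a10_hypercall(int nr, unsigned long arg)
{
    (void)arg;                      /* a real hypervisor would use this */
    if (nr != A10_SET_PTE && nr != A10_FLUSH_TLB)
        return -1;                  /* unknown call number */
    a10_call_count[nr]++;
    return 0;
}

/* Invented internal interface of the "2.6.19" kernel: one combined hook. */
struct ops_2_6_19 {
    void (*set_pte_and_flush)(unsigned long pte);
};

/* Invented internal interface of the "2.6.28" kernel: hooks were split. */
struct ops_2_6_28 {
    void (*set_pte)(unsigned long pte);
    void (*flush_tlb)(void);
};

/* Translation layer: 2.6.19 <-> A-1.0 */
static void a10_19_set_pte_and_flush(unsigned long pte)
{
    a10_hypercall(A10_SET_PTE, pte);
    a10_hypercall(A10_FLUSH_TLB, 0);
}

/* Translation layer: 2.6.28 <-> A-1.0 */
static void a10_28_set_pte(unsigned long pte)
{
    a10_hypercall(A10_SET_PTE, pte);
}

static void a10_28_flush_tlb(void)
{
    a10_hypercall(A10_FLUSH_TLB, 0);
}

static const struct ops_2_6_19 ops_for_2_6_19 = { a10_19_set_pte_and_flush };
static const struct ops_2_6_28 ops_for_2_6_28 = { a10_28_set_pte, a10_28_flush_tlb };
```

The kernel-side struct is free to change shape between versions;
what stays frozen is the part below the translation layer.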

At the moment we only have 1 GPL hypervisor and 1 binary one. So
maybe we need to decide whether linux should help with running under
binary hypervisors, but imagine that instead of this situation we had
the usual Ghyper vs Khyper split. We would prefer to give the same
adaptations to both of them and abstract them away just like we do
with filesystems.

> this basically boils down to 'once you expose an interface to a user it can't
> change', with the interface that's being exposed being the calls that the guest
> makes to the host.

Yes, that's the reason some mentioned ppc, sparc, s390... because they
have been doing this longer than us and we could consider adopting
some of their designs (just like we did for POSIX system calls ;)

> David Lang

-- 
Greetz, Antonio Vargas aka winden of network