linux-kernel - Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <45EF6077.9090302@vmware.com>
Date:	Wed, 07 Mar 2007 17:01:43 -0800
From:	Daniel Arai <arai@...are.com>
To:	tglx@...utronix.de
Cc:	Zachary Amsden <zach@...are.com>,
	Virtualization Mailing List <virtualization@...ts.osdl.org>,
	john stultz <johnstul@...ibm.com>, Ingo Molnar <mingo@...e.hu>,
	akpm@...ux-foundation.org, LKML <linux-kernel@...r.kernel.org>
Subject: Re: + stupid-hack-to-make-mainline-build.patch added to -mm tree

Thomas Gleixner wrote:

> You managed to avoid the usage of other code (i.e. PIT / HPET) already,
> so why is it sooo desireable to emulate apics instead of substituting it
> by a small and sane replacement ? Just because you happen to have an
> LAPIC emulator ? That's no reason to wire yourself into the kernel code
> and make it harder to change and maintain.

There are several reasons why it's desirable to emulate the APIC.  As you 
mentioned, we already have APIC emulation, and APIC emulation isn't a huge 
bottleneck on most workloads.  Our code works, the Linux code works, and 
replacing both pieces of code with something "small and sane" isn't going to 
improve performance very much, so why bother?  Any hypervisor implementation is 
going to be a tradeoff between what's easy to implement in the hypervisor, 
what's easy to implement in the guest operating system, and what's performance 
critical.

Secondly, not all (para-)virtualized operating systems will want to use 
abstracted devices.  Some virtual operating systems will be given direct access 
to hardware devices, and will need to run the actual driver for that device and 
not some abstracted device driver.  So I don't buy your argument that every 
piece of the kernel that interacts with a paravirtualized driver should have a 
"small and sane replacement."

But more importantly, we want a kernel that can run both on native hardware and 
in a paravirtualized environment.  Linux doesn't really provide abstractions for 
  replacing the appropriate code.  We tried to hook into the source code at a 
level that seemed possible.

For example, take smp_call_function().  What this essentially does is call 
send_IPI_allbutself().

void fastcall send_IPI_self(int vector)
{
         __send_IPI_shortcut(APIC_DEST_SELF, vector);
}

void __send_IPI_shortcut(unsigned int shortcut, int vector)
{
         /*
          * Subtle. In the case of the 'never do double writes' workaround
          * we have to lock out interrupts to be safe.  As we don't care
          * of the value read we use an atomic rmw access to avoid costly
          * cli/sti.  Otherwise we use an even cheaper single atomic write
          * to the APIC.
          */
         unsigned int cfg;

         /*
          * Wait for idle.
          */
         apic_wait_icr_idle();

         /*
          * No need to touch the target chip field
          */
         cfg = __prepare_ICR(shortcut, vector);

         /*
          * Send the IPI. The write to APIC_ICR fires this off.
          */
         apic_write_around(APIC_ICR, cfg);
}

There's no good way to override __send_IPI_shortcut.  I suppose we could add 
paravirt ops for __send_IPI_shortcut and every other op that touches the APIC. 
But there are dozens of functions in apic.c that would need to be included in 
paravirt ops.  And for our implementation, we really just want to override 
apic_read and apic_write, since we can make these faster when done through 
hypercalls than through memory accesses.  If we were to make these paravirt ops, 
their implementations would be the same, except with a different apic_read and 
apic_write.  This is a whole lot of useless code duplication.

Most of the interrupt system is not written in such a way that multiple APICs 
implementations can be selected from at boot time.  This is an absolute 
requirement so that the same kernel can boot on native and in a paravirtualized 
environment.  While this could be implemented, it seems like a waste of time, 
since we can just emulate something similar to a real interrupt system and not 
change things very much.

Dan Arai
VMware, Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/