linux-kernel - Re: [PATCH] Add I/O hypercalls for i386 paravirt

Message-ID: <46CC9FF2.3040406@vmware.com>
Date:	Wed, 22 Aug 2007 13:43:30 -0700
From:	Zachary Amsden <zach@...are.com>
To:	Andi Kleen <ak@...e.de>
CC:	Andrew Morton <akpm@...l.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Virtualization Mailing List <virtualization@...ts.osdl.org>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Chris Wright <chrisw@...s-sol.org>,
	Avi Kivity <avi@...ranet.com>,
	Jeremy Fitzhardinge <jeremy@...p.org>
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
>
> How is that measured? In a loop? In the same pipeline state?
>
> It seems a little dubious to me.
>   

I did the experiments in a controlled environment, with interrupts 
disabled and care taken to get the pipeline into the same state.  It was 
a perfectly repeatable experiment.  I don't have the exact cycle times 
anymore, but they were the tightest measurements I've ever seen on cycle 
counts, because of the unique nature of serializing the processor for 
the fault / privilege transition.  I tested a variety of different 
conditions, including different types of #GP (yes, the cost does vary), 
#NP, #PF, sysenter, int $0xxx.  Sysenter was the fastest, by far.  Int 
was about 5x the cost of sysenter.  #GP and friends all cost about the 
same.  #PF was the most expensive.


>   
>>>> to verify protection in the page tables mapping the page allows 
>>>> execution (P, !NX, and U/S check).  This is a lot more expensive than a 
>>>>    
>>>>         
>>> When the page is not executable or not present you get #PF not #GP. 
>>> So the hardware already checks that.
>>>
>>> The only case where you would need to check yourself is if you emulate
>>> NX on non NX capable hardware, but I can't see you doing that.
>>>  
>>>       
>> No, it doesn't.  Between the #GP and decode, you have an SMP race where 
>> another processor can rewrite the instruction.
>>     
>
> That can be ignored imho. If the page goes away you'll notice
> when you handle the page fault on read. If it becomes NX then the execution
> just happened to be logically a little earlier.
>
>   

No, you can't ignore it.  The page protections won't change between the 
#GP and the decoder execution, but the instruction can, causing you to 
decode into the next page where the processor would not have.  !P 
becomes obvious, but failure to respect NX or U/S is an exploitable 
bug.  Put a 1-byte instruction at the end of a page, right before an NX 
(or supervisor) page.  Remotely, from another processor, keep switching 
that byte between the instruction and a segment override prefix.

Result: user executes instruction on supervisor code page, learning data 
as a result of this; code on NX page gets executed.
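
To make the check concrete, here is a minimal sketch of what the 
decoder has to verify before it consumes bytes that may spill into the 
next page.  All names here (page_perm, check_fetch, demo_lookup) are 
hypothetical and not taken from any existing code; the lookup callback 
stands in for a real walk of the page tables, and address wraparound at 
the top of the address space is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Hypothetical per-page permissions, modelled on the P, U/S and NX bits. */
struct page_perm {
    bool present;   /* P   */
    bool user;      /* U/S: true if user mode may access the page */
    bool no_exec;   /* NX  */
};

/* Hypothetical callback; a real emulator would walk the page tables. */
typedef struct page_perm (*perm_lookup_fn)(uint32_t vpage);

enum fetch_status { FETCH_OK, FETCH_NOT_PRESENT, FETCH_DENIED };

/*
 * Check every page touched by the instruction bytes [eip, eip + len).
 * The decoder must call this with the length it actually decoded, so a
 * byte that was turned into a prefix by another CPU cannot pull the
 * decode into a page the faulting context may not execute.
 */
enum fetch_status check_fetch(uint32_t eip, size_t len, bool from_user,
                              perm_lookup_fn lookup)
{
    assert(len > 0 && len <= 15);   /* x86 instructions are at most 15 bytes */

    uint64_t first = eip / PAGE_SIZE;
    uint64_t last = ((uint64_t)eip + len - 1) / PAGE_SIZE;

    for (uint64_t vp = first; vp <= last; vp++) {
        struct page_perm p = lookup((uint32_t)vp);

        if (!p.present)
            return FETCH_NOT_PRESENT;   /* would have been #PF */
        if (p.no_exec)
            return FETCH_DENIED;        /* NX page */
        if (from_user && !p.user)
            return FETCH_DENIED;        /* supervisor page, user fetch */
    }
    return FETCH_OK;
}

/*
 * Demo layout, purely illustrative:
 *   page 0: user, executable
 *   page 1: supervisor, executable
 *   page 2: user, NX
 *   others: not present
 */
struct page_perm demo_lookup(uint32_t vpage)
{
    switch (vpage) {
    case 0:  return (struct page_perm){ .present = true, .user = true,  .no_exec = false };
    case 1:  return (struct page_perm){ .present = true, .user = false, .no_exec = false };
    case 2:  return (struct page_perm){ .present = true, .user = true,  .no_exec = true  };
    default: return (struct page_perm){ .present = false, .user = false, .no_exec = false };
    }
}
```

With that layout, a 1-byte user instruction at offset 4095 of page 0 
passes, but the same address decoded as 2 bytes (override plus opcode) 
reaches into the supervisor page and has to be refused.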

> Or easier to just write a backend for the lguest virtio drivers,
> that will be likely faster in the end anyways than this gross
> hack.
>   

We already have drivers for all of our hardware in Linux.  Most of the 
hardware we emulate is physical hardware, and there are no virtual 
drivers for it.  Should we take the BusLogic driver and "paravirtualize" 
it by adding VMI hypercalls?  We might benefit from it, but would the 
BusLogic driver?  It sets a nasty precedent for maintenance as different 
hypervisors and emulators hack up different drivers for their own 
performance.

Our SCSI and IDE emulation and thus the drivers used by Linux are pretty 
much fixed in stone; we are not going to go about changing a tricky 
hardware interface to a virtual one, it is simply too risky for 
something as critical as storage.  We might be able to move our network 
driver over to virtio, but that is not a short-term prospect either.

There is great advantage in talking to our existing device layer faster, 
and this is something that is valuable today.

> Really LinuxHAL^wparavirt ops is already so complicated that
> any new hooks need an extremely good justification and that is
> just not here for this.
>
> We can add it if you find an equivalent number of hooks
> to eliminate.
>   

Interesting trade.  What if I sanitized the whole mess of I/O macros 
into something fun and friendly:

native_port_in(int port, iosize_t opsize, int delay)
native_port_out(int port, iosize_t opsize, u32 output, int delay)
native_port_string_in(int port, void *ptr, iosize_t opsize,
                      unsigned count, int delay)
native_port_string_out(int port, void *ptr, iosize_t opsize,
                       unsigned count, int delay)

Then we can be rid of all the macro goo in io.h, which frightens my 
mother.  We might even be able to get rid of the umpteen different 
places where drivers wrap iospace access with their own byte / word / 
long functions so they can switch between port I/O and memory mapped I/O 
by moving it all into common infrastructure.
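
As a rough illustration of how the four functions above could hang 
together, here is a userspace sketch.  It is not the proposed patch: 
the return types, the iosize_t definition, and the io_delay helper are 
assumptions, and a simulated byte array stands in for the real in/out 
instructions so that the code runs without I/O privileges.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: iosize_t is named above but not defined there. */
typedef enum { IO_BYTE = 1, IO_WORD = 2, IO_LONG = 4 } iosize_t;

/* Simulated 64K port space, plus slack for a 4-byte access at the top. */
static uint8_t fake_ports[65536 + 3];

/* Counts how many accesses asked for the post-access delay. */
unsigned io_delay_count;

static void io_delay(int delay)
{
    if (delay)
        io_delay_count++;   /* real code would pause here instead */
}

uint32_t native_port_in(int port, iosize_t opsize, int delay)
{
    uint32_t v = 0;

    assert(port >= 0 && port < 65536);
    for (unsigned i = 0; i < (unsigned)opsize; i++)
        v |= (uint32_t)fake_ports[port + i] << (8 * i);
    io_delay(delay);
    return v;
}

void native_port_out(int port, iosize_t opsize, uint32_t output, int delay)
{
    assert(port >= 0 && port < 65536);
    for (unsigned i = 0; i < (unsigned)opsize; i++)
        fake_ports[port + i] = (uint8_t)(output >> (8 * i));
    io_delay(delay);
}

/* Load/store one buffer element of the given size in host byte order. */
static uint32_t load_elem(const void *p, iosize_t sz)
{
    uint8_t b; uint16_t w; uint32_t l;

    switch (sz) {
    case IO_BYTE: memcpy(&b, p, 1); return b;
    case IO_WORD: memcpy(&w, p, 2); return w;
    case IO_LONG: memcpy(&l, p, 4); return l;
    }
    return 0;
}

static void store_elem(void *p, iosize_t sz, uint32_t v)
{
    uint8_t b = (uint8_t)v;
    uint16_t w = (uint16_t)v;

    switch (sz) {
    case IO_BYTE: memcpy(p, &b, 1); break;
    case IO_WORD: memcpy(p, &w, 2); break;
    case IO_LONG: memcpy(p, &v, 4); break;
    }
}

/* String variants: 'count' transfers of 'opsize' bytes through one port. */
void native_port_string_in(int port, void *ptr, iosize_t opsize,
                           unsigned count, int delay)
{
    uint8_t *dst = ptr;

    for (unsigned n = 0; n < count; n++)
        store_elem(dst + (size_t)n * opsize, opsize,
                   native_port_in(port, opsize, delay));
}

void native_port_string_out(int port, void *ptr, iosize_t opsize,
                            unsigned count, int delay)
{
    const uint8_t *src = ptr;

    for (unsigned n = 0; n < count; n++)
        native_port_out(port, opsize,
                        load_elem(src + (size_t)n * opsize, opsize), delay);
}
```

The point of the shape is that the operand size and the delay become 
ordinary arguments, so a hypervisor backend or a driver-independent 
port/MMIO switch only has to replace four entry points instead of one 
macro expansion per size and per delay variant.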

We could make similar (unwelcome?) advances on the pte functions if it 
were not for the regrettable disconnect between pte_high / pte_low and 
the rest.  Perhaps if it was hidden in macros?

Zach