linux-kernel - Re: x86 SGDT emulation for Wine

Message-ID: <1C37C311-CF8A-44EC-89B5-D826EF458708@zytor.com>
Date: Wed, 03 Jan 2024 07:33:10 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Sean Christopherson <seanjc@...gle.com>,
        Elizabeth Figura <zfigura@...eweavers.com>
CC: x86@...nel.org, Linux Kernel <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
        wine-devel@...ehq.org
Subject: Re: x86 SGDT emulation for Wine

On January 3, 2024 7:19:02 AM PST, Sean Christopherson <seanjc@...gle.com> wrote:
>On Tue, Jan 02, 2024, Elizabeth Figura wrote:
>> On Wednesday, December 27, 2023 5:58:19 PM CST H. Peter Anvin wrote:
>> > On December 27, 2023 2:20:37 PM PST, Elizabeth Figura <zfigura@...eweavers.com> wrote:
>> > >Hello all,
>> > >
>> > >There is a Windows 98 program, a game called Nuclear Strike, which wants to
>> > >do some amount of direct VGA access. Part of this is port I/O, which
>> > >naturally throws SIGILL that we can trivially catch and emulate in Wine.
>> > >The other part is direct access to the video memory at 0xa0000, which in
>> > >general isn't a problem to catch and virtualize as well.
>> > >
>> > >However, this program is a bit creative about how it accesses that memory;
>> > >instead of just writing to 0xa0000 directly, it looks up a segment
>> > >descriptor whose base is at 0xa0000 and then uses the %es override to
>> > >write bytes. In pseudo-C, what it does is:
>
>...
>
>> > A prctl() to set the UMIP-emulated return values or disable it (giving
>> > SIGILL) would be easy enough.
>> > 
>> > For the non-UMIP case, and probably for a lot of other corner cases like
>> > relying on certain magic selector values and what not, the best option
>> > really would be to wrap the code in a lightweight KVM container. I do *not*
>> > mean running the Qemu user space part of KVM; instead have Wine interface
>> > with /dev/kvm directly.
>> > 
>> > Non-KVM-capable hardware is basically historic at this point.
>> 
>> Sorry for the late response—I've been trying to do research on what would be 
>> necessary to use KVM (plus I made the poor choice of sending this during the 
>> holiday season...)
>> 
>> I'm concerned that KVM is going to be difficult or even intractable. Here are 
>> some of the problems that I (perhaps incorrectly) understand:
>> 
>> * As I am led to understand, there can only be one hypervisor on the machine 
>> at a time,
>
>No.  Only one instance of KVM-the-module is allowed, but there is no arbitrary
>limit on the number of VMs that userspace can create.  The only meaningful
>limitation is memory, and while struct kvm isn't tiny, it's not _that_ big.
>
>> and KVM has a hard limit on the number of vCPUs.
>>
>>   The obvious way to use KVM for Wine is to make each (guest) thread a vCPU. 
>> That will, at the very least, run into the thread limit. In order to avoid 
>> that we'd need to ship a whole scheduler, which is concerning. That's a huge 
>> component to ship and a huge burden to keep updated. It also means we need to 
>> hoist *all* of the ipc and sync code into the guest, which will take an 
>> enormous amount of work.
>> 
>>   Moreover, because there can only be one hypervisor, and Wine is a multi-
>> process beast, that means that we suddenly need to throw every process into 
>> the same VM.
>
>As above, this is wildly inaccurate.  The only KVM restriction with respect to
>processes is that a VM is bound to the process (address space) that created the
>VM.  There are no restrictions on the number of VMs that can be created, e.g. a
>single process can create multiple VMs.
>
>> That has unfortunate implications regarding isolation (it's been a dream for
>> years that we'd be able to share a single wine "VM" between multiple users),
>> it complicates memory management (though perhaps not terribly?). And it means
>> you can only have one Wine VM at a time, and can't use Wine at the same time
>> as a "real" VM, neither of which are restrictions that currently exist.
>> 
>>   And it's not even like we can refactor—we'd have to rewrite tons of code to 
>> work inside a VM, but also keep the old code around for the cases where we 
>> don't have a VM and want to delegate scheduling to the host OS.
>> 
>> * Besides scheduling, we need to exit the VM every time we would normally call 
>> into Unix code, which in practice is every time that the application does an 
>> NT syscall, or uses a library which we delegate to the host (including e.g. 
>> GPU, multimedia, audio...)
>
>Maybe I misinterpreted Peter's suggestion, but at least in my mind I wasn't thinking
>that the entire Wine process would run in a VM, but rather Wine would run just
>the "problematic" code in a VM.
>

Yes, the idea would be that you would run the "problematic" code inside a VM *mapped 1:1 with the external address space*, i.e. use KVM simply as a special execution mode that gives you more control over fine-grained machine state like the GDT. Code that you don't want executed in the VM context you simply leave unmapped in the VM page tables, and you set up #PF to always exit the VM context.
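
As a rough illustration of the first step of that approach, here is a minimal sketch of talking to /dev/kvm directly to create a VM and register one memory slot whose guest-physical address equals the host virtual address. The function names identity_slot() and create_identity_vm() are hypothetical, and the sketch assumes 4 KiB pages and an address range low enough to be a valid guest-physical address (true for a 32-bit program). It stops well short of the full scheme: vCPU creation, the guest GDT and page tables, and the #PF exit handling are all omitted.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: describe a memory slot whose guest-physical
 * address equals the host virtual address, so that the guest sees the
 * same bytes at the same numeric address as the host process does.
 * KVM requires page-aligned addresses and sizes; 4 KiB pages assumed. */
struct kvm_userspace_memory_region
identity_slot(uint32_t slot, void *host, size_t len)
{
    assert(((uintptr_t)host & 0xfff) == 0);
    assert((len & 0xfff) == 0);

    struct kvm_userspace_memory_region r = {
        .slot = slot,
        .flags = 0,
        .guest_phys_addr = (uintptr_t)host,
        .memory_size = len,
        .userspace_addr = (uintptr_t)host,
    };
    return r;
}

/* Hypothetical helper: given an open /dev/kvm descriptor, create a VM
 * and register a single identity slot covering [host, host + len).
 * Returns the VM file descriptor, or -1 on any failure. */
int create_identity_vm(int kvm_fd, void *host, size_t len)
{
    if (ioctl(kvm_fd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION)
        return -1;

    int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
    if (vm_fd < 0)
        return -1;

    struct kvm_userspace_memory_region r = identity_slot(0, host, len);
    if (ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r) < 0) {
        close(vm_fd);
        return -1;
    }
    return vm_fd;
}
```

A caller would open("/dev/kvm", O_RDWR) once, pass that descriptor in along with a page-aligned buffer it already owns, and then go on to create a vCPU on the returned VM descriptor. Anything not registered as a slot (or not mapped in the guest page tables) is simply absent from the guest's point of view, which is what makes the "leave it unmapped and exit on fault" part of the idea possible.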