Date: Wed, 27 Dec 2023 15:58:19 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: Elizabeth Figura <zfigura@...eweavers.com>, x86@...nel.org,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
        wine-devel@...ehq.org
Subject: Re: x86 SGDT emulation for Wine

On December 27, 2023 2:20:37 PM PST, Elizabeth Figura <zfigura@...eweavers.com> wrote:
>Hello all,
>
>There is a Windows 98 program, a game called Nuclear Strike, which wants to do 
>some amount of direct VGA access. Part of this is port I/O, which naturally 
>throws SIGILL that we can trivially catch and emulate in Wine. The other part 
>is direct access to the video memory at 0xa0000, which in general isn't a 
>problem to catch and virtualize as well.
>
>However, this program is a bit creative about how it accesses that memory; 
>instead of just writing to 0xa0000 directly, it looks up a segment descriptor 
>whose base is at 0xa0000 and then uses the %es override to write bytes. In 
>pseudo-C, what it does is:
>
>int get_vga_selector()
>{
>    sgdt(&gdt_size, &gdt_ptr);
>    sldt(&ldt_segment);
>    ++gdt_size;
>    descriptor = gdt_ptr;
>    while (descriptor->base != 0xa0000)
>    {
>        ++descriptor;
>        gdt_size -= sizeof(*descriptor);
>        if (!gdt_size)
>            break;
>    }
>
>    if (gdt_size)
>        return (descriptor - gdt_ptr) << 3;
>
>    ldt_ptr = gdt_ptr[ldt_segment >> 3].base;
>    descriptor = ldt_ptr;
>    ldt_size = gdt_ptr[ldt_segment >> 3].limit + 1;
>    while (descriptor->base != 0xa0000)
>    {
>        ++descriptor;
>        ldt_size -= sizeof(*descriptor);
>        if (!ldt_size)
>            break;
>    }
>
>    if (ldt_size)
>        return (descriptor - ldt_ptr) << 3;
>
>    return 0;
>}
>
>
>Currently we emulate IDT access. On a read fault, we execute sidt ourselves, 
>check if the read address falls within the IDT, and return some dummy data 
>from the exception handler if it does [1]. We can easily enough implement GDT 
>access as well this way, and there is even an out-of-tree patch written some 
>years ago that does this, and helps the game run.
>
>However, there are two problems that I have observed or anticipated:
>
>(1) On systems with UMIP, the kernel emulates sgdt instructions and returns a 
>consistent address which we can guarantee is invalid. However, it also returns 
>a size of zero. The program doesn't expect this (cf. the way the loop is 
>written above) and I believe will effectively loop forever in that case, or 
>until it finds the VGA selector or hits invalid memory.
>
>    I see two obvious ways to fix this: either adjust the size of the fake 
>kernel GDT, or provide a switch to stop emulating and let Wine handle it. The 
>latter may very well be a more sustainable option in the long term (although I'll 
>admit I can't immediately come up with a reason why, other than "we might need 
>to raise the size yet again".)
>
>    Does anyone have opinions on this particular topic? I can look into 
>writing a patch but I'm not sure what the best approach is.
>
>(2) On 64-bit systems without UMIP, sgdt returns a truncated address when in 
>32-bit mode. This truncated address in practice might point anywhere in the 
>address space, including to valid memory.
>
>    In order to fix this, we would need the kernel to guarantee that the GDT 
>base points to an address whose bottom 32 bits we can guarantee are 
>inaccessible. This is relatively easy to achieve ourselves by simply mapping 
>those pages as noaccess, but it also means that those pages can't overlap 
>something we need; we already go to pains to make sure that certain parts of 
>the address space are free. Broadly anything above the 2G boundary *should* be 
>okay though. Is this feasible?
>
>    We could also just decide we don't care about systems without UMIP, but 
>that seems a bit unfortunate; it's not that old of a feature. But I also have 
>no idea how hard it would be to make this kind of a guarantee on the kernel 
>side.
>
>    This is also, theoretically, a problem for the IDT, except that on the 
>machines I've tested, the IDT is always at 0xfffffe0000000000. That's not 
>great either (it's certainly caused some weirdness and confusion when 
>debugging, when we unexpectedly catch an unrelated null pointer access) but it 
>seems to work in practice.
>
>--Zeb
>
>[1] https://source.winehq.org/git/wine.git/blob/HEAD:/dlls/krnl386.exe16/instr.c#l702
>
>

A prctl() to set the UMIP-emulated return values or disable it (giving SIGILL) would be easy enough.
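To illustrate why the emulated return values matter to the loop quoted above, here is a minimal sketch that models only its size bookkeeping. The helper name steps_until_done and the max_steps cap are hypothetical, and treating the counter as a 16-bit unsigned value is an assumption (the pseudo-C does not give its width); the descriptor contents are not modelled at all.

```c
#include <stdint.h>

/* Size in bytes of one x86 segment descriptor. */
#define DESC_SIZE 8u

/*
 * Hypothetical helper: model the size counter of the quoted GDT scan,
 * assuming no descriptor ever matches the 0xa0000 base.
 *
 * sgdt_limit is the 16-bit limit field returned by sgdt (table size - 1).
 * Returns the number of descriptors stepped over before the counter hits
 * exactly zero, or -1 if that has not happened after max_steps iterations.
 *
 * Assumption: the counter is a 16-bit unsigned value.
 */
long steps_until_done(uint16_t sgdt_limit, long max_steps)
{
    uint16_t remaining = (uint16_t)(sgdt_limit + 1u); /* ++gdt_size */
    long steps = 0;

    while (steps < max_steps) {
        /* gdt_size -= sizeof(*descriptor) */
        remaining = (uint16_t)(remaining - DESC_SIZE);
        ++steps;
        if (remaining == 0) /* if (!gdt_size) break; */
            return steps;
    }
    return -1;
}
```

Under these assumptions, steps_until_done(0x7f, 100000) gives 16, while steps_until_done(0, 100000) gives -1: a limit of zero makes the counter start at 1, which stays odd after every subtraction of 8 and so never reaches zero. Any emulated limit of the form 8*n - 1 would let the loop terminate.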

For the non-UMIP case, and probably for a lot of other corner cases like relying on certain magic selector values and whatnot, the best option really would be to wrap the code in a lightweight KVM container. I do *not* mean running the QEMU user space part of KVM; instead have Wine interface with /dev/kvm directly.
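As a rough idea of what talking to /dev/kvm directly looks like, the first step is just an open() and an ioctl(). This is only a sketch of that first step, not Wine code; the helper name probe_kvm_api_version is hypothetical, while KVM_GET_API_VERSION and KVM_API_VERSION come from <linux/kvm.h>.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: probe the KVM device node without any QEMU involvement.
 * Returns the API version reported by the kernel, or -1 if /dev/kvm cannot
 * be opened or queried (no KVM support, or no permission).
 */
int probe_kvm_api_version(void)
{
    int fd = open("/dev/kvm", O_RDWR);
    if (fd < 0)
        return -1;

    /* System-level ioctl on the /dev/kvm file descriptor itself. */
    int version = ioctl(fd, KVM_GET_API_VERSION, 0);

    close(fd);
    return version;
}
```

A caller would compare the result against KVM_API_VERSION before going further. From there the usual sequence is KVM_CREATE_VM, KVM_SET_USER_MEMORY_REGION, KVM_CREATE_VCPU and KVM_RUN, with guest-side descriptor tables set up however the wrapped code expects.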

Non-KVM-capable hardware is basically historic at this point.
