Date:   Thu, 22 Jun 2023 12:33:31 +0200
From:   Juergen Gross <jgross@...e.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Per Bilse <Per.Bilse@...rix.com>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
        "open list:X86 ENTRY CODE" <linux-kernel@...r.kernel.org>,
        "moderated list:XEN HYPERVISOR INTERFACE" 
        <xen-devel@...ts.xenproject.org>
Subject: Re: [PATCH] Updates to Xen hypercall preemption

On 22.06.23 10:26, Peter Zijlstra wrote:
> On Thu, Jun 22, 2023 at 07:22:53AM +0200, Juergen Gross wrote:
> 
>> The hypercalls we are talking of are synchronous ones. They are running
>> in the context of the vcpu doing the call (like a syscall from userland is
>> running in the process context).
> 
> (so time actually passes from the guest's pov?)

Correct.

> 
>> The hypervisor will return to guest context from time to time by modifying
>> the registers such that the guest will do the hypercall again with different
>> input values for the hypervisor, resulting in a proper continuation of the
>> hypercall processing.
> 
> Eeeuw.. that's pretty terrible. And changing this isn't in the cards,
> like at all?

In the long run this should be possible, but not for already existing Xen
versions.
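
To make the continuation mechanism quoted above concrete, here is a toy
user-space model of it. This is only a sketch under stated assumptions: every
name in it (struct vcpu_regs, toy_hypercall(), toy_guest_run(), the batch size
and the instruction length) is hypothetical and exists neither in Xen nor in
Linux. It shows a "hypervisor" that handles a bounded chunk of work per entry,
rewrites the argument registers, and leaves the instruction pointer on the
hypercall instruction so that the guest issues the call again.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only: none of these names exist in Xen or Linux. */
struct vcpu_regs {
	unsigned long ip;   /* guest instruction pointer */
	unsigned long arg0; /* hypercall argument: index of next item */
	unsigned long arg1; /* hypercall argument: items still to do */
	long ret;           /* return-value register */
};

/* Arbitrary illustrative values, not taken from any real ABI. */
enum { HYPERCALL_INSN_LEN = 3, BATCH = 4 };

static unsigned long items_done; /* work completed by the "hypervisor" */

/* "Hypervisor" side: process at most BATCH items per entry. */
static void toy_hypercall(struct vcpu_regs *regs)
{
	unsigned long n = regs->arg1 < BATCH ? regs->arg1 : BATCH;

	items_done += n;
	regs->arg0 += n;
	regs->arg1 -= n;

	if (regs->arg1 > 0) {
		/*
		 * Not finished: keep ip on the hypercall instruction so the
		 * guest re-executes it with the updated arguments.
		 */
		return;
	}

	/* Finished: set the result and step past the hypercall. */
	regs->ret = 0;
	regs->ip += HYPERCALL_INSN_LEN;
}

/*
 * "Guest" side: keep executing the instruction at ip until ip has moved
 * past the hypercall. Returns how often the hypervisor was entered.
 */
static unsigned int toy_guest_run(struct vcpu_regs *regs)
{
	unsigned long hypercall_ip = regs->ip;
	unsigned int entries = 0;

	while (regs->ip == hypercall_ip) {
		toy_hypercall(regs);
		entries++;
		/* The guest could take an interrupt here between entries. */
	}
	return entries;
}
```

In this model the calling guest code never sees an intermediate return value.
From its point of view there is one hypercall, even though the hypervisor was
entered several times.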

> 
> That is, why isn't this whole thing written like:
> 
> 	for (;;) {
> 		ret = hypercall(foo);
> 		if (ret == -EAGAIN) {
> 			cond_resched();
> 			continue;
> 		}
> 		break;
> 	}

The hypervisor doesn't return -EAGAIN for hysterical reasons.

This would be one of the options to change the interface. OTOH there are cases
where already existing hypercalls need to be modified in the hypervisor to do
preemption in the middle due to e.g. security reasons (avoiding cpu hogging in
special cases).

Additionally, some of the hypercalls subject to preemption are allowed in
unprivileged guests, too. Those are mostly hypercalls allowed for PV guests
only, but some are usable by all guests.
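
For comparison, the explicit-retry interface from the quoted loop can be
modeled in user space as well. Again this is a sketch, not the Xen ABI:
toy_hypercall_eagain(), struct toy_op and TOY_BATCH are hypothetical names, and
POSIX sched_yield() merely stands in for the kernel's cond_resched(). The point
is that here the caller sees each partial completion and decides when to yield.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Toy model only: none of these names exist in Xen or Linux. */
struct toy_op {
	unsigned long remaining; /* items still to process */
	unsigned long done;      /* items processed so far */
};

enum { TOY_BATCH = 4 }; /* arbitrary per-call work limit */

/* "Hypervisor" side: do a bounded chunk, report -EAGAIN if more is left. */
static long toy_hypercall_eagain(struct toy_op *op)
{
	unsigned long n = op->remaining < TOY_BATCH ? op->remaining : TOY_BATCH;

	op->remaining -= n;
	op->done += n;
	return op->remaining ? -EAGAIN : 0;
}

/* "Guest" side: retry until done, yielding the CPU between attempts. */
static long toy_call_until_done(struct toy_op *op, unsigned int *yields)
{
	long ret;

	for (;;) {
		ret = toy_hypercall_eagain(op);
		if (ret == -EAGAIN) {
			sched_yield(); /* stand-in for cond_resched() */
			(*yields)++;
			continue;
		}
		break;
	}
	return ret;
}
```

Such a loop only works for callers that are updated to expect -EAGAIN, which
is why it cannot simply replace the behavior of already existing Xen versions.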

> 
>> It is an awful interface and I agree that switching to full preemption in
>> dom0 seems to be the route which we should try to take.
> 
> Well, I would very strongly suggest the route to take is to scrap the
> whole thing and invest in doing something saner so we don't have to jump
> through hoops like this.
> 
> This is quite possibly the worst possible interface for this Xen could
> have come up with -- awards material for sure.

Yes.

> 
>> The downside would be that some workloads might see worse performance
>> because backend I/O handling might get preempted.
> 
> Is that an actual concern? Mark this as a legacy interface and anybody who
> wants to get away from it updates.

It isn't that easy. See above.

> 
>> Just thinking - can full preemption be enabled per process?
> 
> Nope, that's a system wide thing. Preemption is something that's driven
> by the requirements of the tasks that preempt, not something by the
> tasks that get preempted.

Depends. If a task in a non-preempt system could switch itself to be
preemptable, we could do so around hypercalls without compromising the
general preemption setting. Disabling preemption in a preemptable system
should continue to be possible for short code paths only, of course.
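
The idea of a task opting in to preemption around hypercalls can be
illustrated with a small model. This is a sketch under assumptions, not kernel
code: struct toy_task, toy_preemptible_begin()/toy_preemptible_end() and
toy_tick() are hypothetical names, and the real scheduler decision is far more
involved. It shows a per-task flag that permits preemption in an otherwise
non-preemptible system, while a preempt-disable depth still blocks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in Linux. */
struct toy_task {
	bool in_preemptible_section;        /* task opted in to preemption */
	unsigned int preempt_disable_depth; /* >0: preemption disabled */
	unsigned int times_preempted;       /* how often a tick preempted us */
};

/* Open a window in which this task may be preempted in kernel code. */
static void toy_preemptible_begin(struct toy_task *t)
{
	t->in_preemptible_section = true;
}

/* Close the window again, e.g. right after the hypercall returns. */
static void toy_preemptible_end(struct toy_task *t)
{
	t->in_preemptible_section = false;
}

/*
 * Simulated timer tick while the task runs kernel code.
 * Returns true if the task would be preempted.
 */
static bool toy_tick(struct toy_task *t, bool system_fully_preemptible)
{
	bool allowed = system_fully_preemptible || t->in_preemptible_section;

	if (!allowed || t->preempt_disable_depth > 0)
		return false;

	t->times_preempted++;
	return true;
}
```

A caller would bracket only the hypercall with the begin/end pair, so the
general preemption setting of the system stays untouched everywhere else.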

> Andy's idea of having that thing intercepted as an exception (EXTABLE
> like) and relocating the IP to a place that does cond_resched() before
> going back is an option.. gross, but possibly better, dunno.
> 
> Quite the mess indeed :/

Yeah.


Juergen
