linux-kernel - Re: [PATCH] Updates to Xen hypercall preemption

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a8cd2788-a695-964a-3311-dbecb669bb72@suse.com>
Date:   Thu, 22 Jun 2023 07:22:53 +0200
From:   Juergen Gross <jgross@...e.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Per Bilse <Per.Bilse@...rix.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
        "open list:X86 ENTRY CODE" <linux-kernel@...r.kernel.org>,
        "moderated list:XEN HYPERVISOR INTERFACE" 
        <xen-devel@...ts.xenproject.org>
Subject: Re: [PATCH] Updates to Xen hypercall preemption

On 21.06.23 22:04, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 07:19:21PM +0000, Per Bilse wrote:
>> On 6/21/2023 5:40 PM, Peter Zijlstra wrote:
>>> I don't understand it -- fundamentally, how can linux schedule when the
>>> guest isn't even running? Hypercall transfers control to the
>>> host/hypervisor and leaves the guest suspended.
>>
>> Hi Peter, as noted in earlier note to Andy, this is essentially existing
>> code that other commits have rendered ineffective over time.  Hence,
>> the finer details of how or why it works haven't changed since it was
>> first introduced.
> 
> That doesn't mean you don't have to explain how stuff works.
> 
>>> This makes no sense; the race that warning warns about is:
>>>
>>> 	CPU0			CPU1
>>> 	per-cpu write
>>> 	<preempt-out>
>>> 				<preempt-in>
>>> 				do-hypercall
>>>
>>> So you wrote the value on CPU0, got migrated to CPU1 because you had
>>> preemptioned enabled, and then continue with the percpu value of CPU1
>>> because that's where you're at now.
>>
>> This issue was raised internally, and it was noted that the only way
>> for the preemptible code to switch task is via an interrupt that goes
>> through xen_pv_evtchn_do_upcall(), which handles this.  I'm happy to
>> check with my sources, but it's holiday season right now.
> 
> Then it should have all sorts of comments on and a comprehensive
> changelog.
> 
>>>> 4) Update irqentry_exit_cond_resched() to raw_irqentry_exit_cond_resched().
>>>> The code will call irqentry_exit_cond_resched() if the flag (as noted
>>>> above) is set, but the dynamic preemption feature will livepatch that
>>>> function to a no-op unless full preemption is selected.  The code is
>>>> therefore updated to call raw_irqentry_exit_cond_resched().
>>>
>>> That, again meeds more explanation. Why do you want this if not
>>> preemptible?
>>
>> I'm not quite sure what you mean here.  Dynamic preemption
>> will livepatch irqentry_exit_cond_resched() to be a no-op, while
>> raw_irqentry_exit_cond_resched() remains functional.  This was
>> introduced in commit 4624a14f4daa last year which was said to fix
>> the problem, but doesn't.  You may remember, it was signed off by
>> yourself and Mark Rutland.
> 
> I don't see the relation; what you're doing is making dynamic preempt
> that's not configured for full preempt do preemption. That's weird, and
> again no comments.
> 
> I'm with Andy in that simply forcing full preemption would make far more
> sense -- but I'm still missing something fundamental, see below.
> 
>>> You're doing 4 things, that should be 4 patches. Also, please give more
>>> clues for how this is supposed to work at all.
>>
>> I respectfully have to disagree with that.  The fixes here are very
>> closely related, and we're not introducing anything new, we're merely
>> re-enabling code which has been rendered ineffective due to oversights
>> in commits made after the code was first introduced.  How the code is
>> supposed to work hasn't changed, and is beyond the scope of these fixes;
>> I'm sure it must have been discussed at great length at the time (commit
>> fdfd811ddde3).
> 
> You didn't even so much as reference that commit, nor provide any other
> explanation. And having now read that commit, I'm not much enlightend.
> 
> *HOW* can a hypercall, something that exits the Guest and has the
> Host/Hypervisor run get preempted in the Guest -- that isn't running.
> 
> Or are you calling apples pears?

The hypercalls we are talking of are synchronous ones. They are running
in the context of the vcpu doing the call (like a syscall from userland is
running in the process context).

The hypervisor will return to guest context from time to time by modifying
the registers such that the guest will do the hypercall again with different
input values for the hypervisor, resulting in a proper continuation of the
hypercall processing.

It is an awful interface and I agree that switching to full preemption in
dom0 seems to be the route which we should try to take. The downside would
be that some workloads might see worse performance due to backend I/O
handling might get preempted.

Just thinking - can full preemption be enabled per process?


Juergen

Download attachment "OpenPGP_0xB0DE9DD628BF132F.asc" of type "application/pgp-keys" (3099 bytes)

Download attachment "OpenPGP_signature" of type "application/pgp-signature" (496 bytes)