Message-ID: <20230621164038.GM2053369@hirez.programming.kicks-ass.net>
Date:   Wed, 21 Jun 2023 18:40:38 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Per Bilse <per.bilse@...rix.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>, Juergen Gross <jgross@...e.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Oleksandr Tyshchenko <oleksandr_tyshchenko@...m.com>,
        "open list:X86 ENTRY CODE" <linux-kernel@...r.kernel.org>,
        "moderated list:XEN HYPERVISOR INTERFACE" 
        <xen-devel@...ts.xenproject.org>
Subject: Re: [PATCH] Updates to Xen hypercall preemption

On Wed, Jun 21, 2023 at 03:14:42PM +0000, Per Bilse wrote:
> Some Xen hypercalls issued by dom0 guests may run for many 10s of
> seconds, potentially causing watchdog timeouts and other problems.
> It's rare for this to happen, but it does in extreme circumstances,
> for instance when shutting down VMs with very large memory allocations
> (> 0.5 - 1TB).  These hypercalls are preemptible, but the fixes in the
> kernel to ensure preemption have fallen into a state of disrepair, and
> are currently ineffective.  This patch brings things up to date by way of:

I don't understand it -- fundamentally, how can Linux schedule when the
guest isn't even running? A hypercall transfers control to the
host/hypervisor and leaves the guest suspended.
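
(As context for the question: the mechanism the patch relies on is the
event-channel upcall path that the description mentions below --
xen_pv_evtchn_do_upcall() checks a per-cpu "in preemptible hypercall"
flag on interrupt exit and may reschedule there.  A simplified sketch,
not the exact kernel source:

	__visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
	{
		irqentry_state_t state = irqentry_enter(regs);
		bool inhcall;

		instrumentation_begin();
		run_sysvec_on_irqstack_cond(__xen_pv_evtchn_do_upcall, regs);

		/* Was this CPU flagged as being inside a preemptible hypercall? */
		inhcall = get_and_clear_inhcall();
		if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
			irqentry_exit_cond_resched();	/* may schedule here */
			instrumentation_end();
			restore_inhcall(inhcall);
		} else {
			instrumentation_end();
			irqentry_exit(regs, state);
		}
	}
)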

> 1) Update general feature selection from XEN_PV to XEN_DOM0.
> The issue is unique to dom0 Xen guests, but isn't unique to PV dom0s,
> and will occur in future PVH dom0s.  XEN_DOM0 depends on either PV or PVH,
> as well as the appropriate details for dom0.
> 
> 2) Update specific feature selection from !PREEMPTION to !PREEMPT.
> The following table shows the relationship between different preemption
> features and their indicators/selectors (Y = "=Y", N = "is not set",
> . = absent):
> 
>                             | np-s | np-d | vp-s | vp-d | fp-s | fp-d
>     CONFIG_PREEMPT_DYNAMIC      N      Y      N      Y      N      Y
>          CONFIG_PREEMPTION      .      Y      .      Y      Y      Y
>             CONFIG_PREEMPT      N      N      N      N      Y      Y
>   CONFIG_PREEMPT_VOLUNTARY      N      N      Y      Y      N      N
>        CONFIG_PREEMPT_NONE      Y      Y      N      N      N      N
> 
> Unless PREEMPT is set, we need to enable the fixes.
> 
> 3) Update flag access from __this_cpu_XXX() to raw_cpu_XXX().
> The long-running hypercalls are flagged by way of a per-cpu variable
> which is set before and cleared after the relevant calls.  This elicits
> a warning "BUG: using __this_cpu_write() in preemptible [00000000] code",
> but xen_pv_evtchn_do_upcall() deals specifically with this.  For
> consistency, flag testing is also updated, and the code is simplified
> and tidied accordingly.

This makes no sense; the race that warning warns about is:

	CPU0			CPU1
	per-cpu write
	<preempt-out>
				<preempt-in>
				do-hypercall

So you wrote the value on CPU0, got migrated to CPU1 because you had
preemption enabled, and then continued with the per-cpu value of CPU1,
because that's where you are now.

Simply making the warning go away doesn't help; CPU1 does the hypercall
while the store was done on CPU0.
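
(For concreteness, a hedged sketch of the flag helpers being discussed,
modelled on the xen_preemptible_hcall_begin()/end() pattern in
include/xen/xen-ops.h; the exact bodies here are an approximation:

	DECLARE_PER_CPU(bool, xen_in_preemptible_hcall);

	/* Set before issuing a long-running, preemptible hypercall. */
	static inline void xen_preemptible_hcall_begin(void)
	{
		__this_cpu_write(xen_in_preemptible_hcall, true);
	}

	/* Cleared once the hypercall has returned. */
	static inline void xen_preemptible_hcall_end(void)
	{
		__this_cpu_write(xen_in_preemptible_hcall, false);
	}

If the task can be preempted and migrated between begin() and the
hypercall itself, the flag was written on CPU0 but the hypercall -- and
the later upcall -- runs on CPU1, whose flag was never set.  Switching
from __this_cpu_write() to raw_cpu_write() only silences the diagnostic;
it does not close that window.)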

> 4) Update irqentry_exit_cond_resched() to raw_irqentry_exit_cond_resched().
> The code will call irqentry_exit_cond_resched() if the flag (as noted
> above) is set, but the dynamic preemption feature will livepatch that
> function to a no-op unless full preemption is selected.  The code is
> therefore updated to call raw_irqentry_exit_cond_resched().

That, again, needs more explanation. Why do you want this if the kernel
is not preemptible?
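
(Again as a hedged sketch: with CONFIG_PREEMPT_DYNAMIC,
irqentry_exit_cond_resched() goes through a static call that the "none"
and "voluntary" models patch to a no-op, while
raw_irqentry_exit_cond_resched() remains the unpatched implementation.
Roughly, and simplified from kernel/entry/common.c:

	/* The real reschedule-on-irq-exit helper, never patched out. */
	void raw_irqentry_exit_cond_resched(void)
	{
		if (!preempt_count() && need_resched())
			preempt_schedule_irq();
	}

	#ifdef CONFIG_PREEMPT_DYNAMIC
	/* Patched to a NOP unless the full preemption model is selected. */
	DEFINE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched);
	#define irqentry_exit_cond_resched() static_call(irqentry_exit_cond_resched)()
	#endif

So calling raw_irqentry_exit_cond_resched() bypasses the patched-out
static call -- which is exactly why it needs justification when the
kernel was configured not to preempt.)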

You're doing 4 things; that should be 4 patches. Also, please give more
clues about how this is supposed to work at all.

