lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB41573A4D6C19105E21E1709DD411A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Fri, 19 Sep 2025 19:48:18 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Mukesh R <mrathor@...ux.microsoft.com>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-arch@...r.kernel.org"
	<linux-arch@...r.kernel.org>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
	<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
	"decui@...rosoft.com" <decui@...rosoft.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de"
	<bp@...en8.de>, "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
	"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
	"arnd@...db.de" <arnd@...db.de>
Subject: RE: [PATCH v1 5/6] x86/hyperv: Implement hypervisor ram collection
 into vmcore

From: Mukesh R <mrathor@...ux.microsoft.com> Sent: Thursday, September 18, 2025 7:32 PM
> 
> On 9/18/25 16:53, Michael Kelley wrote:
> > From: Mukesh R <mrathor@...ux.microsoft.com> Sent: Tuesday, September 16, 2025 6:13 PM
> >>
> >> On 9/15/25 10:55, Michael Kelley wrote:
> >>> From: Mukesh Rathor <mrathor@...ux.microsoft.com> Sent: Tuesday, September 9, 2025 5:10 PM

[snip]

> >>>> +
> >>>> +/*
> >>>> + * Common function for all cpus before devirtualization.
> >>>> + *
> >>>> + * Hypervisor crash: all cpus get here in nmi context.
> >>>> + * Linux crash: the panicing cpu gets here at base level, all others in nmi
> >>>> + *		context. Note, panicing cpu may not be the bsp.
> >>>> + *
> >>>> + * The function is not inlined so it will show on the stack. It is named so
> >>>> + * because the crash cmd looks for certain well known function names on the
> >>>> + * stack before looking into the cpu saved note in the elf section, and
> >>>> + * that work is currently incomplete.
> >>>> + *
> >>>> + * Notes:
> >>>> + *  Hypervisor crash:
> >>>> + *    - the hypervisor is in a very restrictive mode at this point and any
> >>>> + *	vmexit it cannot handle would result in reboot. For example, console
> >>>> + *	output from here would result in synic ipi hcall, which would result
> >>>> + *	in reboot. So, no mumbo jumbo, just get to kexec as quickly as possible.
> >>>> + *
> >>>> + *  Devirtualization is supported from the bsp only.
> >>>> + */
> >>>> +static noinline __noclone void crash_nmi_callback(struct pt_regs *regs)
> >>>> +{
> >>>> +	struct hv_input_disable_hyp_ex *input;
> >>>> +	u64 status;
> >>>> +	int msecs = 1000, ccpu = smp_processor_id();
> >>>> +
> >>>> +	if (ccpu == 0) {
> >>>> +		/* crash_save_cpu() will be done in the kexec path */
> >>>> +		cpu_emergency_stop_pt();	/* disable performance trace */
> >>>> +		atomic_inc(&crash_cpus_wait);
> >>>> +	} else {
> >>>> +		crash_save_cpu(regs, ccpu);
> >>>> +		cpu_emergency_stop_pt();	/* disable performance trace */
> >>>> +		atomic_inc(&crash_cpus_wait);
> >>>> +		for (;;);			/* cause no vmexits */
> >>>> +	}
> >>>> +
> >>>> +	while (atomic_read(&crash_cpus_wait) < num_online_cpus() && msecs--)
> >>>> +		mdelay(1);
> >>>> +
> >>>> +	stop_nmi();
> >>>> +	if (!hv_has_crashed)
> >>>> +		hv_notify_prepare_hyp();
> >>>> +
> >>>> +	if (crashing_cpu == -1)
> >>>> +		crashing_cpu = ccpu;		/* crash cmd uses this */
> >>>
> >>> Could just be "crashing_cpu = 0" since only the BSP gets here.
> >>
> >> a code change request has been open for while to remove the requirement
> >> of bsp..
> >>
> >>>> +
> >>>> +	hv_hvcrash_ctxt_save();
> >>>> +	hv_mark_tss_not_busy();
> >>>> +	hv_crash_fixup_kernpt();
> >>>> +
> >>>> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >>>> +	memset(input, 0, sizeof(*input));
> >>>> +	input->rip = trampoline_pa;	/* PA of hv_crash_asm32 */
> >>>> +	input->arg = devirt_cr3arg;	/* PA of trampoline page table L4 */
> >>>
> >>> Is this comment correct? Isn't it the PA of struct hv_crash_tramp_data?
> >>> And just for clarification, Hyper-V treats this "arg" value as opaque and does
> >>> not access it. It only provides it in EDI when it invokes the trampoline
> >>> function, right?
> >>
> >> comment is correct. cr3 always points to l4 (or l5 if 5 level page tables).
> >
> > Yes, the comment matches the name of the "devirt_cr3arg" variable.
> > Unfortunately my previous comment was incomplete because the value
> > stored in the static variable "devirt_cr3arg" isn?t the address of an L4 page
> > table. It's not a CR3 value. The value stored in devirt_cr3arg is actually the
> > PA of struct hv_crash_tramp_data. The CR3 value is stored in the
> > tramp32_cr3 field (at offset 0) of that structure, so there's an additional level
> > of indirection. The (corrected) comment in the header to hv_crash_asm32()
> > describes EDI as containing "PA of struct hv_crash_tramp_data", which
> > ought to match what is described here. I'd say that "devirt_cr3arg" ought
> > to be renamed to "tramp_data_pa" or something else parallel to
> > "trampoline_pa".
> 
> hyp needs trampoline cr3 for transition, we pass it as an arg. we piggy
> back extra information for ourselves needed in trampoline.S. so it's
> all good.
> 

That's a pretty important "detail" that hasn't heretofore been mentioned.
It means the layout of struct hv_crash_tramp_data is not entirely at Linux's
discretion. The tramp32_cr3 field must be first so the hypervisor finds it
where it expects it. Please add code comments describing that the
hypervisor uses the tramp32_cr3 field.

With this new information, I agree the code works. But the devirt_cr3arg
variable is still named incorrectly, and the "PA of trampoline page table L4"
comment is still incorrect. The value in "devirt_cr3arg" is the PA of a memory
location in the trampoline page that contains the devirt CR3 (which itself is
the PA of trampoline page table L4). The CR3 value is in the tramp32_cr3 field
of struct hv_crash_tramp_data in the trampoline page. The CR3 value is
not in static variable devirt_cr3arg, which is why I object to the naming of that
variable.

So rename devirt_cr3arg to devirt_cr3arg_pa. And the comment
becomes "PA of PA of trampoline page table L4", which is rather unwieldy, so
could be shortened to "PA of devirt CR3 value" or something similar. You could
also use "PA of struct hv_crash_tramp_data" as the comment, as I suggested
previously.
 
Michael

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ