linux-kernel - Re: [PATCH v2 5/6] x86/hyperv: Implement hypervisor RAM collection into vmcore

Message-ID: <f6835c8e-70de-7bd0-f116-0a4eae0ef29c@linux.microsoft.com>
Date: Thu, 2 Oct 2025 15:07:28 -0700
From: Mukesh R <mrathor@...ux.microsoft.com>
To: Wei Liu <wei.liu@...nel.org>
Cc: linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-arch@...r.kernel.org, kys@...rosoft.com, haiyangz@...rosoft.com,
 decui@...rosoft.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, arnd@...db.de
Subject: Re: [PATCH v2 5/6] x86/hyperv: Implement hypervisor RAM collection
 into vmcore

On 10/2/25 14:42, Wei Liu wrote:
> On Tue, Sep 23, 2025 at 02:46:08PM -0700, Mukesh Rathor wrote:
> [...]
>> +
>> +/*
>> + * This is the C entry point from the asm glue code after the devirt hypercall.
> 
> devirt -> devirtualization
> 
>> + * We enter here in IA-32e long mode, i.e., full 64-bit mode running on kernel
>> + * page tables with our below 4G page identity mapped, but using a temporary
>> + * GDT. ds/fs/gs/es are null. ss is not usable. bp is null. stack is not
>> + * available. We restore kernel GDT, and rest of the context, and continue
>> + * to kexec.
>> + */
> [...]
>> +
>> +static noinline __noclone void crash_nmi_callback(struct pt_regs *regs)
>> +{
>> +	struct hv_input_disable_hyp_ex *input;
>> +	u64 status;
>> +	int msecs = 1000, ccpu = smp_processor_id();
>> +
>> +	if (ccpu == 0) {
>> +		/* crash_save_cpu() will be done in the kexec path */
>> +		cpu_emergency_stop_pt();	/* disable performance trace */
>> +		atomic_inc(&crash_cpus_wait);
>> +	} else {
>> +		crash_save_cpu(regs, ccpu);
>> +		cpu_emergency_stop_pt();	/* disable performance trace */
>> +		atomic_inc(&crash_cpus_wait);
>> +		for (;;)
>> +			cpu_relax();
>> +	}
>> +
>> +	while (atomic_read(&crash_cpus_wait) < num_online_cpus() && msecs--)
>> +		mdelay(1);
>> +
>> +	stop_nmi();
>> +	if (!hv_has_crashed)
>> +		hv_notify_prepare_hyp();
>> +
>> +	if (crashing_cpu == -1)
>> +		crashing_cpu = ccpu;		/* crash cmd uses this */
>> +
>> +	hv_hvcrash_ctxt_save();
>> +	hv_mark_tss_not_busy();
>> +	hv_crash_fixup_kernpt();
>> +
>> +	input = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> +	memset(input, 0, sizeof(*input));
>> +	input->rip = trampoline_pa;
>> +	input->arg = devirt_arg;
>> +
>> +	status = hv_do_hypercall(HVCALL_DISABLE_HYP_EX, input, NULL);
>> +
> 
> If I understand this correctly, after this call, upon return from the
> hypervisor, Linux will start executing the trampoline code.

Correct. On success the hypercall does not return here; the hypervisor
resumes execution at trampoline_pa.

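For reference, the CPU 0 path in crash_nmi_callback() quoted above is a
bounded rendezvous. Every CPU increments crash_cpus_wait, and CPU 0 then
waits at most ~1000 ms for the count to reach num_online_cpus() before it
proceeds anyway. Below is a minimal userspace sketch of that pattern only,
assuming pthreads and C11 atomics stand in for NMI delivery, atomic_t and
mdelay(). rendezvous(), worker() and release_workers are hypothetical
names for illustration, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_WORKERS 64

static atomic_int cpus_checked_in;   /* models crash_cpus_wait */
static atomic_bool release_workers;  /* hypothetical: lets the sketch end; the kernel spins forever */

static void sleep_1ms(void)          /* models mdelay(1) */
{
	struct timespec ts = { 0, 1000000L };
	nanosleep(&ts, NULL);
}

static void *worker(void *arg)
{
	(void)arg;
	/* crash_save_cpu() and cpu_emergency_stop_pt() would go here. */
	atomic_fetch_add(&cpus_checked_in, 1);
	while (!atomic_load(&release_workers))
		sched_yield();           /* loosely models for (;;) cpu_relax(); */
	return NULL;
}

/*
 * Returns how many "CPUs" (including the caller) checked in before the
 * deadline. As in the patch, the caller proceeds after the timeout even
 * if some CPUs never showed up.
 */
static int rendezvous(int nworkers, int timeout_ms)
{
	pthread_t tids[MAX_WORKERS];
	int msecs = timeout_ms, started = 0, seen;

	assert(nworkers >= 0 && nworkers <= MAX_WORKERS);
	atomic_store(&cpus_checked_in, 0);
	atomic_store(&release_workers, false);

	/* A thread that fails to start behaves like a CPU that never checks in. */
	for (; started < nworkers; started++)
		if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
			break;

	atomic_fetch_add(&cpus_checked_in, 1);  /* the coordinator counts itself */
	while (atomic_load(&cpus_checked_in) < nworkers + 1 && msecs--)
		sleep_1ms();
	seen = atomic_load(&cpus_checked_in);

	atomic_store(&release_workers, true);
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return seen;
}
```

The timeout keeps a wedged CPU from blocking the crash path indefinitely;
the cost is that a late CPU may not have saved its registers yet.
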
>> +	hv_panic_timeout_reboot();
> 
> Why is this needed? Is it to catch the case when the hypercall fails?

Correct. It runs only if the hypercall fails and returns to us.

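To make the control flow above concrete: on success the hypercall never
returns to crash_nmi_callback(), and only a failed call falls through to
hv_panic_timeout_reboot(). A minimal userspace sketch, assuming
setjmp()/longjmp() stand in for the hypervisor resuming at trampoline_pa;
fake_disable_hyp(), crash_path() and run_crash_path() are hypothetical
names, not kernel APIs.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>

static jmp_buf trampoline;        /* models input->rip = trampoline_pa */
static bool fallback_ran;         /* set when the fallback path executes */

/* Hypothetical stand-in for hv_do_hypercall(HVCALL_DISABLE_HYP_EX, ...). */
static uint64_t fake_disable_hyp(bool succeed)
{
	if (succeed)
		longjmp(trampoline, 1);   /* control resumes in the "trampoline" */
	return 1;                         /* nonzero status: the call failed */
}

static void crash_path(bool succeed)
{
	uint64_t status = fake_disable_hyp(succeed);

	assert(status != 0);              /* reaching here means the call failed */
	fallback_ran = true;              /* models hv_panic_timeout_reboot() */
}

/* Returns true if execution reached the trampoline instead of the fallback. */
static bool run_crash_path(bool succeed)
{
	fallback_ran = false;
	if (setjmp(trampoline))
		return true;              /* "trampoline" entered: kexec would follow */
	crash_path(succeed);
	return false;
}
```

So the call after the hypercall is dead code on the success path and acts
purely as a safety net when the hypervisor rejects the request.
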
> [...]
>> +static void __noclone hv_crash_stop_other_cpus(void)
>> +{
>> +	static bool crash_stop_done;
>> +	struct pt_regs lregs;
>> +	int ccpu = smp_processor_id();
>> +
>> +	if (hv_has_crashed)
>> +		return;		/* all cpus already in NMI handler path */
>> +
>> +	if (!kexec_crash_loaded()) {
>> +		hv_notify_prepare_hyp();
>> +		hv_panic_timeout_reboot();	/* no return */
>> +	}
>> +
>> +	/* If hyp crashes also, we could come here again before cpus_stopped is
> 
> hypervisor or hv (given the same term is used in the function)
> 
>> +	 * set in crash_smp_send_stop(). So use our own check.
>> +	 */
>> +	if (crash_stop_done)
>> +		return;
>> +	crash_stop_done = true;
>> +
>> +	/* Linux has crashed: hv is healthy, we can ipi safely */
> 
> IPI.
> 
>> +
>> +err_out:
>> +	unregister_nmi_handler(NMI_LOCAL, "hv_crash_nmi");
>> +	pr_err("Hyper-V: only linux (but not hyp) kdump support enabled\n");
> 
> hypervisor not hyp. This is a message for the user so we should be as
> clear as possible.

> Wei
> 
>> +}
>> -- 
>> 2.36.1.vfs.0.0
>>
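The crash_stop_done check in hv_crash_stop_other_cpus() is a run-once
guard: if the hypervisor also crashes, the function can be re-entered
before cpus_stopped is set in crash_smp_send_stop(), so it keeps its own
flag. A minimal userspace sketch of that guard follows. Note it swaps in a
C11 atomic_flag, which makes check-and-set a single atomic step; the patch
itself uses a plain static bool. stop_attempts and stop_other_cpus() are
hypothetical names for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag crash_stop_done = ATOMIC_FLAG_INIT;
static int stop_attempts;   /* counts how often the stop work actually ran */

static void stop_other_cpus(void)
{
	/* test_and_set returns the previous value: true means we already ran. */
	if (atomic_flag_test_and_set(&crash_stop_done))
		return;
	stop_attempts++;    /* models sending the stop IPIs exactly once */
	assert(stop_attempts == 1);
}
```

A second call returns immediately, so the IPIs are never sent twice.
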