linux-kernel - RE: [PATCH] Drivers: hv: Introduce mshv

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID:
 <SN6PR02MB4157D124B1AF145E06431BD2D48BA@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 8 May 2025 04:03:49 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Roman Kisel <romank@...ux.microsoft.com>, Saurabh Singh Sengar
	<ssengar@...rosoft.com>, Naman Jain <namjain@...ux.microsoft.com>, KY
 Srinivasan <kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei
 Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>
CC: Anirudh Rayabharam <anrayabh@...ux.microsoft.com>, Saurabh Sengar
	<ssengar@...ux.microsoft.com>, Stanislav Kinsburskii
	<skinsburskii@...ux.microsoft.com>, Nuno Das Neves
	<nunodasneves@...ux.microsoft.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>
Subject: RE: [PATCH] Drivers: hv: Introduce mshv_vtl driver

From: Roman Kisel <romank@...ux.microsoft.com> Sent: Wednesday, May 7, 2025 12:21 PM
> 
> On 5/7/2025 6:02 AM, Saurabh Singh Sengar wrote:
> >
> [..]
> 
> >> +	}
> >> +
> >> +	local_irq_save(flags);
> >> +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >> +	out = *this_cpu_ptr(hyperv_pcpu_output_arg);
> >> +
> >> +	if (copy_from_user(in, (void __user *)hvcall.input_ptr,
> >> hvcall.input_size)) {
> >
> > Here is an issue related to usage of user copy functions when interrupt are disabled.
> > It was reported by Michael K here:
> >
> > https://github.com/microsoft/OHCL-Linux-Kernel/issues/33
> 
>  From the practical point of view, that memory will be touched by the
> user mode by virtue of Rust requiring initialization so the a possible
> page fault would be resolved before the IOCTL. OpenHCL runs without swap
> so the the memory will not be paged out to require page faults to be
> brought in back.
> 
> I do agree that might be turned into a footgun by the user land if
> they malloc a page w/o prefaulting (so it's just a VA range, not backed
> with the physical page), and then send its address straight over here
> right after w/o writing any data to it. Perhaps likelier with the output
> data. Anyway, yes, relying on the user land doing sane things isn't
> the best approach to the kernel programming.
> 
> If we're inclined to fix this, I'd encourage to take an approach that
> works for the confidential VMs as well so we don't have to fix that
> again when start upstreaming what we have for SNP and TDX. The
> allocation *must* be visible to the hypervisor in the confidential
> scenarios.
> 
> Or, maybe we could avoid the allocations by reading the first byte
> of the user land buffer to "pre-fault" the page outside of the
> scope that disables interrupts. Why allocate if we can avoid that?
> Could set up also the SMP remote calls to run this on the desired
> CPU.
> 
> Summarizing for the case you want to change this:
> 
> 1. Keep interrupts disabled when reading/writing to/from the Hyper-V
>     driver allocated input and output pages.
> 2. If you decide to allocate separate pages, make sure they are
>     visible to the hypervisor in the confidential scenarios. I know
>     we're not talking SNP and TDX here just yet but it would be
>     a waste of time imho to build something here and scrape that
>     later. The issues with allocations are:
>         a) If allocating on-demand, we might fail the hypercall
>            because of OOM. That's certainly bad as the whole VM
>            will break down.
>         b) If allocating for the whole lifetime of the VM,
>            let us remember that we avoid using hypercalls
>            due to their runtime cost. We'll be keeping around
>            2 pages per CPU for the few times we need them.
> 3. Consider reading a byte from the user land buffers to make the page
>     fault happen outside of disabling interrupts. There is no
>     outswap (maybe could have disabling swap in Kconfig) so the page
>     will stay in the memory.
> 
> If you're not changing this, feel free to keep my "Reviewed-by".
> 

Regardless of what might be done to prevent a page fault, I don't
see an option to not fix this. copy_from_user() contains a call to
might_fault(), which in turn calls might_sleep(). The intent of these
runtime "annotations" is precisely for the kernel to check for such
errors and complain about them. The complaining is suppressed unless
CONFIG_DEBUG_ATOMIC_SLEEP is set, but we want to be able to
set that option for debugging purposes and not have this code
generating complaints.

Michael