[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<SN6PR02MB4157D124B1AF145E06431BD2D48BA@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 8 May 2025 04:03:49 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Roman Kisel <romank@...ux.microsoft.com>, Saurabh Singh Sengar
<ssengar@...rosoft.com>, Naman Jain <namjain@...ux.microsoft.com>, KY
Srinivasan <kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei
Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>
CC: Anirudh Rayabharam <anrayabh@...ux.microsoft.com>, Saurabh Sengar
<ssengar@...ux.microsoft.com>, Stanislav Kinsburskii
<skinsburskii@...ux.microsoft.com>, Nuno Das Neves
<nunodasneves@...ux.microsoft.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>
Subject: RE: [PATCH] Drivers: hv: Introduce mshv_vtl driver
From: Roman Kisel <romank@...ux.microsoft.com> Sent: Wednesday, May 7, 2025 12:21 PM
>
> On 5/7/2025 6:02 AM, Saurabh Singh Sengar wrote:
> >
> [..]
>
> >> + }
> >> +
> >> + local_irq_save(flags);
> >> + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> >> + out = *this_cpu_ptr(hyperv_pcpu_output_arg);
> >> +
> >> + if (copy_from_user(in, (void __user *)hvcall.input_ptr,
> >> hvcall.input_size)) {
> >
> > Here is an issue related to usage of user copy functions when interrupt are disabled.
> > It was reported by Michael K here:
> >
> > https://github.com/microsoft/OHCL-Linux-Kernel/issues/33
>
> From the practical point of view, that memory will be touched by the
> user mode by virtue of Rust requiring initialization so the a possible
> page fault would be resolved before the IOCTL. OpenHCL runs without swap
> so the the memory will not be paged out to require page faults to be
> brought in back.
>
> I do agree that might be turned into a footgun by the user land if
> they malloc a page w/o prefaulting (so it's just a VA range, not backed
> with the physical page), and then send its address straight over here
> right after w/o writing any data to it. Perhaps likelier with the output
> data. Anyway, yes, relying on the user land doing sane things isn't
> the best approach to the kernel programming.
>
> If we're inclined to fix this, I'd encourage to take an approach that
> works for the confidential VMs as well so we don't have to fix that
> again when start upstreaming what we have for SNP and TDX. The
> allocation *must* be visible to the hypervisor in the confidential
> scenarios.
>
> Or, maybe we could avoid the allocations by reading the first byte
> of the user land buffer to "pre-fault" the page outside of the
> scope that disables interrupts. Why allocate if we can avoid that?
> Could set up also the SMP remote calls to run this on the desired
> CPU.
>
> Summarizing for the case you want to change this:
>
> 1. Keep interrupts disabled when reading/writing to/from the Hyper-V
> driver allocated input and output pages.
> 2. If you decide to allocate separate pages, make sure they are
> visible to the hypervisor in the confidential scenarios. I know
> we're not talking SNP and TDX here just yet but it would be
> a waste of time imho to build something here and scrape that
> later. The issues with allocations are:
> a) If allocating on-demand, we might fail the hypercall
> because of OOM. That's certainly bad as the whole VM
> will break down.
> b) If allocating for the whole lifetime of the VM,
> let us remember that we avoid using hypercalls
> due to their runtime cost. We'll be keeping around
> 2 pages per CPU for the few times we need them.
> 3. Consider reading a byte from the user land buffers to make the page
> fault happen outside of disabling interrupts. There is no
> outswap (maybe could have disabling swap in Kconfig) so the page
> will stay in the memory.
>
> If you're not changing this, feel free to keep my "Reviewed-by".
>
Regardless of what might be done to prevent a page fault, I don't
see an option to not fix this. copy_from_user() contains a call to
might_fault(), which in turn calls might_sleep(). The intent of these
runtime "annotations" is precisely for the kernel to check for such
errors and complain about them. The complaining is suppressed unless
CONFIG_DEBUG_ATOMIC_SLEEP is set, but we want to be able to
set that option for debugging purposes and not have this code
generating complaints.
Michael
Powered by blists - more mailing lists