[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8f83fbdb-0aee-4602-ad8a-58bbd22dbdc9@linux.microsoft.com>
Date: Wed, 7 May 2025 12:20:36 -0700
From: Roman Kisel <romank@...ux.microsoft.com>
To: Saurabh Singh Sengar <ssengar@...rosoft.com>,
Naman Jain <namjain@...ux.microsoft.com>, KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>
Cc: Anirudh Rayabharam <anrayabh@...ux.microsoft.com>,
Saurabh Sengar <ssengar@...ux.microsoft.com>,
Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>,
Nuno Das Neves <nunodasneves@...ux.microsoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>
Subject: Re: [PATCH] Drivers: hv: Introduce mshv_vtl driver
On 5/7/2025 6:02 AM, Saurabh Singh Sengar wrote:
>
[..]
>> + }
>> +
>> + local_irq_save(flags);
>> + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
>> + out = *this_cpu_ptr(hyperv_pcpu_output_arg);
>> +
>> + if (copy_from_user(in, (void __user *)hvcall.input_ptr,
>> hvcall.input_size)) {
>
> Here is an issue related to usage of user copy functions when interrupt are disabled.
> It was reported by Michael K here:
> https://github.com/microsoft/OHCL-Linux-Kernel/issues/33
From the practical point of view, that memory will be touched by the
user mode by virtue of Rust requiring initialization so the a possible
page fault would be resolved before the IOCTL. OpenHCL runs without swap
so the the memory will not be paged out to require page faults to be
brought in back.
I do agree that might be turned into a footgun by the user land if
they malloc a page w/o prefaulting (so it's just a VA range, not backed
with the physical page), and then send its address straight over here
right after w/o writing any data to it. Perhaps likelier with the output
data. Anyway, yes, relying on the user land doing sane things isn't
the best approach to the kernel programming.
If we're inclined to fix this, I'd encourage to take an approach that
works for the confidential VMs as well so we don't have to fix that
again when start upstreaming what we have for SNP and TDX. The
allocation *must* be visible to the hypervisor in the confidential
scenarios.
Or, maybe we could avoid the allocations by reading the first byte
of the user land buffer to "pre-fault" the page outside of the
scope that disables interrupts. Why allocate if we can avoid that?
Could set up also the SMP remote calls to run this on the desired
CPU.
Summarizing for the case you want to change this:
1. Keep interrupts disabled when reading/writing to/from the Hyper-V
driver allocated input and output pages.
2. If you decide to allocate separate pages, make sure they are
visible to the hypervisor in the confidential scenarios. I know
we're not talking SNP and TDX here just yet but it would be
a waste of time imho to build something here and scrape that
later. The issues with allocations are:
a) If allocating on-demand, we might fail the hypercall
because of OOM. That's certainly bad as the whole VM
will break down.
b) If allocating for the whole lifetime of the VM,
let us remember that we avoid using hypercalls
due to their runtime cost. We'll be keeping around
2 pages per CPU for the few times we need them.
3. Consider reading a byte from the user land buffers to make the page
fault happen outside of disabling interrupts. There is no
outswap (maybe could have disabling swap in Kconfig) so the page
will stay in the memory.
If you're not changing this, feel free to keep my "Reviewed-by".
>
> - Saurabh
--
Thank you,
Roman
Powered by blists - more mailing lists