[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <03e7fd98-0df0-f191-4739-8a87726478c0@gmail.com>
Date: Tue, 18 Apr 2023 21:24:07 +0800
From: Tianyu Lan <ltykernel@...il.com>
To: Dave Hansen <dave.hansen@...el.com>, luto@...nel.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
seanjc@...gle.com, pbonzini@...hat.com, jgross@...e.com,
tiala@...rosoft.com, kirill@...temov.name,
jiangshan.ljs@...group.com, peterz@...radead.org,
ashish.kalra@....com, srutherford@...gle.com,
akpm@...ux-foundation.org, anshuman.khandual@....com,
pawan.kumar.gupta@...ux.intel.com, adrian.hunter@...el.com,
daniel.sneddon@...ux.intel.com, alexander.shishkin@...ux.intel.com,
sandipan.das@....com, ray.huang@....com, brijesh.singh@....com,
michael.roth@....com, thomas.lendacky@....com,
venu.busireddy@...cle.com, sterritt@...gle.com,
tony.luck@...el.com, samitolvanen@...gle.com, fenghua.yu@...el.com
Cc: pangupta@....com, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, linux-hyperv@...r.kernel.org,
linux-arch@...r.kernel.org
Subject: Re: [RFC PATCH V4 08/17] x86/hyperv: Initialize cpu and memory for
sev-snp enlightened guest
On 4/12/2023 11:53 PM, Dave Hansen wrote:
>>
>> +static u32 processor_count;
>> +
>> +static __init void hv_snp_get_smp_config(unsigned int early)
>> +{
>> + if (!early) {
>> + while (num_processors < processor_count) {
>> + early_per_cpu(x86_cpu_to_apicid, num_processors) = num_processors;
>> + early_per_cpu(x86_bios_cpu_apicid, num_processors) = num_processors;
>> + physid_set(num_processors, phys_cpu_present_map);
>> + set_cpu_possible(num_processors, true);
>> + set_cpu_present(num_processors, true);
>> + num_processors++;
>> + }
>> + }
>> +}
> Folks, please minimize indentation:
>
> if (early)
> return;
>
> It would also be nice to see*some* explanation in the changelog or
> comments about why it's best and correct to just do nothing if early==1.
>
> Also, this_consumes_ data from hv_sev_init_mem_and_cpu(). It would
> make more sense to me to have them ordered the other way.
> hv_sev_init_mem_and_cpu() first, this second.
Hi Dave
Thanks for your review. Good suggestion! Will update in the next
verison.
>
>> u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
>> {
>> union hv_ghcb *hv_ghcb;
>> @@ -356,6 +377,63 @@ static bool hv_is_private_mmio(u64 addr)
>> return false;
>> }
>>
>> +__init void hv_sev_init_mem_and_cpu(void)
>> +{
>> + struct memory_map_entry *entry;
>> + struct e820_entry *e820_entry;
>> + u64 e820_end;
>> + u64 ram_end;
>> + u64 page;
>> +
>> + /*
>> + * Hyper-V enlightened snp guest boots kernel
>> + * directly without bootloader and so roms,
>> + * bios regions and reserve resources are not
>> + * available. Set these callback to NULL.
>> + */
>> + x86_platform.legacy.reserve_bios_regions = 0;
>> + x86_init.resources.probe_roms = x86_init_noop;
>> + x86_init.resources.reserve_resources = x86_init_noop;
>> + x86_init.mpparse.find_smp_config = x86_init_noop;
>> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
> This is one of those places that vertical alignment adds clarity:
>
>> + x86_init.resources.probe_roms = x86_init_noop;
>> + x86_init.resources.reserve_resources = x86_init_noop;
>> + x86_init.mpparse.find_smp_config = x86_init_noop;
>> + x86_init.mpparse.get_smp_config = hv_snp_get_smp_config;
> See? 3 noops and only one actual implemented function. Clear as day now.
>
Yes, this looks better. Will update.
>> + /*> + * Hyper-V SEV-SNP enlightened guest doesn't support ioapic
>> + * and legacy APIC page read/write. Switch to hv apic here.
>> + */
>> + disable_ioapic_support();
> Do these systems have X86_FEATURE_APIC set? Why is this needed in
> addition to the architectural enumeration that already exists?
>
X86_FEATURE_APIC is still set. Hyper-V provides parav-virtualized local
apic interface to replace APIC page opeartion. In the SEV-SNP guest.
> Is there any other place in the kernel that has this one-off disabling
> of the APIC?
In current kernel code, ioapic support still may be disabled when there
is no MP table or ACPI MADT configuration. Please see
__apic_intr_mode_select() and disable_smp() for detial where ioapic is
disabled.
>
>> + /* Read processor number and memory layout. */
>> + processor_count = *(u32 *)__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
>> + entry = (struct memory_map_entry *)(__va(EN_SEV_SNP_PROCESSOR_INFO_ADDR)
>> + + sizeof(struct memory_map_entry));
> Ick.
>
> There are a lot of ways to do this. But, this is an awfully ugly way.
>
> struct snp_processor_info {
> u32 processor_count;
> struct memory_map_entry[] entries;
> }
>
> struct snp_processor_info *snp_pi =
> __va(EN_SEV_SNP_PROCESSOR_INFO_ADDR);
> processor_count = snp_pi->processor_count;
>
> Then, have your for() loop through snp_pi->entries;
>
> Actually, I'm not_quite_ sure that processor_count and entries are next
> to each other. But, either way, I do think a struct makes sense.
Agree. Will update.
>
> Also, what guarantees that EN_SEV_SNP_PROCESSOR_INFO_ADDR is mapped?
> It's up above 8MB which I don't remember off the top of my head as being
> a special address.
This EN_SEV_SNP_PROCESSOR_INFO_ADDR is specified by hypervisor tool.
Hypervisor populates mem and cpu info to the page in the memory and
kernel may access it via adding PHYS_OFFSET_OFFSET directly.
>
>> + /*
>> + * E820 table in the memory just describes memory for
>> + * kernel, ACPI table, cmdline, boot params and ramdisk.
>> + * Hyper-V popoulates the rest memory layout in the EN_SEV_
>> + * SNP_PROCESSOR_INFO_ADDR.
>> + */
> Really? That is not very cool. We need a better explanation of why
> there was no way to use the decades-old e820 or EFI memory map and why
> this needs to be a special snowflake.
Agree. There should be a comment to describe that there is no virtual
Bios in the guest and hypervisor boots Linux kernel directly. So kernel
needs to populdate e820 tables which should be prepared by virtual Bios.
>
>> + for (; entry->numpages != 0; entry++) {
>> + e820_entry = &e820_table->entries[
>> + e820_table->nr_entries - 1];
>> + e820_end = e820_entry->addr + e820_entry->size;
>> + ram_end = (entry->starting_gpn +
>> + entry->numpages) * PAGE_SIZE;
>> +
>> + if (e820_end < entry->starting_gpn * PAGE_SIZE)
>> + e820_end = entry->starting_gpn * PAGE_SIZE;
>> +
>> + if (e820_end < ram_end) {
>> + pr_info("Hyper-V: add e820 entry [mem %#018Lx-%#018Lx]\n", e820_end, ram_end - 1);
>> + e820__range_add(e820_end, ram_end - e820_end,
>> + E820_TYPE_RAM);
>> + for (page = e820_end; page < ram_end; page += PAGE_SIZE)
>> + pvalidate((unsigned long)__va(page), RMP_PG_SIZE_4K, true);
>> + }
>> + }
>> +}
> Oh, is this just about having a pre-accepted area and a non-accepted
> area? Is this basically another one-off implementation of unaccepted
> memory ... that doesn't use the EFI standard?
No, there is no virtual EFI firmware inside VM and so kernel gets mem
and vcpu info directly from Hyper-V.
Powered by blists - more mailing lists