[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7cfe68c2-a84a-4416-a9ca-3bf5225190a1@maciej.szmigiero.name>
Date: Tue, 19 Aug 2025 15:34:54 +0200
From: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
To: Naveen N Rao <naveen@...nel.org>, Sean Christopherson <seanjc@...gle.com>
Cc: mlevitsk@...hat.com, Paolo Bonzini <pbonzini@...hat.com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Vasant Hegde <vasant.hegde@....com>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Subject: Re: [EARLY RFC] KVM: SVM: Enable AVIC by default from Zen 4
On 18.07.2025 10:19, Naveen N Rao wrote:
> On Mon, Jul 07, 2025 at 04:21:53PM -0700, Sean Christopherson wrote:
>> On Fri, Jun 27, 2025, Naveen N Rao wrote:
>>>> Back when I implemented this, I just wanted to be a bit safer, a bit more explicit that
>>>> this uses an undocumented feature.
>>>>
>>>> It doesn't matter much though.
>>>>
>>>>>
>>>>> I don't see any reason to do major surgery, just give "avic" auto -1/0/1 behavior:
>>>
>>> I am wary of breaking existing users/deployments on Zen4/Zen5 enabling
>>> AVIC by specifying avic=on, or avic=true today. That's primarily the
>>> reason I chose not to change 'avic' into an integer. Also, post module
>>> load, sysfs reports the value for 'avic' as a 'Y' or 'N' today. So if
>>> there are scripts relying on that, those will break if we change 'avic'
>>> into an integer.
>>
>> That's easy enough to handle, e.g. see nx_huge_pages_ops for a very similar case
>> where KVM has "auto" behavior (and a "never" case too), but otherwise treats the
>> param like a bool.
>
> Nice! Looks like I can re-use existing callbacks for this too:
> static const struct kernel_param_ops avic_ops = {
> .flags = KERNEL_PARAM_OPS_FL_NOARG,
> .set = param_set_bint,
> .get = param_get_bool,
> };
>
> /* enable/disable AVIC (-1 = auto) */
> int avic = -1;
> module_param_cb(avic, &avic_ops, &avic, 0444);
> __MODULE_PARM_TYPE(avic, "bool");
>
>>
>>> For Zen1/Zen2, as I mentioned, it is unlikely that anyone today is
>>> enabling AVIC and expecting it to work since the workaround is only just
>>> hitting upstream. So, I'm hoping requiring force_avic=1 should be ok
>>> with the taint removed.
>>
>> But if that's the motivation, changing the semantics of force_avic doesn't make
>> any sense. Once the workaround lands, the only reason for force_avic to exist
>> is to allow forcing KVM to enable AVIC even when it's not supported.
>
> Indeed.
>
>>
>>> Longer term, once we get wider testing with the workaround on Zen1/Zen2,
>>> we can consider relaxing the need for force_avic, at which point AVIC
>>> can be default enabled
>>
>> I don't see why the default value for "avic" needs to be tied to force_avic.
>> If we're not confident that AVIC is 100% functional and a net positive for the
>> vast majority of setups/workloads on Zen1/Zen2, then simply leave "avic" off by
>> default for those CPUs. If we ever want to enable AVIC by default across the
>> board, we can simply change the default value of "avic".
>>
>> But to be honest, I don't see any reason to bother trying to enable AVIC by default
>> for Zen1/Zen2. There's a very real risk that doing so would regress existing users
>> that have been running setups for ~6 years, and we can't fudge around AVIC being
>> hidden on Zen3 (and the IOMMU not supporting it at all), i.e. enabling AVIC by
>> default only for Zen4+ provides a cleaner story for end users.
>
> Works for me. I completely agree with that.
Some Zen3 platforms (at least Ryzen SKUs) *do* enumerate AVIC in CPUID:
> $ cat /proc/cpuinfo | grep -E '(model[[:space:]]{2,})|family|step' | head -n3
> cpu family : 25
> model : 33
> stepping : 2
>> $ cat /proc/cpuinfo | grep avic | head -n1
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb
> rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid
> aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe
> popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy
> abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core
> perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs
> ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed
> adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc
> cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr
> rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid
> decodeassists pausefilter pfthreshold -> avic <-
> v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov
> succor smca fsrm debug_swap
>
> $ cat /sys/module/kvm_amd/parameters/force_avic
> N
>
> $ dmesg | grep AVIC
> kvm_amd: AVIC enabled
As you can see, currently AVIC works there even without force_avic=1 so why
now hide it behind that parameter if errata #1235 is supposedly not present
on Zen3?
Also, this platform is apparently confident enough that the AVIC silicon is
working correctly there to expose it in CPUID - maybe because that's CPU
stepping 2 instead of the initial 0?
> Thanks,
> Naveen
Thanks,
Maciej
Powered by blists - more mailing lists