linux-kernel - Re: [RFC v2 05/17] x86/apic: Add update

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ecc20053-42ae-43ba-b1b3-748abe9d5b3b@amd.com>
Date: Fri, 21 Mar 2025 21:05:56 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Cc: bp@...en8.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
 Thomas.Lendacky@....com, nikunj@....com, Santosh.Shukla@....com,
 Vasant.Hegde@....com, Suravee.Suthikulpanit@....com, David.Kaplan@....com,
 x86@...nel.org, hpa@...or.com, peterz@...radead.org, seanjc@...gle.com,
 pbonzini@...hat.com, kvm@...r.kernel.org, kirill.shutemov@...ux.intel.com,
 huibo.wang@....com, naveen.rao@....com
Subject: Re: [RFC v2 05/17] x86/apic: Add update_vector callback for Secure
 AVIC



On 3/21/2025 7:57 PM, Thomas Gleixner wrote:
> On Wed, Feb 26 2025 at 14:35, Neeraj Upadhyay wrote:
>> Add update_vector callback to set/clear ALLOWED_IRR field in
>> the APIC backing page. The ALLOWED_IRR field indicates the
>> interrupt vectors which the guest allows the hypervisor to
>> send (typically for emulated devices). Interrupt vectors used
>> exclusively by the guest itself (like IPI vectors) should not
>> be allowed to be injected into the guest for security reasons.
>> The update_vector callback is invoked from APIC vector domain
>> whenever a vector is allocated, freed or moved.
> 
> Your changelog tells a lot about the WHAT. Please read and follow the
> documentation, which describes how a change log should be structured.
> 
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog
> 

Ok

>> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
>> index 72fa4bb78f0a..e0c9505e05f8 100644
>> --- a/arch/x86/kernel/apic/vector.c
>> +++ b/arch/x86/kernel/apic/vector.c
>> @@ -174,6 +174,8 @@ static void apic_update_vector(struct irq_data *irqd, unsigned int newvec,
>>  		apicd->prev_cpu = apicd->cpu;
>>  		WARN_ON_ONCE(apicd->cpu == newcpu);
>>  	} else {
>> +		if (apic->update_vector)
>> +			apic->update_vector(apicd->cpu, apicd->vector, false);
>>  		irq_matrix_free(vector_matrix, apicd->cpu, apicd->vector,
>>  				managed);
>>  	}
>> @@ -183,6 +185,8 @@ static void apic_update_vector(struct irq_data *irqd, unsigned int newvec,
>>  	apicd->cpu = newcpu;
>>  	BUG_ON(!IS_ERR_OR_NULL(per_cpu(vector_irq, newcpu)[newvec]));
>>  	per_cpu(vector_irq, newcpu)[newvec] = desc;
>> +	if (apic->update_vector)
>> +		apic->update_vector(apicd->cpu, apicd->vector, true);
> 
> A trivial
> 
> static inline void apic_update_vector(....)
> {
>         if (apic->update_vector)
>            ....
> }
> 
> would be too easy to read and add not enough line count, right?
> 

Yes.

>>  static void vector_assign_managed_shutdown(struct irq_data *irqd)
>> @@ -528,11 +532,15 @@ static bool vector_configure_legacy(unsigned int virq, struct irq_data *irqd,
>>  	if (irqd_is_activated(irqd)) {
>>  		trace_vector_setup(virq, true, 0);
>>  		apic_update_irq_cfg(irqd, apicd->vector, apicd->cpu);
>> +		if (apic->update_vector)
>> +			apic->update_vector(apicd->cpu, apicd->vector, true);
>>  	} else {
>>  		/* Release the vector */
>>  		apicd->can_reserve = true;
>>  		irqd_set_can_reserve(irqd);
>>  		clear_irq_vector(irqd);
>> +		if (apic->update_vector)
>> +			apic->update_vector(apicd->cpu, apicd->vector, false);
>>  		realloc = true;
> 
> This is as incomplete as it gets. None of the other code paths which
> invoke clear_irq_vector() nor those which invoke free_moved_vector() are
> mopping up the leftovers in the backing page.
> 
> And no, you don't sprinkle this nonsense all over the call sites. There
> is only a very limited number of functions which are involed in setting
> up and tearing down a vector. Doing this at the call sites is a
> guarantee for missing out as you demonstrated.
> 

This is the part where I was looking for guidance. As ALLOWED_IRR (which
tells if Hypervisor is allowed to inject a vector to guest vCPU) is per
CPU, intent was to call it at places where vector's CPU affinity changes.
I surely have missed cleaning up ALLOWED_IRR on previously affined CPU.
I will follow your suggestion to do it during setup/teardown of vector (need
to figure out those functions) and configure it for all CPUs in those
functions.

>> +#define VEC_POS(v)	((v) & (32 - 1))
>> +#define REG_POS(v)	(((v) >> 5) << 4)
> 
> This is unreadable, undocumented and incomprehensible garbage.
> 

I will update it.

>>  static DEFINE_PER_CPU(void *, apic_backing_page);
>>  
>>  struct apic_id_node {
>> @@ -192,6 +195,22 @@ static void x2apic_savic_send_IPI_mask_allbutself(const struct cpumask *mask, in
>>  	__send_IPI_mask(mask, vector, APIC_DEST_ALLBUT);
>>  }
>>  
>> +static void x2apic_savic_update_vector(unsigned int cpu, unsigned int vector, bool set)
>> +{
>> +	void *backing_page;
>> +	unsigned long *reg;
>> +	int reg_off;
>> +
>> +	backing_page = per_cpu(apic_backing_page, cpu);
>> +	reg_off = SAVIC_ALLOWED_IRR_OFFSET + REG_POS(vector);
>> +	reg = (unsigned long *)((char *)backing_page + reg_off);
>> +
>> +	if (set)
>> +		test_and_set_bit(VEC_POS(vector), reg);
>> +	else
>> +		test_and_clear_bit(VEC_POS(vector), reg);
>> +}
> 
> What's the test_and_ for if you ignore the return value anyway?
>

To not set it again if it already set. I will switch to set_bit/clear_bit()
as test_and_ is not necessary.

 
> Als I have no idea what SAVIC_ALLOWED_IRR_OFFSET means. Whether it's
> something from the datashit or a made up thing does not matter. It's
> patently non-informative.
> 

Ok, I had tried to give some details in the cover letter. These APIC
regs are at offset APIC_IRR(n) + 4 and are used by guest to configure the
interrupt vectors which can be injected by Hypervisor to Guest.


> Again:
> 
> struct apic_page {
> 	union {
> 		u32	regs[NR_APIC_REGS];
> 		u8	bytes[PAGE_SIZE];
> 	};
> };                
> 
>        struct apic_page *ap = this_cpu_ptr(apic_page);
>        unsigned long *sirr;
> 
>        /*
>         * apic_page.regs[SAVIC_ALLOWED_IRR_OFFSET...] is an array of
>         * consecutive 32-bit registers, which represents a vector bitmap.
>         */
>         sirr = (unsigned long *) &ap->regs[SAVIC_ALLOWED_IRR_OFFSET];
>         if (set)
>         	set_bit(sirr, vector);
>         else
>         	clear_bit(sirr, vector);
> 
> See how code suddenly becomes self explaining, obvious and
> comprehensible?
> 

Yes, thank you!

- Neeraj

> Thanks,
> 
>         tglx