linux-kernel - Re: [PATCH 09/15] x86/irq: Install posted MSI notification handler


Message-ID: <a4f169fa-663d-4a94-878b-d783f67d48c9@intel.com>
Date: Fri, 29 Mar 2024 15:32:00 +0800
From: Zeng Guang <guang.zeng@...el.com>
To: Jacob Pan <jacob.jun.pan@...ux.intel.com>,
 LKML <linux-kernel@...r.kernel.org>, X86 Kernel <x86@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>,
 "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
 Thomas Gleixner <tglx@...utronix.de>, Lu Baolu <baolu.lu@...ux.intel.com>,
 "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
 "Hansen, Dave" <dave.hansen@...el.com>, Joerg Roedel <joro@...tes.org>,
 "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
 Ingo Molnar <mingo@...hat.com>
Cc: "Luse, Paul E" <paul.e.luse@...el.com>,
 "Williams, Dan J" <dan.j.williams@...el.com>, Jens Axboe <axboe@...nel.dk>,
 "Raj, Ashok" <ashok.raj@...el.com>, "Tian, Kevin" <kevin.tian@...el.com>,
 "maz@...nel.org" <maz@...nel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
 Robin Murphy <robin.murphy@....com>
Subject: Re: [PATCH 09/15] x86/irq: Install posted MSI notification handler


On 1/27/2024 7:42 AM, Jacob Pan wrote:
> @@ -353,6 +360,111 @@ void intel_posted_msi_init(void)
>   	pid->nv = POSTED_MSI_NOTIFICATION_VECTOR;
>   	pid->ndst = this_cpu_read(x86_cpu_to_apicid);
>   }
> +
> +/*
> + * De-multiplexing posted interrupts is on the performance path, the code
> + * below is written to optimize the cache performance based on the following
> + * considerations:
> + * 1.Posted interrupt descriptor (PID) fits in a cache line that is frequently
> + *   accessed by both CPU and IOMMU.
> + * 2.During posted MSI processing, the CPU needs to do 64-bit read and xchg
> + *   for checking and clearing posted interrupt request (PIR), a 256 bit field
> + *   within the PID.
> + * 3.On the other side, the IOMMU does atomic swaps of the entire PID cache
> + *   line when posting interrupts and setting control bits.
> + * 4.The CPU can access the cache line a magnitude faster than the IOMMU.
> + * 5.Each time the IOMMU does interrupt posting to the PIR will evict the PID
> + *   cache line. The cache line states after each operation are as follows:
> + *   CPU		IOMMU			PID Cache line state
> + *   ---------------------------------------------------------------
> + *...read64					exclusive
> + *...lock xchg64				modified
> + *...			post/atomic swap	invalid
> + *...-------------------------------------------------------------
> + *
> + * To reduce L1 data cache miss, it is important to avoid contention with
> + * IOMMU's interrupt posting/atomic swap. Therefore, a copy of PIR is used
> + * to dispatch interrupt handlers.
> + *
> + * In addition, the code is trying to keep the cache line state consistent
> + * as much as possible. e.g. when making a copy and clearing the PIR
> + * (assuming non-zero PIR bits are present in the entire PIR), it does:
> + *		read, read, read, read, xchg, xchg, xchg, xchg
> + * instead of:
> + *		read, xchg, read, xchg, read, xchg, read, xchg
> + */
> +static __always_inline inline bool handle_pending_pir(u64 *pir, struct pt_regs *regs)
> +{
> +	int i, vec = FIRST_EXTERNAL_VECTOR;
> +	unsigned long pir_copy[4];
> +	bool handled = false;
> +
> +	for (i = 0; i < 4; i++)
> +		pir_copy[i] = pir[i];
> +
> +	for (i = 0; i < 4; i++) {
> +		if (!pir_copy[i])
> +			continue;
> +
> +		pir_copy[i] = arch_xchg(pir, 0);

There is a problem here: arch_xchg(pir, 0) always operates on pir[0], so
pir_copy[i] is always overwritten with the contents of pir[0], regardless
of i. This leads to spurious posted MSIs being handled later.
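
To illustrate, here is a minimal user-space sketch, not kernel code. It
assumes GCC/Clang's __atomic_exchange_n() as a stand-in for arch_xchg(), and
the names xchg64(), harvest_as_posted(), harvest_indexed() and PIR_WORDS are
hypothetical, made up for this example. Indexing the exchange with &pir[i] is
one possible fix, not necessarily what the next revision should do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PIR_WORDS 4	/* 256-bit PIR viewed as four 64-bit words */

/* Hypothetical stand-in for arch_xchg(), built on the compiler builtin. */
static inline uint64_t xchg64(uint64_t *p, uint64_t v)
{
	return __atomic_exchange_n(p, v, __ATOMIC_SEQ_CST);
}

/* Mirrors the posted loop: every exchange targets pir[0]. */
static bool harvest_as_posted(uint64_t *pir, uint64_t *pir_copy)
{
	bool handled = false;
	int i;

	for (i = 0; i < PIR_WORDS; i++)
		pir_copy[i] = pir[i];

	for (i = 0; i < PIR_WORDS; i++) {
		if (!pir_copy[i])
			continue;
		pir_copy[i] = xchg64(pir, 0);	/* always word 0 */
		handled = true;
	}
	return handled;
}

/* Possible fix: exchange the word that was actually found non-zero. */
static bool harvest_indexed(uint64_t *pir, uint64_t *pir_copy)
{
	bool handled = false;
	int i;

	for (i = 0; i < PIR_WORDS; i++)
		pir_copy[i] = pir[i];

	for (i = 0; i < PIR_WORDS; i++) {
		if (!pir_copy[i])
			continue;
		pir_copy[i] = xchg64(&pir[i], 0);	/* word i */
		handled = true;
	}
	return handled;
}

/* One pending bit in word 1 only; compare what each variant leaves behind. */
static void check_harvest(void)
{
	uint64_t a[PIR_WORDS] = { 0, 1ULL << 5, 0, 0 }, ca[PIR_WORDS];
	uint64_t b[PIR_WORDS] = { 0, 1ULL << 5, 0, 0 }, cb[PIR_WORDS];

	assert(harvest_as_posted(a, ca));
	assert(ca[1] == 0);		/* copy holds pir[0], the bit is gone */
	assert(a[1] == (1ULL << 5));	/* pir[1] was never cleared */

	assert(harvest_indexed(b, cb));
	assert(cb[1] == (1ULL << 5));	/* pending bit captured */
	assert(b[1] == 0);		/* and cleared in the PIR */
}
```

With a single bit pending in word 1, the posted variant reports
handled = true, yet the copy of word 1 ends up holding the (empty) contents
of pir[0] and pir[1] stays set, so the same state is seen again on the next
pass. Calling check_harvest() from a test program exercises both variants.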

> +		handled = true;
> +	}
> +
> +	if (handled) {
> +		for_each_set_bit_from(vec, pir_copy, FIRST_SYSTEM_VECTOR)
> +			call_irq_handler(vec, regs);
> +	}
> +
> +	return handled;
> +}
> +
> +/*
> + * Performance data shows that 3 is good enough to harvest 90+% of the benefit
> + * on high IRQ rate workload.
> + */
> +#define MAX_POSTED_MSI_COALESCING_LOOP 3
> +
> +/*
> + * For MSIs that are delivered as posted interrupts, the CPU notifications
> + * can be coalesced if the MSIs arrive in high frequency bursts.
> + */
> +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> +{
> +	struct pt_regs *old_regs = set_irq_regs(regs);
> +	struct pi_desc *pid;
> +	int i = 0;
> +
> +	pid = this_cpu_ptr(&posted_interrupt_desc);
> +
> +	inc_irq_stat(posted_msi_notification_count);
> +	irq_enter();
> +
> +	/*
> +	 * Max coalescing count includes the extra round of handle_pending_pir
> +	 * after clearing the outstanding notification bit. Hence, at most
> +	 * MAX_POSTED_MSI_COALESCING_LOOP - 1 loops are executed here.
> +	 */
> +	while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
> +		if (!handle_pending_pir(pid->pir64, regs))
> +			break;
> +	}
> +
> +	/*
> +	 * Clear outstanding notification bit to allow new IRQ notifications,
> +	 * do this last to maximize the window of interrupt coalescing.
> +	 */
> +	pi_clear_on(pid);
> +
> +	/*
> +	 * There could be a race of PI notification and the clearing of ON bit,
> +	 * process PIR bits one last time such that handling the new interrupts
> +	 * are not delayed until the next IRQ.
> +	 */
> +	handle_pending_pir(pid->pir64, regs);
> +
> +	apic_eoi();
> +	irq_exit();
> +	set_irq_regs(old_regs);
>   }
>   #endif /* X86_POSTED_MSI */
>