linux-kernel - Re: [PATCH v6 28/31] irqchip/gic-v5: Add GICv5 ITS support

Message-ID: <20250702150624.00007ceb@huawei.com>
Date: Wed, 2 Jul 2025 15:06:24 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: Lorenzo Pieralisi <lpieralisi@...nel.org>
CC: Marc Zyngier <maz@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, "Rob
 Herring" <robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>, "Conor
 Dooley" <conor+dt@...nel.org>, Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>, Arnd Bergmann <arnd@...db.de>, "Sascha
 Bischoff" <sascha.bischoff@....com>, Timothy Hayes <timothy.hayes@....com>,
	Bjorn Helgaas <bhelgaas@...gle.com>, "Liam R. Howlett"
	<Liam.Howlett@...cle.com>, Peter Maydell <peter.maydell@...aro.org>, "Mark
 Rutland" <mark.rutland@....com>, Jiri Slaby <jirislaby@...nel.org>,
	<linux-arm-kernel@...ts.infradead.org>, <linux-kernel@...r.kernel.org>,
	<devicetree@...r.kernel.org>, <linux-pci@...r.kernel.org>
Subject: Re: [PATCH v6 28/31] irqchip/gic-v5: Add GICv5 ITS support

On Thu, 26 Jun 2025 12:26:19 +0200
Lorenzo Pieralisi <lpieralisi@...nel.org> wrote:

> The GICv5 architecture implements Interrupt Translation Service
> (ITS) components in order to translate events coming from peripherals
> into interrupt events delivered to the connected IRSes.
> 
> Events (ie MSI memory writes to ITS translate frame), are translated
> by the ITS using tables kept in memory.
> 
> ITS translation tables for peripherals is kept in memory storage
> (device table [DT] and Interrupt Translation Table [ITT]) that
> is allocated by the driver on boot.
> 
> Both tables can be 1- or 2-level; the structure is chosen by the
> driver after probing the ITS HW parameters and checking the
> allowed table splits and supported {device/event}_IDbits.
> 
> DT table entries are allocated on demand (ie when a device is
> probed); the DT table is sized using the number of supported
> deviceID bits in that that's a system design decision (ie the
> number of deviceID bits implemented should reflect the number
> of devices expected in a system) therefore it makes sense to
> allocate a DT table that can cater for the maximum number of
> devices.
> 
> DT and ITT tables are allocated using the kmalloc interface;
> the allocation size may be smaller than a page or larger,
> and must provide contiguous memory pages.
> 
> LPIs INTIDs backing the device events are allocated one-by-one
> and only upon Linux IRQ allocation; this to avoid preallocating
> a large number of LPIs to cover the HW device MSI vector
> size whereas few MSI entries are actually enabled by a device.
> 
> ITS cacheability/shareability attributes are programmed
> according to the provided firmware ITS description.
> 
> The GICv5 partially reuses the GICv3 ITS MSI parent infrastructure
> and adds functions required to retrieve the ITS translate frame
> addresses out of msi-map and msi-parent properties to implement
> the GICv5 ITS MSI parent callbacks.
> 
> Co-developed-by: Sascha Bischoff <sascha.bischoff@....com>
> Signed-off-by: Sascha Bischoff <sascha.bischoff@....com>
> Co-developed-by: Timothy Hayes <timothy.hayes@....com>
> Signed-off-by: Timothy Hayes <timothy.hayes@....com>
> Signed-off-by: Lorenzo Pieralisi <lpieralisi@...nel.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Marc Zyngier <maz@...nel.org>

Hi Lorenzo,

It almost certainly doesn't matter, but there are a couple of release
paths in here where things don't happen in the same order as the
equivalent error tear down paths (i.e. not reverse of setup).

There may well be a good reason for that but I couldn't immediately
spot what it was.  Also, as in my earlier comment, the table sizing
code here does not match the comment above it.

Jonathan


> diff --git a/drivers/irqchip/irq-gic-v5-its.c b/drivers/irqchip/irq-gic-v5-its.c
> new file mode 100644
> index 000000000000..cba632eb0273
> --- /dev/null
> +++ b/drivers/irqchip/irq-gic-v5-its.c

> +/*
> + * Function to check whether the device table or ITT table support
> + * a two-level table and if so depending on the number of id_bits
> + * requested, determine whether a two-level table is required.
> + *
> + * Return the 2-level size value if a two level table is deemed
> + * necessary.
> + */
> +static bool gicv5_its_l2sz_two_level(bool devtab, u32 its_idr1, u8 id_bits, u8 *sz)
> +{
> +	unsigned int l2_bits, l2_sz;
> +
> +	if (devtab && !FIELD_GET(GICV5_ITS_IDR1_DT_LEVELS, its_idr1))
> +		return false;
> +
> +	if (!devtab && !FIELD_GET(GICV5_ITS_IDR1_ITT_LEVELS, its_idr1))
> +		return false;
> +
> +	/*
> +	 * Pick an L2 size that matches the pagesize; if a match
> +	 * is not found, go for the smallest supported l2 size granule.

Similar to before, this description is confusing.  If the page size is
64K and 16K + 4K are supported, we choose 16K, which is not the smallest
supported size (4K is).  The condition the comment refers to only applies
if only sizes larger than the page size are supported.
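
To make the point concrete, here is a minimal userspace sketch of the
selection order as I read it from the switch in the quoted function.
The SUP_* flags and pick_l2sz_kb() are made-up stand-ins for the IDR1
L2SZ support checks, not the driver's real macros or register layout.

```c
#include <assert.h>

/* Hypothetical support flags; NOT the real GICV5_ITS_IDR1 encoding. */
enum { SUP_4K = 1u << 0, SUP_16K = 1u << 1, SUP_64K = 1u << 2 };

/*
 * Same shape as the switch in the quoted patch: start at the page
 * size, fall through to smaller sizes, and only consider sizes larger
 * than the page once nothing at or below the page size is supported.
 * Returns the chosen L2 size in KiB.
 */
static unsigned int pick_l2sz_kb(unsigned int page_kb, unsigned int sup)
{
	assert(page_kb == 4 || page_kb == 16 || page_kb == 64);

	switch (page_kb) {
	case 64:
		if (sup & SUP_64K)
			return 64;
		/* fall through */
	case 16:
		if (sup & SUP_16K)
			return 16;
		/* fall through */
	case 4:
		if (sup & SUP_4K)
			return 4;
		if (sup & SUP_16K)
			return 16;
		if (sup & SUP_64K)
			return 64;
		break;
	}

	/* Nothing advertised: same default as the quoted code. */
	return 4;
}
```

With that sketch, pick_l2sz_kb(64, SUP_16K | SUP_4K) returns 16, i.e.
the largest supported size that does not exceed the page size, and
pick_l2sz_kb(4, SUP_16K | SUP_64K) returns 16, the smallest supported
size above the page size.  "Smallest supported granule" only describes
the second case.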

> +	 *
> +	 * This ensures that we will always be able to allocate
> +	 * contiguous memory at L2.
> +	 */
> +	switch (PAGE_SIZE) {
> +	case SZ_64K:
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_64KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_64k;
> +			break;
> +		}
> +		fallthrough;
> +	case SZ_16K:
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_16KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_16k;
> +			break;
> +		}
> +		fallthrough;
> +	case SZ_4K:
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_4KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_4k;
> +			break;
> +		}
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_16KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_16k;
> +			break;
> +		}
> +		if (GICV5_ITS_IDR1_L2SZ_SUPPORT_64KB(its_idr1)) {
> +			l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_64k;
> +			break;
> +		}
> +
> +		l2_sz = GICV5_ITS_DT_ITT_CFGR_L2SZ_4k;
> +		break;
> +	}
> +
> +	l2_bits = gicv5_its_l2sz_to_l2_bits(l2_sz);
> +
> +	if (l2_bits > id_bits)
> +		return false;
> +
> +	*sz = l2_sz;
> +
> +	return true;
> +}



> +/*
> + * Register a new device in the device table. Allocate an ITT and
> + * program the L2DTE entry according to the ITT structure that
> + * was chosen.
> + */
> +static int gicv5_its_device_register(struct gicv5_its_chip_data *its,
> +				     struct gicv5_its_dev *its_dev)
> +{
> +	u8 event_id_bits, device_id_bits, itt_struct, itt_l2sz;
> +	phys_addr_t itt_phys_base;
> +	bool two_level_itt;
> +	u32 idr1, idr2;
> +	__le64 *dte;
> +	u64 val;
> +	int ret;
> +
> +	device_id_bits = devtab_cfgr_field(its, DEVICEID_BITS);
> +
> +	if (its_dev->device_id >= BIT(device_id_bits)) {
> +		pr_err("Supplied DeviceID (%u) outside of Device Table range (%u)!",
> +		       its_dev->device_id, (u32)GENMASK(device_id_bits - 1, 0));
> +		return -EINVAL;
> +	}
> +
> +	dte = gicv5_its_devtab_get_dte_ref(its, its_dev->device_id, true);
> +	if (!dte)
> +		return -ENOMEM;
> +
> +	if (FIELD_GET(GICV5_DTL2E_VALID, le64_to_cpu(*dte)))
> +		return -EBUSY;
> +
> +	/*
> +	 * Determine how many bits we need, validate those against the max.
> +	 * Based on these, determine if we should go for a 1- or 2-level ITT.
> +	 */
> +	event_id_bits = order_base_2(its_dev->num_events);
> +
> +	idr2 = its_readl_relaxed(its, GICV5_ITS_IDR2);
> +
> +	if (event_id_bits > FIELD_GET(GICV5_ITS_IDR2_EVENTID_BITS, idr2)) {
> +		pr_err("Required EventID bits (%u) larger than supported bits (%u)!",
> +		       event_id_bits,
> +		       (u8)FIELD_GET(GICV5_ITS_IDR2_EVENTID_BITS, idr2));
> +		return -EINVAL;
> +	}
> +
> +	idr1 = its_readl_relaxed(its, GICV5_ITS_IDR1);
> +
> +	/*
> +	 * L2 ITT size is programmed into the L2DTE regardless of
> +	 * whether a two-level or linear ITT is built, init it.
> +	 */
> +	itt_l2sz = 0;
> +
> +	two_level_itt = gicv5_its_l2sz_two_level(false, idr1, event_id_bits,
> +						  &itt_l2sz);
> +	if (two_level_itt)
> +		ret = gicv5_its_create_itt_two_level(its, its_dev, event_id_bits,
> +						     itt_l2sz,
> +						     its_dev->num_events);
> +	else
> +		ret = gicv5_its_create_itt_linear(its, its_dev, event_id_bits);
> +	if (ret)
> +		return ret;
> +
> +	itt_phys_base = two_level_itt ? virt_to_phys(its_dev->itt_cfg.l2.l1itt) :
> +					virt_to_phys(its_dev->itt_cfg.linear.itt);
> +
> +	itt_struct = two_level_itt ? GICV5_ITS_DT_ITT_CFGR_STRUCTURE_TWO_LEVEL :
> +				     GICV5_ITS_DT_ITT_CFGR_STRUCTURE_LINEAR;
> +
> +	val = FIELD_PREP(GICV5_DTL2E_EVENT_ID_BITS, event_id_bits)	|
> +	      FIELD_PREP(GICV5_DTL2E_ITT_STRUCTURE, itt_struct)		|
> +	      (itt_phys_base & GICV5_DTL2E_ITT_ADDR_MASK)		|
> +	      FIELD_PREP(GICV5_DTL2E_ITT_L2SZ, itt_l2sz)		|
> +	      FIELD_PREP(GICV5_DTL2E_VALID, 0x1);
> +
> +	its_write_table_entry(its, dte, val);
> +
> +	ret = gicv5_its_device_cache_inv(its, its_dev);
> +	if (ret) {
> +		gicv5_its_free_itt(its_dev);
> +		its_write_table_entry(its, dte, 0);

If it makes no difference, unwind in reverse order of setup so swap the
two lines above.

> +		return ret;
> +	}
> +
> +	return 0;
> +}

> +static struct gicv5_its_dev *gicv5_its_alloc_device(struct gicv5_its_chip_data *its, int nvec,
> +						    u32 dev_id)
> +{
> +	struct gicv5_its_dev *its_dev;
> +	void *entry;
> +	int ret;
> +
> +	its_dev = gicv5_its_find_device(its, dev_id);
> +	if (!IS_ERR(its_dev)) {
> +		pr_err("A device with this DeviceID (0x%x) has already been registered.\n",
> +		       dev_id);
> +
> +		return ERR_PTR(-EBUSY);
> +	}
> +
> +	its_dev = kzalloc(sizeof(*its_dev), GFP_KERNEL);
> +	if (!its_dev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	its_dev->device_id = dev_id;
> +	its_dev->num_events = nvec;
> +
> +	ret = gicv5_its_device_register(its, its_dev);
> +	if (ret) {
> +		pr_err("Failed to register the device\n");
> +		goto out_dev_free;
> +	}
> +
> +	gicv5_its_device_cache_inv(its, its_dev);
> +
> +	its_dev->its_node = its;
> +
> +	its_dev->event_map = (unsigned long *)bitmap_zalloc(its_dev->num_events, GFP_KERNEL);
> +	if (!its_dev->event_map) {
> +		ret = -ENOMEM;
> +		goto out_unregister;
> +	}
> +
> +	entry = xa_store(&its->its_devices, dev_id, its_dev, GFP_KERNEL);
> +	if (xa_is_err(entry)) {
> +		ret = xa_err(entry);
> +		goto out_bitmap_free;
> +	}
> +
> +	return its_dev;
> +
> +out_bitmap_free:
> +	bitmap_free(its_dev->event_map);
> +out_unregister:
> +	gicv5_its_device_unregister(its, its_dev);
> +out_dev_free:
> +	kfree(its_dev);
> +	return ERR_PTR(ret);
> +}
> +
> +static int gicv5_its_msi_prepare(struct irq_domain *domain, struct device *dev,
> +				 int nvec, msi_alloc_info_t *info)
> +{
> +	u32 dev_id = info->scratchpad[0].ul;
> +	struct msi_domain_info *msi_info;
> +	struct gicv5_its_chip_data *its;
> +	struct gicv5_its_dev *its_dev;
> +
> +	msi_info = msi_get_domain_info(domain);
> +	its = msi_info->data;
> +
> +	guard(mutex)(&its->dev_alloc_lock);
> +
> +	its_dev = gicv5_its_alloc_device(its, nvec, dev_id);
> +	if (IS_ERR(its_dev))
> +		return PTR_ERR(its_dev);
> +
> +	its_dev->its_trans_phys_base = info->scratchpad[1].ul;
> +	info->scratchpad[0].ptr = its_dev;
> +
> +	return 0;
> +}
> +
> +static void gicv5_its_msi_teardown(struct irq_domain *domain, msi_alloc_info_t *info)
> +{
> +	struct gicv5_its_dev *its_dev = info->scratchpad[0].ptr;
> +	struct msi_domain_info *msi_info;
> +	struct gicv5_its_chip_data *its;
> +
> +	msi_info = msi_get_domain_info(domain);
> +	its = msi_info->data;
> +
> +	guard(mutex)(&its->dev_alloc_lock);
> +
> +	if (WARN_ON_ONCE(!bitmap_empty(its_dev->event_map, its_dev->num_events)))
> +		return;
> +
> +	gicv5_its_device_unregister(its, its_dev);
> +	bitmap_free(its_dev->event_map);
> +	xa_erase(&its->its_devices, its_dev->device_id);

I was expecting this to be in reverse order of what happens in
*msi_prepare (and *msi_alloc under that). That would give the order

	xa_erase();
	bitmap_free();
	gicv5_its_device_unregister();
	kfree(its_dev);

If there is a reason for this ordering it might be good to add a comment calling it out.
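
For illustration only, a small userspace sketch of the mirrored
ordering I have in mind.  struct demo_dev, demo_prepare() and
demo_teardown() are hypothetical stand-ins (calloc/free instead of the
kernel allocators, flags instead of the DTE and xarray), so this shows
the shape of the pattern rather than the driver code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct gicv5_its_dev. */
struct demo_dev {
	int registered;		/* models the device table entry + ITT */
	unsigned long *map;	/* models event_map */
	int published;		/* models the xarray entry */
};

/* Records teardown steps so the order can be inspected. */
static char demo_log[64];

static void demo_note(const char *step)
{
	strncat(demo_log, step, sizeof(demo_log) - strlen(demo_log) - 1);
}

/* Setup order: allocate, register, allocate bitmap, publish. */
static struct demo_dev *demo_prepare(void)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;

	dev->registered = 1;

	dev->map = calloc(1, sizeof(*dev->map));
	if (!dev->map)
		goto out_unregister;

	dev->published = 1;
	return dev;

out_unregister:
	dev->registered = 0;
	free(dev);
	return NULL;
}

/* Teardown walks the setup steps in strict reverse order. */
static void demo_teardown(struct demo_dev *dev)
{
	assert(dev);

	dev->published = 0;
	demo_note("erase,");

	free(dev->map);
	demo_note("bitmap_free,");

	dev->registered = 0;
	demo_note("unregister,");

	free(dev);
	demo_note("kfree");
}
```

After demo_prepare() followed by demo_teardown(), demo_log holds
"erase,bitmap_free,unregister,kfree", which matches the order listed
above and the label order on the error path of the allocation function.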

 
> +	kfree(its_dev);
> +}