Date: Mon, 01 Jul 2024 12:20:46 +0100
From: Marc Zyngier <maz@...nel.org>
To: Nianyao Tang <tangnianyao@...wei.com>
Cc: <tglx@...utronix.de>,
	<linux-arm-kernel@...ts.infradead.org>,
	<linux-kernel@...r.kernel.org>,
	<guoyang2@...wei.com>,
	<wangwudi@...ilicon.com>
Subject: Re: [PATCH] irqchip/gic-v4: Fix vcpus racing for vpe->col_idx in vmapp and vmovp

On Mon, 01 Jul 2024 08:23:05 +0100,
Nianyao Tang <tangnianyao@...wei.com> wrote:
> 
> its_map_vm() may modify vpe->col_idx without holding vpe->vpe_lock.
> This can leave a VPE resident on one RD after it has been moved
> (VMOVP) to a different RD, or cause a VMOVP to target the very RD
> the VPE is already mapped to in the VPE table.
> 
> Consider a GICv4-enabled system with two ITSs, 32 vcpus spread over
> the CPUs of collections 0 and 1, and two PCI devices routing VLPIs,
> one through each ITS. A VPE about to become resident on RD1 can hit
> the unexpected cases below because another vcpu, running on another
> CPU, is concurrently executing its_map_vm() and modifying its
> vpe->col_idx.
> 
> Unexpected Case 1:
> RD                0                              1
>                                            vcpu_load
>                                            lock vpe_lock
>                                            vpe->col_idx = 1
>             its_map_vm
>             lock vmovp_lock
>                                            waiting vmovp_lock
>             vpe->col_idx = 0
>             (cpu0 is first online cpu)
>             vmapp vpe on col0
>             unlock vmovp_lock
>                                            lock vmovp_lock
>                                            vmovp vpe to col0
>                                            unlock vmovp_lock
>                                            vpe resident here fails to
>                                              receive VLPIs!
> 
> Unexpected Case 2:
> RD                0                              1
>             its_map_vm                     vcpu_load
>             lock vmovp_lock                lock vpe_lock
>             vpe->col_idx = 0
>                                            vpe->col_idx = 1
>             vmapp vpe on col1              waiting vmovp_lock
>             unlock vmovp_lock
>                                            lock vmovp_lock
>                                            vmovp vpe to col1
>                                            (target RD == source RD!)
>                                            unlock vmovp_lock

Why is this second case a problem? What is the conclusion? This commit
message doesn't explain how you are solving it.

> 
> Signed-off-by: Nianyao Tang <tangnianyao@...wei.com>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index f99c0a86320b..adda9824e0e7 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1794,11 +1794,15 @@ static bool gic_requires_eager_mapping(void)
>  static void its_map_vm(struct its_node *its, struct its_vm *vm)
>  {
>  	unsigned long flags;
> +	bool vm_mapped_on_any_its = false;
> +	int i;
>  
>  	if (gic_requires_eager_mapping())
>  		return;
>  
> -	raw_spin_lock_irqsave(&vmovp_lock, flags);
> +	for (i = 0; i < GICv4_ITS_LIST_MAX; i++)
> +		if (vm->vlpi_count[i] > 0)
> +			vm_mapped_on_any_its = true;

What makes you think that dropping the vmovp lock is a good idea? What
if you have a concurrent unmap?

>  
>  	/*
>  	 * If the VM wasn't mapped yet, iterate over the vpes and get
> @@ -1813,15 +1817,19 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
>  			struct its_vpe *vpe = vm->vpes[i];
>  			struct irq_data *d = irq_get_irq_data(vpe->irq);
>  
> -			/* Map the VPE to the first possible CPU */
> -			vpe->col_idx = cpumask_first(cpu_online_mask);
> +			raw_spin_lock_irqsave(&vpe->vpe_lock, flags);
> +
> +			if (!vm_mapped_on_any_its) {
> +				/* Map the VPE to the first possible CPU */
> +				vpe->col_idx = cpumask_first(cpu_online_mask);
> +			}

If the issue is that the target RD isn't initialised before we issue a
VMAPP, then why not initialise it right from the start?

>  			its_send_vmapp(its, vpe, true);
>  			its_send_vinvall(its, vpe);
>  			irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
> +
> +			raw_spin_unlock_irqrestore(&vpe->vpe_lock, flags);
>  		}
>  	}
> -
> -	raw_spin_unlock_irqrestore(&vmovp_lock, flags);
>  }
>  
>  static void its_unmap_vm(struct its_node *its, struct its_vm *vm)

I don't think this patch makes much sense. It opens an even bigger
race between map and unmap, and I really don't think we want that. The
main issue is that you are trying to avoid fixing the root cause of
the problem...

The first course of action is to move the vpe->col_idx init and
affinity update to activation time, which would make all
implementations (v4 with and without VMOVP list, v4.1) behave the
same. On its own, this solves the biggest issue.

The other thing would be to ensure that this lazy VMAPP cannot take
place concurrently with vcpu_load(). We can't take the VPE lock after
the vmovp_lock, as this violates the lock ordering established in
its_vpe_set_affinity(), which locks the VPE before doing anything else.

A possibility would be to take *all* vpe locks *before* taking the
vmovp lock, ensuring that vmapp sees something consistent. But that's
potentially huge, and likely to cause stalls on a busy VM. Instead, a
per-VM lock would do the trick and allow each VPE lock to be taken in
turn.
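
To make the ordering concrete, here is a minimal standalone sketch of
the resulting hierarchy (plain pthreads, not the actual driver code;
all names are illustrative stand-ins): the per-VM lock is always the
outer lock, each per-VPE lock nests inside it, and vcpu_load() only
ever takes the inner lock.

#include <pthread.h>
#include <stdio.h>

#define NR_VPES 4

/* Illustrative stand-ins for the kernel structures */
struct vpe {
	pthread_mutex_t vpe_lock;
	int col_idx;
};

struct vm {
	pthread_mutex_t vmapp_lock;	/* outer, per-VM */
	struct vpe vpes[NR_VPES];
};

/* vmapp path: per-VM lock first, then each per-VPE lock in turn */
static void map_vm(struct vm *vm)
{
	pthread_mutex_lock(&vm->vmapp_lock);
	for (int i = 0; i < NR_VPES; i++) {
		pthread_mutex_lock(&vm->vpes[i].vpe_lock);
		/* col_idx is stable here: vcpu_load holds vpe_lock too */
		printf("VMAPP vpe %d on col %d\n", i, vm->vpes[i].col_idx);
		pthread_mutex_unlock(&vm->vpes[i].vpe_lock);
	}
	pthread_mutex_unlock(&vm->vmapp_lock);
}

/* vcpu_load path: only the per-VPE (inner) lock is taken, so the
 * ordering above can never be inverted */
static void vcpu_load(struct vpe *vpe, int cpu)
{
	pthread_mutex_lock(&vpe->vpe_lock);
	vpe->col_idx = cpu;
	pthread_mutex_unlock(&vpe->vpe_lock);
}

Only one VPE lock is held at a time, so a concurrent vcpu_load() on
the other VPEs is never stalled for the whole duration of the map.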

With that, we can fix the locking correctly, and make sure that
vcpu_load and vmapp are mutually exclusive.

I've written the small patch series below (compile-tested only).
Please let me know how it fares for you.

	M.

From a9857d782fc649bc08db953477bed9b8c3421118 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@...nel.org>
Date: Mon, 1 Jul 2024 10:39:19 +0100
Subject: [PATCH 1/3] irqchip/gic-v4: Always configure affinity on VPE
 activation

We currently have two paths to set the initial affinity of a VPE:

- at activation time on GICv4 without the stupid VMOVP list, and
  on GICv4.1

- at map time for GICv4 with VMOVP list

The latter may end up modifying the affinity of a VPE that is
currently running, making the results unpredictable.

Instead, unify the two paths, making sure we only set the
initial affinity at activation time.

Signed-off-by: Marc Zyngier <maz@...nel.org>
---
 drivers/irqchip/irq-gic-v3-its.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 3c755d5dad6e6..a00c5e8c4ea65 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1809,13 +1809,9 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
 
 		for (i = 0; i < vm->nr_vpes; i++) {
 			struct its_vpe *vpe = vm->vpes[i];
-			struct irq_data *d = irq_get_irq_data(vpe->irq);
 
-			/* Map the VPE to the first possible CPU */
-			vpe->col_idx = cpumask_first(cpu_online_mask);
 			its_send_vmapp(its, vpe, true);
 			its_send_vinvall(its, vpe);
-			irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
 		}
 	}
 
@@ -4562,6 +4558,10 @@ static int its_vpe_irq_domain_activate(struct irq_domain *domain,
 	struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
 	struct its_node *its;
 
+	/* Map the VPE to the first possible CPU */
+	vpe->col_idx = cpumask_first(cpu_online_mask);
+	irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
+
 	/*
 	 * If we use the list map, we issue VMAPP on demand... Unless
 	 * we're on a GICv4.1 and we eagerly map the VPE on all ITSs
@@ -4570,9 +4570,6 @@ static int its_vpe_irq_domain_activate(struct irq_domain *domain,
 	if (!gic_requires_eager_mapping())
 		return 0;
 
-	/* Map the VPE to the first possible CPU */
-	vpe->col_idx = cpumask_first(cpu_online_mask);
-
 	list_for_each_entry(its, &its_nodes, entry) {
 		if (!is_v4(its))
 			continue;
@@ -4581,8 +4578,6 @@ static int its_vpe_irq_domain_activate(struct irq_domain *domain,
 		its_send_vinvall(its, vpe);
 	}
 
-	irq_data_update_effective_affinity(d, cpumask_of(vpe->col_idx));
-
 	return 0;
 }
 
-- 
2.39.2


From e3c53afb6b3b5e9e4dc5963f9f70350455e7e986 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@...nel.org>
Date: Mon, 1 Jul 2024 12:00:13 +0100
Subject: [PATCH 2/3] irqchip/gic-v4: Substitute vmovp_lock for a per-VM lock

vmovp_lock is abused in a number of cases to serialise updates
to vlpi_count[] and to deal with the mapping and unmapping of a VM
to/from the ITSs.

Instead, provide a per-VM lock and revisit the use of vlpi_count[]
so that it is always wrapped in this per-VM vmapp_lock.

Signed-off-by: Marc Zyngier <maz@...nel.org>
---
 drivers/irqchip/irq-gic-v3-its.c   | 21 +++++++++++----------
 include/linux/irqchip/arm-gic-v4.h |  5 +++++
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index a00c5e8c4ea65..7c37fbe97afbe 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1338,6 +1338,11 @@ static void its_send_vmovp(struct its_vpe *vpe)
 	 * Wall <-- Head.
 	 */
 	raw_spin_lock_irqsave(&vmovp_lock, flags);
+	/*
+	 * Protect against concurrent updates of the mapping state on
+	 * individual VMs.
+	 */
+	raw_spin_lock(&vpe->its_vm->vmapp_lock);
 
 	desc.its_vmovp_cmd.seq_num = vmovp_seq_num++;
 	desc.its_vmovp_cmd.its_list = get_its_list(vpe->its_vm);
@@ -1354,6 +1359,7 @@ static void its_send_vmovp(struct its_vpe *vpe)
 		its_send_single_vcommand(its, its_build_vmovp_cmd, &desc);
 	}
 
+	raw_spin_unlock(&vpe->its_vm->vmapp_lock);
 	raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }
 
@@ -1791,12 +1797,10 @@ static bool gic_requires_eager_mapping(void)
 
 static void its_map_vm(struct its_node *its, struct its_vm *vm)
 {
-	unsigned long flags;
-
 	if (gic_requires_eager_mapping())
 		return;
 
-	raw_spin_lock_irqsave(&vmovp_lock, flags);
+	guard(raw_spinlock_irqsave)(&vm->vmapp_lock);
 
 	/*
 	 * If the VM wasn't mapped yet, iterate over the vpes and get
@@ -1814,19 +1818,15 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
 			its_send_vinvall(its, vpe);
 		}
 	}
-
-	raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }
 
 static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
 {
-	unsigned long flags;
-
 	/* Not using the ITS list? Everything is always mapped. */
 	if (gic_requires_eager_mapping())
 		return;
 
-	raw_spin_lock_irqsave(&vmovp_lock, flags);
+	guard(raw_spinlock_irqsave)(&vm->vmapp_lock);
 
 	if (!--vm->vlpi_count[its->list_nr]) {
 		int i;
@@ -1834,8 +1834,6 @@ static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
 		for (i = 0; i < vm->nr_vpes; i++)
 			its_send_vmapp(its, vm->vpes[i], false);
 	}
-
-	raw_spin_unlock_irqrestore(&vmovp_lock, flags);
 }
 
 static int its_vlpi_map(struct irq_data *d, struct its_cmd_info *info)
@@ -3922,6 +3920,8 @@ static void its_vpe_invall(struct its_vpe *vpe)
 {
 	struct its_node *its;
 
+	guard(raw_spinlock_irqsave)(&vpe->its_vm->vmapp_lock);
+
 	list_for_each_entry(its, &its_nodes, entry) {
 		if (!is_v4(its))
 			continue;
@@ -4527,6 +4527,7 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq
 	vm->db_lpi_base = base;
 	vm->nr_db_lpis = nr_ids;
 	vm->vprop_page = vprop_page;
+	raw_spin_lock_init(&vm->vmapp_lock);
 
 	if (gic_rdists->has_rvpeid)
 		irqchip = &its_vpe_4_1_irq_chip;
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index 2c63375bbd43f..ed879e03f4e64 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -25,6 +25,11 @@ struct its_vm {
 	irq_hw_number_t		db_lpi_base;
 	unsigned long		*db_bitmap;
 	int			nr_db_lpis;
+	/*
+	 * Ensures mutual exclusion between updates to vlpi_count[]
+	 * and map/unmap when using the ITSList mechanism.
+	 */
+	raw_spinlock_t		vmapp_lock;
 	u32			vlpi_count[GICv4_ITS_LIST_MAX];
 };
 
-- 
2.39.2


From b8092dce0312469a0ea03ada1da854f4ded80e2a Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz@...nel.org>
Date: Mon, 1 Jul 2024 12:03:22 +0100
Subject: [PATCH 3/3] irqchip/gic-v4: Make sure a VPE is locked when VMAPP is
 issued

In order to make sure that vpe->col_idx is correctly sampled
when a VMAPP command is issued, we must hold the lock for the
VPE. This is now possible since the introduction of the per-VM
vmapp_lock, which can be taken before vpe_lock in the locking
order.

Signed-off-by: Marc Zyngier <maz@...nel.org>
---
 drivers/irqchip/irq-gic-v3-its.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 7c37fbe97afbe..0e1583a82d4e7 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1814,7 +1814,9 @@ static void its_map_vm(struct its_node *its, struct its_vm *vm)
 		for (i = 0; i < vm->nr_vpes; i++) {
 			struct its_vpe *vpe = vm->vpes[i];
 
+			raw_spin_lock(&vpe->vpe_lock);
 			its_send_vmapp(its, vpe, true);
+			raw_spin_unlock(&vpe->vpe_lock);
 			its_send_vinvall(its, vpe);
 		}
 	}
@@ -1831,8 +1833,10 @@ static void its_unmap_vm(struct its_node *its, struct its_vm *vm)
 	if (!--vm->vlpi_count[its->list_nr]) {
 		int i;
 
-		for (i = 0; i < vm->nr_vpes; i++)
+		for (i = 0; i < vm->nr_vpes; i++) {
+			guard(raw_spinlock)(&vm->vpes[i]->vpe_lock);
 			its_send_vmapp(its, vm->vpes[i], false);
+		}
 	}
 }
 
-- 
2.39.2


-- 
Without deviation from the norm, progress is not possible.
