[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230722031729.3913953-1-longman@redhat.com>
Date: Fri, 21 Jul 2023 23:17:28 -0400
From: Waiman Long <longman@...hat.com>
To: Will Deacon <will@...nel.org>, Mark Rutland <mark.rutland@....com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Robin Murphy <robin.murphy@....com>,
Waiman Long <longman@...hat.com>
Subject: [PATCH v3] perf/arm-dmc620: Fix dmc620_pmu_irqs_lock/cpu_hotplug_lock circular lock dependency
The following circular locking dependency was reported when running
cpus online/offline test on an arm64 system.
[ 84.195923] Chain exists of:
dmc620_pmu_irqs_lock --> cpu_hotplug_lock --> cpuhp_state-down
[ 84.207305] Possible unsafe locking scenario:
[ 84.213212] CPU0 CPU1
[ 84.217729] ---- ----
[ 84.222247] lock(cpuhp_state-down);
[ 84.225899] lock(cpu_hotplug_lock);
[ 84.232068] lock(cpuhp_state-down);
[ 84.238237] lock(dmc620_pmu_irqs_lock);
[ 84.242236]
*** DEADLOCK ***
The problematic locking order seems to be
lock(dmc620_pmu_irqs_lock) --> lock(cpu_hotplug_lock)
This locking order happens when dmc620_pmu_get_irq() is called from
dmc620_pmu_device_probe(). Since dmc620_pmu_irqs_lock is used for
protecting the dmc620_pmu_irqs structure only, we don't actually need
to hold the lock when adding a new instance to the CPU hotplug subsystem.
Fix this possible deadlock scenario by releasing the lock before
calling cpuhp_state_add_instance_nocalls() and reacquiring it afterward.
To avoid the possibility of 2 racing dmc620_pmu_get_irq() calls inserting
duplicated dmc620_pmu_irq structures with the same irq number, a dummy
entry is inserted before releasing the lock which will block a competing
thread from inserting another irq structure of the same irq number.
Suggested-by: Robin Murphy <robin.murphy@....com>
Signed-off-by: Waiman Long <longman@...hat.com>
---
drivers/perf/arm_dmc620_pmu.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/perf/arm_dmc620_pmu.c b/drivers/perf/arm_dmc620_pmu.c
index 9d0f01c4455a..7cafd4dd4522 100644
--- a/drivers/perf/arm_dmc620_pmu.c
+++ b/drivers/perf/arm_dmc620_pmu.c
@@ -76,6 +76,7 @@ struct dmc620_pmu_irq {
refcount_t refcount;
unsigned int irq_num;
unsigned int cpu;
+ unsigned int valid;
};
struct dmc620_pmu {
@@ -423,9 +424,14 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
struct dmc620_pmu_irq *irq;
int ret;
- list_for_each_entry(irq, &dmc620_pmu_irqs, irqs_node)
- if (irq->irq_num == irq_num && refcount_inc_not_zero(&irq->refcount))
+ list_for_each_entry(irq, &dmc620_pmu_irqs, irqs_node) {
+ if (irq->irq_num != irq_num)
+ continue;
+ if (!irq->valid)
+ return ERR_PTR(-EAGAIN); /* Try again later */
+ if (refcount_inc_not_zero(&irq->refcount))
return irq;
+ }
irq = kzalloc(sizeof(*irq), GFP_KERNEL);
if (!irq)
@@ -447,13 +453,23 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
if (ret)
goto out_free_irq;
- ret = cpuhp_state_add_instance_nocalls(cpuhp_state_num, &irq->node);
- if (ret)
- goto out_free_irq;
-
irq->irq_num = irq_num;
list_add(&irq->irqs_node, &dmc620_pmu_irqs);
+ /*
+ * Release dmc620_pmu_irqs_lock before calling
+ * cpuhp_state_add_instance_nocalls() and reacquire it afterward.
+ */
+ mutex_unlock(&dmc620_pmu_irqs_lock);
+ ret = cpuhp_state_add_instance_nocalls(cpuhp_state_num, &irq->node);
+ mutex_lock(&dmc620_pmu_irqs_lock);
+
+ if (ret) {
+ list_del(&irq->irqs_node);
+ goto out_free_irq;
+ }
+
+ irq->valid = true;
return irq;
out_free_irq:
--
2.31.1
Powered by blists - more mailing lists