[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <458ac4d2-bb8a-0359-f198-dd53f4c84bd3@redhat.com>
Date: Fri, 4 Aug 2023 12:51:47 -0400
From: Waiman Long <longman@...hat.com>
To: Will Deacon <will@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Robin Murphy <robin.murphy@....com>
Subject: Re: [PATCH v3] perf/arm-dmc620: Fix
dmc620_pmu_irqs_lock/cpu_hotplug_lock circular lock dependency
On 8/4/23 12:28, Will Deacon wrote:
>>>> struct dmc620_pmu {
>>>> @@ -423,9 +424,14 @@ static struct dmc620_pmu_irq *__dmc620_pmu_get_irq(int irq_num)
>>>> struct dmc620_pmu_irq *irq;
>>>> int ret;
>>>> - list_for_each_entry(irq, &dmc620_pmu_irqs, irqs_node)
>>>> - if (irq->irq_num == irq_num && refcount_inc_not_zero(&irq->refcount))
>>>> + list_for_each_entry(irq, &dmc620_pmu_irqs, irqs_node) {
>>>> + if (irq->irq_num != irq_num)
>>>> + continue;
>>>> + if (!irq->valid)
>>>> + return ERR_PTR(-EAGAIN); /* Try again later */
>>> It looks like this can bubble up to the probe() routine. Does the driver
>>> core handle -EAGAIN coming back from a probe routine?
>> Right, I should add code to handle this error condition. I think it can be
>> handled in dmc620_pmu_get_irq(). The important thing is to release the
>> mutex, wait a few ms and try again. What do you think?
> I don't really follow, but waiting a few ms and trying again sounds like
> a really nasty hack for something which doesn't appear to be constrained
> by broken hardware. In other words, we got ourselves into this mess, so
> we should be able to resolve it properly.
From my point of view, the proper way to solve the problem is to
reverse the locking order. Since you don't to add a EXPORT statement to
the core kernel code, we will have to find a way around it by not
holding the dmc620_pmu_irqs_lock when cpuhp_state_add_instance_nocalls()
is called. Another alternative that I can think of is to add one more
mutex that we will hold just for the entirety of __dmc620_pmu_get_irq()
and take dmc620_pmu_irqs_lock only when the linked list is being
modified. That will eliminate the need to introduce arbitrary wait as
other caller of __dmc620_pmu_get_irq() will wait in the new mutex. Will
this work for you?
Cheers,
Longman
Powered by blists - more mailing lists