Message-ID: <7a83931f-13bd-27c2-4050-4a21be74c49b@redhat.com>
Date: Sun, 7 Mar 2021 20:04:42 +0100
From: Hans de Goede <hdegoede@...hat.com>
To: Pavel Machek <pavel@....cz>, Marc Kleine-Budde <mkl@...gutronix.de>
Cc: Andrea Righi <andrea.righi@...onical.com>,
Boqun Feng <boqun.feng@...il.com>, Dan Murphy <dmurphy@...com>,
linux-leds@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel@...gutronix.de, schuchmann@...leissheimer.de
Subject: Re: [PATCH] leds: trigger: fix potential deadlock with libata
Hi,
On 3/7/21 5:13 PM, Pavel Machek wrote:
> Hi!
>
>>> --- a/drivers/leds/led-triggers.c
>>> +++ b/drivers/leds/led-triggers.c
>>> @@ -378,14 +378,15 @@ void led_trigger_event(struct led_trigger *trig,
>>>  			enum led_brightness brightness)
>>>  {
>>>  	struct led_classdev *led_cdev;
>>> +	unsigned long flags;
>>>
>>>  	if (!trig)
>>>  		return;
>>>
>>> -	read_lock(&trig->leddev_list_lock);
>>> +	read_lock_irqsave(&trig->leddev_list_lock, flags);
>>>  	list_for_each_entry(led_cdev, &trig->led_cdevs, trig_list)
>>>  		led_set_brightness(led_cdev, brightness);
>>> -	read_unlock(&trig->leddev_list_lock);
>>> +	read_unlock_irqrestore(&trig->leddev_list_lock, flags);
>>>  }
>>>  EXPORT_SYMBOL_GPL(led_trigger_event)
>>
>> meanwhile this patch hit v5.10.x stable and caused a performance
>> degradation on our use case:
>>
>> It's an embedded ARM system, 4x Cortex A53, with an SPI attached CAN
>> controller. CAN stands for Controller Area Network and is used here to
>> connect to some automotive equipment. Over CAN an ISOTP (a CAN-specific
>> Transport Protocol) transfer is running. With this patch, we see CAN
>> frames delayed for ~6ms; the usual gap between CAN frames is 240µs.
>>
>> Reverting this patch restores the old performance.
>>
>> What is the best way to solve this dilemma? Identify the critical path
>> in our use case? Is there a way we can get around the irqsave in
>> led_trigger_event()?
>
> Hans was pushing for this patch, perhaps he has some ideas...

I was not pushing for this particular fix; I was asking about a fix
for the potential deadlock that lockdep identified.

And you replied that this was already fixed in your for-next branch
when I asked, so all in all, other than reporting the potential deadlock
(after it was already fixed) I have very little to do with this patch.

With all that said, I must say that I'm surprised that switching from
read_lock() to read_lock_irqsave() causes such a hefty penalty, so I
wonder what is really going on here. Using the irqsave version disables
interrupts, but AFAIK only on the current core and only for the duration
of the led_set_brightness() call(s).

Is the system perhaps pinning IRQs to a specific CPU, in combination with
a led_set_brightness() call somehow taking much longer than it should?
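
To check the pinning theory, one could read /proc/irq/<N>/smp_affinity
for the interrupt of the SPI/CAN controller and see whether the mask
names a single CPU. Below is a minimal userspace sketch of that check.
It assumes the single-group hexadecimal mask format (at most 32 CPUs, no
commas); parse_affinity_mask() and pinned_cpu() are hypothetical helper
names, not existing tools.

```c
#include <stdlib.h>

/*
 * Parse an smp_affinity-style hex mask such as "f" or "1".
 * Assumption: one hex group only (at most 32 CPUs, no commas).
 * Returns 0 if no hex digits could be parsed.
 */
static unsigned long parse_affinity_mask(const char *text)
{
	char *end;
	unsigned long mask = strtoul(text, &end, 16);

	if (end == text)
		return 0;
	return mask;
}

/*
 * Return the CPU number if exactly one bit is set in the mask,
 * or -1 if the IRQ may be delivered to zero or several CPUs.
 */
static int pinned_cpu(unsigned long mask)
{
	int cpu = 0;

	if (mask == 0 || (mask & (mask - 1)) != 0)
		return -1;
	while (!(mask & 1UL)) {
		mask >>= 1;
		cpu++;
	}
	return cpu;
}
```

On a 4-core system a mask of "f" would mean the IRQ is not pinned, while
"4" would mean it is only delivered to CPU 2. If that is also a CPU
which spends time in led_trigger_event() with interrupts off, the two
would compete.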

Note that led_set_brightness() calls are not allowed to block. If a
driver's setter can block, it should use the brightness_set_blocking
callback in its led_classdev struct, not the regular brightness_set
callback, in which case the LED core will defer the actual setting of
the LED to a workqueue.
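
To illustrate the difference between the two callbacks, here is a small
userspace model of the dispatch described above. It is only a sketch:
struct model_led and the model_*() functions are made up for this
example and merely borrow field names from struct led_classdev, and the
work_pending flag stands in for scheduling work on a workqueue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; this is not the kernel's struct led_classdev. */
struct model_led {
	/* Fast setter: must not sleep, may run with interrupts off. */
	void (*brightness_set)(struct model_led *led, int value);
	/* Slow setter: may sleep, so it must run from a worker. */
	int (*brightness_set_blocking)(struct model_led *led, int value);
	int delayed_set_value;	/* value remembered for the worker */
	bool work_pending;	/* stands in for a scheduled work item */
	int hw_value;		/* what the "hardware" currently shows */
};

/* Models the call made from the (possibly atomic) trigger path. */
static void model_set_brightness(struct model_led *led, int value)
{
	if (led->brightness_set) {
		/* Non-blocking setter available: call it directly. */
		led->brightness_set(led, value);
		return;
	}
	/* Otherwise only record the value and defer the real work. */
	led->delayed_set_value = value;
	led->work_pending = true;
}

/* Models the worker, which runs later where sleeping is allowed. */
static void model_run_work(struct model_led *led)
{
	if (!led->work_pending)
		return;
	led->work_pending = false;
	if (led->brightness_set_blocking)
		led->brightness_set_blocking(led, led->delayed_set_value);
}

/* Example setters: one fast, one that would be slow on real hardware. */
static void fast_set(struct model_led *led, int value)
{
	led->hw_value = value;
}

static int slow_set(struct model_led *led, int value)
{
	led->hw_value = value;
	return 0;
}
```

In this model a driver that registers slow_set() as
brightness_set_blocking costs the trigger path almost nothing, since
only a value is stored. If the same slow setter were wired up as
brightness_set instead, all of its latency would land inside the
section that now runs with interrupts disabled.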

So one thing which might be worthwhile to check is whether any of the
LED drivers on the system in question are using the brightness_set
callback where they should be using the blocking one.

Regards,

Hans