[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4ed6a6f6-a6e7-fbda-7658-9f645c95326a@roeck-us.net>
Date: Fri, 28 Oct 2022 08:30:15 -0700
From: Guenter Roeck <linux@...ck-us.net>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Stephen Boyd <sboyd@...nel.org>
Subject: Re: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before
freeing timers
On 10/27/22 15:58, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 15:24:04 -0700
> Guenter Roeck <linux@...ck-us.net> wrote:
>
>> On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
>>>
>>> Back in April, I posted an RFC patch set to help mitigate a common issue
>>> where a timer gets armed just before it is freed, and when the timer
>>> goes off, it crashes in the timer code without any evidence of who the
>>> culprit was. I got side tracked and never finished up on that patch set.
>>> Since this type of crash is still our #1 crash we are seeing in the field,
>>> it has become a priority again to finish it.
>>>
>>> This is v2 of that patch set. Thomas Gleixner posted an untested version
>>> that makes timer->function NULL as the flag that it is shutdown. I took that
>>> code, tested it (fixed it up), added more comments, and changed the
>>> name to del_timer_shutdown() as Linus had asked. I also converted it to use
>>> WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
>>>
>>
>> Here are various warnings and crashes. Complete logs are at
>>
>> https://kerneltests.org/builders
>>
>> in the "testing" column of the qemu test results.
>>
>> This is with the published patch set plus the fixups in
>> timer_fixup_init() and timer_fixup_free().
>>
>> Guenter
>>
>> ---
[ ... ]
>>
>> WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: init active (active state 0) object type: timer_list hint: tulip_timer+0x0/0x38
>>
The problem is that the tulip code calls timer_setup() repeatedly (and
unnecessarily). Apparently either the new timer code and/or the associated
ODEBUG code doesn't like that. The patch below fixes the problem.
I think there needs to be a means to handle that situation gracefully.
The parport code has the same problem (see second patch below), and
I am sure there are others.
Thanks,
Guenter
---
tulip:
diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c
index ecfad43df45a..0c86066929d3 100644
--- a/drivers/net/ethernet/dec/tulip/tulip_core.c
+++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
@@ -770,8 +770,6 @@ static void tulip_down (struct net_device *dev)
spin_unlock_irqrestore (&tp->lock, flags);
- timer_setup(&tp->timer, tulip_tbl[tp->chip_id].media_timer, 0);
-
dev->if_port = tp->saved_if_port;
/* Leave the driver in snooze, not sleep, mode. */
@@ -1869,10 +1867,14 @@ static int __maybe_unused tulip_resume(struct device *dev_d)
static void tulip_remove_one(struct pci_dev *pdev)
{
struct net_device *dev = pci_get_drvdata (pdev);
+ struct tulip_private *tp;
if (!dev)
return;
+ tp = netdev_priv(dev);
+ del_timer_shutdown(&tp->timer);
+
unregister_netdev(dev);
}
---
parport:
diff --git a/drivers/parport/ieee1284.c b/drivers/parport/ieee1284.c
index 4547ac44c8d4..50dbd2ea23fc 100644
--- a/drivers/parport/ieee1284.c
+++ b/drivers/parport/ieee1284.c
@@ -73,7 +73,7 @@ int parport_wait_event (struct parport *port, signed long timeout)
timer_setup(&port->timer, timeout_waiting_on_port, 0);
mod_timer(&port->timer, jiffies + timeout);
ret = down_interruptible (&port->physport->ieee1284.irq);
- if (!del_timer_sync(&port->timer) && !ret)
+ if (!del_timer_shutdown(&port->timer) && !ret)
/* Timed out. */
ret = 1;
Powered by blists - more mailing lists