linux-kernel - Re: [RFC][PATCH v2 00/31] timers: Use del_timer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4ed6a6f6-a6e7-fbda-7658-9f645c95326a@roeck-us.net>
Date:   Fri, 28 Oct 2022 08:30:15 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     linux-kernel@...r.kernel.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Stephen Boyd <sboyd@...nel.org>
Subject: Re: [RFC][PATCH v2 00/31] timers: Use del_timer_shutdown() before
 freeing timers

On 10/27/22 15:58, Steven Rostedt wrote:
> On Thu, 27 Oct 2022 15:24:04 -0700
> Guenter Roeck <linux@...ck-us.net> wrote:
> 
>> On Thu, Oct 27, 2022 at 11:05:25AM -0400, Steven Rostedt wrote:
>>>
>>> Back in April, I posted an RFC patch set to help mitigate a common issue
>>> where a timer gets armed just before it is freed, and when the timer
>>> goes off, it crashes in the timer code without any evidence of who the
>>> culprit was. I got side tracked and never finished up on that patch set.
>>> Since this type of crash is still our #1 crash we are seeing in the field,
>>> it has become a priority again to finish it.
>>>
>>> This is v2 of that patch set. Thomas Gleixner posted an untested version
>>> that makes timer->function NULL as the flag that it is shutdown. I took that
>>> code, tested it (fixed it up), added more comments, and changed the
>>> name to del_timer_shutdown() as Linus had asked. I also converted it to use
>>> WARN_ON_ONCE() instead of just WARN_ON() as Linus asked for that too.
>>>    
>>
>> Here are various warnings and crashes. Complete logs are at
>>
>> https://kerneltests.org/builders
>>
>> in the "testing" column of the qemu test results.
>>
>> This is with the published patch set plus the fixups in
>> timer_fixup_init() and timer_fixup_free().
>>
>> Guenter
>>
>> ---
[ ... ]

>>
>> WARNING: CPU: 0 PID: 280 at lib/debugobjects.c:502 debug_print_object+0xa4/0xd8
>> ODEBUG: init active (active state 0) object type: timer_list hint: tulip_timer+0x0/0x38
>>


The problem is that the tulip code calls timer_setup() repeatedly (and
unnecessarily). Apparently either the new timer code and/or the associated
ODEBUG code doesn't like that. The patch below fixes the problem.

I think there needs to be a means to handle that situation gracefully.
The parport code has the same problem (see second patch below), and
I am sure there are others.

Thanks,
Guenter

---
tulip:

diff --git a/drivers/net/ethernet/dec/tulip/tulip_core.c b/drivers/net/ethernet/dec/tulip/tulip_core.c
index ecfad43df45a..0c86066929d3 100644
--- a/drivers/net/ethernet/dec/tulip/tulip_core.c
+++ b/drivers/net/ethernet/dec/tulip/tulip_core.c
@@ -770,8 +770,6 @@ static void tulip_down (struct net_device *dev)

         spin_unlock_irqrestore (&tp->lock, flags);

-       timer_setup(&tp->timer, tulip_tbl[tp->chip_id].media_timer, 0);
-
         dev->if_port = tp->saved_if_port;

         /* Leave the driver in snooze, not sleep, mode. */
@@ -1869,10 +1867,14 @@ static int __maybe_unused tulip_resume(struct device *dev_d)
  static void tulip_remove_one(struct pci_dev *pdev)
  {
         struct net_device *dev = pci_get_drvdata (pdev);
+       struct tulip_private *tp;

         if (!dev)
                 return;

+       tp = netdev_priv(dev);
+       del_timer_shutdown(&tp->timer);
+
         unregister_netdev(dev);
  }


---
parport:

diff --git a/drivers/parport/ieee1284.c b/drivers/parport/ieee1284.c
index 4547ac44c8d4..50dbd2ea23fc 100644
--- a/drivers/parport/ieee1284.c
+++ b/drivers/parport/ieee1284.c
@@ -73,7 +73,7 @@ int parport_wait_event (struct parport *port, signed long timeout)
         timer_setup(&port->timer, timeout_waiting_on_port, 0);
         mod_timer(&port->timer, jiffies + timeout);
         ret = down_interruptible (&port->physport->ieee1284.irq);
-       if (!del_timer_sync(&port->timer) && !ret)
+       if (!del_timer_shutdown(&port->timer) && !ret)
                 /* Timed out. */
                 ret = 1;