Message-ID: <875y2gi33n.ffs@tglx>
Date:   Sun, 05 Nov 2023 17:20:28 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Ben Greear <greearb@...delatech.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc:     Rodolfo Giometti <giometti@...eenne.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH/RFC] debugobjects/slub: Print slab info and backtrace.

On Thu, Nov 02 2023 at 18:49, Ben Greear wrote:
> And here is resulting splat from wireless-next tree I've been
> debugging.
>
> Note the subsequent splats from slub are due to some memory poisoning, for
> one reason or another.  Maybe slub changes should not be included in this patch, not
> sure if it can provide useful info in other cases though.
>
> If I understand this correctly, then it appears the bug is related to
> the pps driver.
>
> 16140 Nov 02 17:28:25 ct523c-2103 kernel: ODEBUG: debugobjects: debug_obj allocated at:
> 16141 Nov 02 17:28:25 ct523c-2103 kernel:  init_timer_key+0x24/0x160
> 16142 Nov 02 17:28:25 ct523c-2103 kernel:  kobject_put+0x14f/0x190
> 16143 Nov 02 17:28:25 ct523c-2103 kernel:  pps_device_destruct+0x26/0xb0
> 16144 Nov 02 17:28:25 ct523c-2103 kernel:  device_release+0x57/0x100
> 16145 Nov 02 17:28:25 ct523c-2103 kernel:  kobject_delayed_cleanup+0xdf/0x140
> 16146 Nov 02 17:28:25 ct523c-2103 kernel:  process_one_work+0x475/0x920
> 16147 Nov 02 17:28:25 ct523c-2103 kernel:  worker_thread+0x38a/0x680

Can you please provide proper kernel dmesg output next time instead of
this mess?

>  ODEBUG: free active (active state 0) object: ffff888181c029a0 object type: timer_list hint: kobject_delayed_cleanup+0x0/0x140
>  WARNING: CPU: 1 PID: 104 at lib/debugobjects.c:549 debug_print_object+0xf0/0x170
>  CPU: 1 PID: 104 Comm: kworker/1:10 Tainted: G        W          6.6.0-rc7+ #17
>  Workqueue: events kobject_delayed_cleanup
>  RIP: 0010:debug_print_object+0xf0/0x170
>   debug_check_no_obj_freed+0x261/0x2b0
>   __kmem_cache_free+0x185/0x200
>   device_release+0x57/0x100
>   kobject_delayed_cleanup+0xdf/0x140
>   process_one_work+0x475/0x920
>   worker_thread+0x38a/0x680

So what happens is:

pps_unregister_cdev()
  device_destroy()
    put_device()
    device_unregister()
      device_del()
      put_device() <- Drops final reference to dev->kobj
        schedule_delayed_work()

worker thread:
  kobject_delayed_cleanup()
    device_release()
      pps_device_destruct()
        cdev_del(&pps->cdev)
          kobject_put(&cdev->kobj) <- Drops final reference
            schedule_delayed_work()
              init_timer(&cdev->kobj.release.timer);
              start_timer();
        ...
        kfree(dev);
        kfree(pps); <- Debug object detects the active timer to be freed
                       because cdev and its kobject are embedded in
                       struct pps_device.
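
The sequence above can be approximated in userspace. The following is only a sketch under stated assumptions, not kernel code: every fake_* type and helper is hypothetical, and check_no_obj_freed() is a loose stand-in for what debug_check_no_obj_freed() does, namely scanning the memory range about to be freed for tracked objects that are still active. It shows why a timer armed inside an embedded kobject trips the check when the containing structure is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins. None of these are kernel APIs. */
struct fake_timer   { int active; };
struct fake_kobject { struct fake_timer release_timer; };
struct fake_cdev    { struct fake_kobject kobj; };

struct fake_pps_device {
	int id;
	struct fake_cdev cdev;	/* embedded: shares the pps allocation */
};

#define MAX_TRACKED 16
static struct fake_timer *tracked[MAX_TRACKED];

/* Arm a timer and remember it as an active tracked object. */
static void fake_timer_start(struct fake_timer *t)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (!tracked[i]) {
			tracked[i] = t;
			t->active = 1;
			return;
		}
	}
	assert(0 && "tracking table full");
}

/* Disarm a timer and drop it from the tracking table. */
static void fake_timer_stop(struct fake_timer *t)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == t)
			tracked[i] = NULL;
	}
	t->active = 0;
}

/*
 * Count active tracked objects that live inside [addr, addr + size).
 * Each hit corresponds to a "free active object" report.
 */
static int check_no_obj_freed(const void *addr, size_t size)
{
	uintptr_t start = (uintptr_t)addr;
	uintptr_t end = start + size;
	int hits = 0;

	for (size_t i = 0; i < MAX_TRACKED; i++) {
		uintptr_t p = (uintptr_t)tracked[i];

		if (tracked[i] && p >= start && p < end) {
			fprintf(stderr, "free active object: %p\n",
				(void *)tracked[i]);
			tracked[i] = NULL;	/* stop tracking freed memory */
			hits++;
		}
	}
	return hits;
}

/* Free wrapper that runs the range check first. */
static int checked_free(void *ptr, size_t size)
{
	int hits = check_no_obj_freed(ptr, size);

	free(ptr);
	return hits;
}

/* Buggy shape: release timer armed, then the container is freed. */
static int destruct_with_pending_release(void)
{
	struct fake_pps_device *pps = calloc(1, sizeof(*pps));

	assert(pps);
	fake_timer_start(&pps->cdev.kobj.release_timer);
	return checked_free(pps, sizeof(*pps));
}

/* Safe shape: the timer is no longer active when the container goes away. */
static int destruct_after_release(void)
{
	struct fake_pps_device *pps = calloc(1, sizeof(*pps));

	assert(pps);
	fake_timer_start(&pps->cdev.kobj.release_timer);
	fake_timer_stop(&pps->cdev.kobj.release_timer);
	return checked_free(pps, sizeof(*pps));
}
```

destruct_with_pending_release() reports one hit because the armed timer sits inside the allocation being freed, while destruct_after_release() reports none. The sketch only models the detection. It does not suggest how the pps driver should be fixed.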

pps_device_destruct() is unfortunately no longer on the call trace of
the debug objects splat because kfree(pps) is a tail call.

So yes, that collected stacktrace is helpful.

>> To try to improve this, store the backtrace of where the
>> debug_obj was created and print that out when problems
>> are found.
<SNIP>

Please trim your replies.

Thanks,

        tglx
