[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z8ns4T6pas9DEy8B@pluto>
Date: Thu, 6 Mar 2025 18:43:45 +0000
From: Cristian Marussi <cristian.marussi@....com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Cristian Marussi <cristian.marussi@....com>,
Alice Ryhl <aliceryhl@...gle.com>,
Sudeep Holla <sudeep.holla@....com>,
linux-arm-kernel@...ts.infradead.org, arm-scmi@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [Bug report] Memory leak in scmi_device_create
On Thu, Mar 06, 2025 at 04:18:38PM +0000, Catalin Marinas wrote:
> On Thu, Mar 06, 2025 at 03:47:27PM +0000, Cristian Marussi wrote:
> > On Thu, Mar 06, 2025 at 02:36:16PM +0000, Catalin Marinas wrote:
> > > This loop in scmi_device_create() looks strange:
> > >
> > > list_for_each_entry(rdev, phead, node) {
> > > struct scmi_device *sdev;
> > >
> > > sdev = __scmi_device_create(np, parent,
> > > rdev->id_table->protocol_id,
> > > rdev->id_table->name);
> > > /* Report errors and carry on... */
> > > if (sdev)
> > > scmi_dev = sdev;
> > > else
> > > pr_err("(%s) Failed to create device for protocol 0x%x (%s)\n",
> > > of_node_full_name(parent->of_node),
> > > rdev->id_table->protocol_id,
> > > rdev->id_table->name);
> > > }
> > >
> > > We can override scmi_dev a few times in the loop and lose the previous
> > > sdev allocations. Is this intended?
> >
> > Yes...it is weird..but by design I would say :P ...
> >
> > ...because this is called to instantiate one single device OR instantiate at
> > once all the multiple devices needed for a protocol: in this latter case it
> > returns just one of the created devices to signal success or NULL if all the
> > devices' creation failed....we dont need to keep the allocated devices references
> > anyway here since on success those devices are now referenced and kept on the
> > SCMI bus, so they can be searched/scanned/destroyed from there.
>
> Not sure why the pointer isn't found, device_add() should link it with
> the parent. Unless something else fails, the parent is freed and the
> linked devices unreachable. I'm not familiar at all with this code, I
> just saw kmemleak and thought of replying.
>
> The loop is still weird, scmi_chan_setup() seems to use the pointer to
> scmi_device for something more meaningful than a pass/fail check. Also
> the overall result is based only on what the last __scmi_device_create()
> return value was, irrespective of the previous iterations of the loop.
> You do have a pr_err() but no early bailing out of the loop on failure.
> I'm curious if there are any SCMI errors in the Alice's kernel log.
>
Yes, the weirdness comes from the fact such function is used alternatively
to create a single named device (and make some use of it, like in
scmi_chan_setup) OR to create a bunch of devices for the same protocol
when no specific device is asked for (name==NULL)...anyway the case at
hand that kmemleak complains about does NOT pass through that weird loop...
...good news is, I was able to reproduce a similar report consistently
with a load/unload/load sequence....the culprit is that when looking for
a device to destroy on unload, the SCMI bus uses device_find_child()
and that bumps the device refcnt implicitly...as a result when the device
is destroyed the refcnt is NEVER found as zero and so NO device_release
is ever called...this results in dev->p private_data to be never released
and that is what kmemleak spotted (at the start of teh chain):
unreferenced object 0xffff00000f583800 (size 512):
comm "insmod", pid 227, jiffies 4294912190
hex dump (first 32 bytes):
00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N..........
ff ff ff ff ff ff ff ff 60 36 1d 8a 00 80 ff ff ........`6......
backtrace (crc 114e2eed):
kmemleak_alloc+0xbc/0xd8
__kmalloc_cache_noprof+0x2dc/0x398
device_add+0x954/0x12d0
device_register+0x28/0x40
__scmi_device_create.part.0+0x1bc/0x380
scmi_device_create+0x2d0/0x390
scmi_create_protocol_devices+0x74/0xf8
scmi_device_request_notifier+0x1f8/0x2a8
notifier_call_chain+0x110/0x3b0
blocking_notifier_call_chain+0x70/0xb0
scmi_driver_register+0x350/0x7f0
0xffff80000a3b3038
do_one_initcall+0x12c/0x730
do_init_module+0x1dc/0x640
load_module+0x4b20/0x5b70
init_module_from_file+0xec/0x158
$ ./scripts/faddr2line ./vmlinux device_add+0x954/0x12d0
device_add+0x954/0x12d0:
kmalloc_noprof at include/linux/slab.h:901
(inlined by) kzalloc_noprof at include/linux/slab.h:1037
(inlined by) device_private_init at drivers/base/core.c:3510
(inlined by) device_add at drivers/base/core.c:3561
I am posting a fix.
Thanks for the report and the help.
Any feedback and testing is much welcomed :D
Cristian
Powered by blists - more mailing lists