Message-ID: <Z8nDj129ZVeZBVSp@pluto>
Date: Thu, 6 Mar 2025 15:47:27 +0000
From: Cristian Marussi <cristian.marussi@....com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Alice Ryhl <aliceryhl@...gle.com>,
	Cristian Marussi <cristian.marussi@....com>,
	Sudeep Holla <sudeep.holla@....com>,
	linux-arm-kernel@...ts.infradead.org, arm-scmi@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [Bug report] Memory leak in scmi_device_create

On Thu, Mar 06, 2025 at 02:36:16PM +0000, Catalin Marinas wrote:
> On Thu, Mar 06, 2025 at 11:09:33AM +0000, Alice Ryhl wrote:
> > On Wed, Mar 05, 2025 at 05:10:16PM +0000, Cristian Marussi wrote:
> > > On Wed, Mar 05, 2025 at 11:59:58AM +0000, Alice Ryhl wrote:
> > > > This was with a kernel running v6.13-rc3, but as far as I can tell, no
> > > > relevant changes have landed since v6.13-rc3. My tree *does* include
> > > > commit 295416091e44 ("firmware: arm_scmi: Fix slab-use-after-free in
> > > > scmi_bus_notifier()"). I've only seen this kmemleak report once, so it's
> > > > not happening consistently.
> > > > 
> > > > See below for the full kmemleak report.
> > > > 
> > > > Alice
> > > > 
> > > > $ sudo cat /sys/kernel/debug/kmemleak
> > > > unreferenced object 0xffffff8106c86000 (size 2048):
> > > >   comm "swapper/0", pid 1, jiffies 4294893094
> > > >   hex dump (first 32 bytes):
> > > >     02 00 00 00 10 00 00 00 c0 01 bc 03 81 ff ff ff  ................
> > > >     60 67 ba 03 81 ff ff ff 18 60 c8 06 81 ff ff ff  `g.......`......
> > > >   backtrace (crc feae9680):
> > > >     [<00000000197aa008>] kmemleak_alloc+0x34/0xa0
> > > >     [<0000000056fe02c9>] __kmalloc_cache_noprof+0x1e0/0x450
> > > >     [<00000000a8b3dfe1>] __scmi_device_create+0xb4/0x2b4
> > > >     [<000000008714917b>] scmi_device_create+0x40/0x194
> > > >     [<000000001818f3cf>] scmi_chan_setup+0x144/0x3b8
> > > >     [<00000000970bad38>] scmi_probe+0x584/0xa78
> > > >     [<000000002600d2fd>] platform_probe+0xbc/0xf0
> > > >     [<00000000f6f556b4>] really_probe+0x1b8/0x520
> > > >     [<00000000eed93d59>] __driver_probe_device+0xe0/0x1d8
> > > >     [<00000000d613b754>] driver_probe_device+0x6c/0x208
> > > >     [<00000000187a9170>] __driver_attach+0x168/0x328
> > > >     [<00000000e3ff1834>] bus_for_each_dev+0x14c/0x178
> > > >     [<00000000984a3176>] driver_attach+0x34/0x44
> > > >     [<00000000fc35bf2a>] bus_add_driver+0x1bc/0x358
> > > >     [<00000000747fce19>] driver_register+0xc0/0x1a0
> > > >     [<0000000081cb8754>] __platform_driver_register+0x40/0x50
> > > > unreferenced object 0xffffff8103bc01c0 (size 32):
> > > 
> > > I could not reproduce on my setup, even though I run a system with
> > > all the existent SCMI protocols (and related drivers) enabled (and
> > > so a lot of device creations) and a downstream test driver that causes
> > > even more SCMI devices to be created/destroyed at load/unload.
> > > 
> > > Coming down the path from scmi_chan_setup(), it seems something around
> > > transport devices creation, but it is not obvious to me where the leak
> > > could hide....
> > > 
> > > ...any particular setup on your side ? ...using LKMs, loading/unloading,
> > > any usage pattern that could help me reproduce ?
> > 
> > I looked into this a bit more, and actually it does happen consistently.
> > It's just that kmemleak doesn't report it until 10 minutes after
> > booting, so I did not notice it.
> 
> You can force the scanning with:
> 
>   echo scan > /sys/kernel/debug/kmemleak
> 
> Just do it a couple of times after boot, no need to wait 10 min for the
> default background scanning.
> 
> > user@...588-ci:~$ sudo cat /sys/kernel/debug/kmemleak
> > unreferenced object 0xffffff81068c0000 (size 2048):
> >   comm "swapper/0", pid 1, jiffies 4294893128
> >   hex dump (first 32 bytes):
> >     02 00 00 00 10 00 00 00 40 a3 7a 03 81 ff ff ff  ........@.......
> >     60 c8 79 03 81 ff ff ff 18 00 8c 06 81 ff ff ff  `.y.............
> >   backtrace (crc 60df30fb):
> >     kmemleak_alloc+0x34/0xa0
> >     __kmalloc_cache_noprof+0x1e0/0x450
> >     __scmi_device_create+0xb4/0x2b4
> 
> Is this the kzalloc() for sizeof(*scmi_dev)? It's surprisingly large, I
> thought it would go for the kmalloc-1k slab as struct device is below
> this size, at least for my builds. Anyway...
> 
> >     scmi_device_create+0x40/0x194
> >     scmi_chan_setup+0x144/0x3b8
> >     scmi_probe+0x51c/0x9fc
> >     platform_probe+0xbc/0xf0
> >     really_probe+0x1b8/0x520
> >     __driver_probe_device+0xe0/0x1d8
> >     driver_probe_device+0x6c/0x208
> >     __driver_attach+0x168/0x328
> >     bus_for_each_dev+0x14c/0x178
> >     driver_attach+0x34/0x44
> >     bus_add_driver+0x1bc/0x358
> >     driver_register+0xc0/0x1a0
> >     __platform_driver_register+0x40/0x50
> > unreferenced object 0xffffff81037aa340 (size 32):
> >   comm "swapper/0", pid 1, jiffies 4294893128
> >   hex dump (first 32 bytes):
> >     5f 5f 73 63 6d 69 5f 74 72 61 6e 73 70 6f 72 74  __scmi_transport
> >     5f 64 65 76 69 63 65 5f 72 78 5f 31 30 00 ff ff  _device_rx_10...
> >   backtrace (crc 8dab7ca7):
> >     kmemleak_alloc+0x34/0xa0
> >     __kmalloc_node_track_caller_noprof+0x234/0x528
> >     kstrdup+0x48/0x80
> >     kstrdup_const+0x30/0x3c
> 
> These are referenced from the main structure above, so they'd be
> reported as leaks as well.
> 
> This loop in scmi_device_create() looks strange:
> 
> 	list_for_each_entry(rdev, phead, node) {
> 		struct scmi_device *sdev;
> 
> 		sdev = __scmi_device_create(np, parent,
> 					    rdev->id_table->protocol_id,
> 					    rdev->id_table->name);
> 		/* Report errors and carry on... */
> 		if (sdev)
> 			scmi_dev = sdev;
> 		else
> 			pr_err("(%s) Failed to create device for protocol 0x%x (%s)\n",
> 			       of_node_full_name(parent->of_node),
> 			       rdev->id_table->protocol_id,
> 			       rdev->id_table->name);
> 	}
> 
> We can override scmi_dev a few times in the loop and lose the previous
> sdev allocations. Is this intended?

Yes...it is weird...but by design, I would say :P ...

...because this is called either to instantiate one single device OR to
instantiate at once all the devices needed for a protocol: in the latter
case it returns just one of the created devices to signal success, or NULL
if every device creation failed. We don't need to keep references to the
allocated devices here anyway, since on success those devices are
referenced and kept on the SCMI bus, so they can be searched/scanned/
destroyed from there.
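
As a rough userspace illustration of that pattern (this is NOT the actual
arm_scmi code; every toy_* name below is made up for the sketch, and a plain
linked list stands in for the bus), a create-all helper can overwrite its
local pointer safely as long as each created device is already linked on the
bus before the next iteration:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct toy_dev {
	int protocol_id;
	char name[32];
	struct toy_dev *next;	/* link in the toy "bus" list */
};

struct toy_bus {
	struct toy_dev *head;
	size_t count;
};

struct toy_req {
	int protocol_id;
	const char *name;	/* NULL simulates a creation failure */
};

/* Create one device and link it on the bus; the bus owns it from here on. */
static struct toy_dev *toy_dev_create(struct toy_bus *bus,
				      const struct toy_req *req)
{
	struct toy_dev *dev;

	if (!req->name)
		return NULL;

	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;

	dev->protocol_id = req->protocol_id;
	snprintf(dev->name, sizeof(dev->name), "%s", req->name);
	dev->next = bus->head;
	bus->head = dev;
	bus->count++;

	return dev;
}

/*
 * Create every requested device. Returns one of the created devices as a
 * success token, or NULL if all creations failed.
 */
static struct toy_dev *toy_dev_create_all(struct toy_bus *bus,
					  const struct toy_req *reqs, size_t n)
{
	struct toy_dev *last = NULL;

	assert(bus && reqs);

	for (size_t i = 0; i < n; i++) {
		struct toy_dev *dev = toy_dev_create(bus, &reqs[i]);

		/* Report errors and carry on; overwriting 'last' loses
		 * nothing because the bus list still references the
		 * devices created earlier. */
		if (dev)
			last = dev;
		else
			fprintf(stderr, "failed to create device for protocol 0x%x\n",
				reqs[i].protocol_id);
	}

	return last;
}

/* Tear everything down by walking the bus, not the caller's pointer. */
static size_t toy_bus_destroy_all(struct toy_bus *bus)
{
	size_t freed = 0;

	while (bus->head) {
		struct toy_dev *dev = bus->head;

		bus->head = dev->next;
		free(dev);
		freed++;
	}
	bus->count = 0;

	return freed;
}
```

With three requests where the middle one fails, toy_dev_create_all() hands
back the last device created, the bus list holds two entries, and
toy_bus_destroy_all() frees both. The sketch only holds together if the
linking step really happens for every created device; a device that is
allocated but never reachable from the bus would be a genuine leak.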

But maybe this is the crux of the matter, or what fools kmemleak...I
will try to reproduce again.

Thanks,
Cristian
