Message-ID: <20240719142037-93bd4395-1f6b-490a-8a14-50e7bcc756d1@linutronix.de>
Date: Fri, 19 Jul 2024 14:32:15 +0200
From: Thomas Weißschuh <thomas.weissschuh@...utronix.de>
To: Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>
Cc: Max Dubois <makemehappy@...ketmail.com>,
"ilpo.jarvinen@...ux.intel.com" <ilpo.jarvinen@...ux.intel.com>, "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
Dan Carpenter <error27@...il.com>, Dan Carpenter <dan.carpenter@...aro.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: Bug related with a 6.6.24 platform/x86 commit signed by you -
Enormous memory leak
On Fri, Jul 19, 2024 at 05:34:23PM GMT, Harshit Mogalapalli wrote:
> Hi Max,
>
>
> On 19/07/24 15:29, Max Dubois wrote:
> > Hello,
> >
> > I am writing to you because you signed off on this buggy commit long ago.
> >
> > I don't know how to report it. This is a nasty bug and I think it is
> > related to this commit in 6.6.24, and it is still present from that
> > kernel up to 6.10, but only on 32-bit Linux machines with 32-bit kernels
> > (tested by me on VirtualBox and VMware guests, I don't have real 32-bit
> > machines to test it):
> >
> > commit 9a98ab01e3acba830cb0917296a13192fd23f305
> > Author: Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>
> > Date: Mon Nov 13 12:07:39 2023 -0800
> >
> > platform/x86: hp-bioscfg: Fix error handling in
> > hp_add_other_attributes()
> >
> > commit f40f939917b2b4cbf18450096c0ce1c58ed59fae upstream.
> >
> > 'attr_name_kobj' is allocated using kzalloc, but on all the error paths
> > it is not freed, hence we have a memory leak.
> >
> > Fix the error path before kobject_init_and_add() by adding kfree().
> >
> > kobject_put() must be always called after passing the object to
> > kobject_init_and_add(). Only the error path which is immediately next
> > to kobject_init_and_add() calls kobject_put() and not any other error
> > path after it.
> >
> > Fix the error handling after kobject_init_and_add() by moving the
> > kobject_put() into the goto label err_other_attr_init that is already
> > used by all the error paths after kobject_init_and_add().
> >
> > Fixes: a34fc329b189 ("platform/x86: hp-bioscfg: bioscfg")
> > Cc: stable@...r.kernel.org # 6.6.x: c5dbf0416000: platform/x86:
> > hp-bioscfg: Simplify return check in hp_add_other_attributes()
> > Cc: stable@...r.kernel.org # 6.6.x: 5736aa9537c9: platform/x86:
> > hp-bioscfg: move mutex_lock() down in hp_add_other_attributes()
> > Reported-by: kernel test robot <lkp@...el.com>
> > Reported-by: Dan Carpenter <error27@...il.com>
> > Closes: https://lore.kernel.org/r/202309201412.on0VXJGo-lkp@intel.com/
> > Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@...cle.com>
> > [ij: Added the stable dep tags]
> > Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> > Link: https://lore.kernel.org/r/20231113200742.3593548-3-harshit.m.mogalapalli@oracle.com
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
> > Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> >
> > I reported this on Gentoo forums in this discussion:
> >
> > https://forums.gentoo.org/viewtopic-p-8834077.html#8834077
> >
> > These days 32-bit machines are pretty much unused, and I think this is
> > the reason why no one has reported it.
> >
> > The bug wasn't present in kernels before 6.6.24 (example: 6.6.23 is ok).
> >
>
> Thanks for reporting and sending me an email.
>
> The commit you pointed out which is authored by me is in:
>
> v6.6.4 - 9a98ab01e3ac platform/x86: hp-bioscfg: Fix error handling in
> hp_add_other_attributes()
>
> So you should have seen this in 6.6.4 as well?
>
> > The bug wasn't present in kernels before 6.6.24 (example: 6.6.23 is ok).
>
> This confused me, as the commit that you pointed out has been present since 6.6.4.
Given that the commit under discussion is for an HP BIOS driver and the
issue is reproducible in a VM guest without that hardware,
I'd argue it's highly unlikely that this commit is the culprit
(or has anything to do with the issue, for that matter).
> > I tested it in various VMware and Virtualbox guests and it is very easy
> > to reproduce it.
> >
> > You just need a VM with x86 emulated processor, over 1 GB of RAM and run
> > some applications like few terminals, a web browser and audio player.
> >
> > In the log you will see a lot of complaints related to vmalloc
> > allocations that are not present on working kernels before 6.6.24 and this
> > commit.
> >
> > Increasing vmalloc, as suggested in the log, doesn't help.
> >
> > From this point on the VM becomes unresponsive: it closes apps, it
> > doesn't open others, and terminals can't execute simple commands. Sometimes
> > you are even unable to reboot, sometimes the machines freeze, and
> > sometimes they end in a total kernel exception.
> >
> > This happens 100 percent of the time; it is easy to reproduce
> > every time on any kernel 6.6.24 or later (6.7, 6.8, 6.9 and 6.10 are all
> > affected).
> >
> > Considering the kernel is supposed to support 32 bit, I think this is
> > something to fix, but I don't know how or to whom to report this bug.
The reporter really should figure out which specific release or commit
introduces the issue, and whether mainline or 6.6.41 are also affected.
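Narrowing this down does not require testing every release. As a rough
illustration, here is a minimal Python sketch of a binary search over an
ordered list of stable releases. The names first_bad and is_bad are
hypothetical; in practice is_bad would mean "boot that kernel and check
whether the vmap failures show up". It assumes the first release in the list
is good, the last one is bad, and the bug does not disappear again once
introduced.

```python
from typing import Callable, Sequence


def first_bad(releases: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first release for which is_bad() reports the bug.

    Assumptions (not checked here): releases are in chronological order,
    releases[0] is good, releases[-1] is bad, and once the bug appears it
    stays in every later release.
    """
    lo, hi = 0, len(releases) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid  # bug already present, look earlier
        else:
            lo = mid  # still good, look later
    return releases[hi]
```

With the 38 releases from 6.6.4 to 6.6.41 this needs about six test boots.
Once the first bad release is known, git bisect between it and its
predecessor can do the same at commit granularity.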
The linked Gentoo forum thread has some actual kernel logs:
Jul 16 00:01:10 [kernel] alloc_vmap_area: 133 callbacks suppressed
Jul 16 00:01:10 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
- Last output repeated 9 times -
Jul 16 00:01:15 [kernel] alloc_vmap_area: 240 callbacks suppressed
Jul 16 00:01:15 [kernel] vmap allocation for size 20480 failed: use vmalloc=<size> to increase size
- Last output repeated 9 times -
Jul 16 00:01:17 [kernel] warn_alloc: 3 callbacks suppressed
Jul 16 00:01:17 [kernel] Web Content: vmalloc error: size 8192, vm_struct allocation failed, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Jul 16 00:01:17 [kernel] CPU: 1 PID: 2761 Comm: Web Content Not tainted 6.6.38-gentoo #1
Jul 16 00:01:17 [kernel] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B32.2305221830 05/22/2023
Jul 16 00:01:17 [kernel] Call Trace:
Jul 16 00:01:17 [kernel] dump_stack_lvl+0x32/0x41
Jul 16 00:01:17 [kernel] dump_stack+0xd/0x10
Jul 16 00:01:17 [kernel] warn_alloc+0xab/0x111
Jul 16 00:01:17 [kernel] __vmalloc_node_range+0x73/0x345
Jul 16 00:01:17 [kernel] __vmalloc_node+0x55/0x5d
Jul 16 00:01:17 [kernel] ? bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 00:01:17 [kernel] __vmalloc+0x14/0x16
Jul 16 00:01:17 [kernel] ? bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 00:01:17 [kernel] bpf_prog_alloc_no_stats+0x1f/0xcd
Jul 16 00:01:17 [kernel] bpf_prog_alloc+0x13/0x9f
Jul 16 00:01:17 [kernel] bpf_prog_create_from_user+0x47/0xbd
Jul 16 00:01:17 [kernel] ? kprobe_free_init_mem+0x4c/0x4c
Jul 16 00:01:17 [kernel] do_seccomp+0x176/0x7ac
Jul 16 00:01:17 [kernel] ? __ia32_sys_prctl+0x47/0x5bf
Jul 16 00:01:17 [kernel] __ia32_sys_seccomp+0x10/0x12
Jul 16 00:01:17 [kernel] ia32_sys_call+0xd09/0x1063
Jul 16 00:01:17 [kernel] __do_fast_syscall_32+0x7a/0x99
Jul 16 00:01:17 [kernel] do_fast_syscall_32+0x29/0x5b
Jul 16 00:01:17 [kernel] do_SYSENTER_32+0x15/0x17
Jul 16 00:01:17 [kernel] entry_SYSENTER_32+0x98/0xf8
Jul 16 00:01:17 [kernel] EIP: 0xb7fc856d
The lines with "size 20480" repeat *a lot*; that could be the issue.
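
To put a number on that, here is a minimal Python sketch that tallies the
failed vmap allocations per size from a syslog excerpt like the one above.
The function name count_vmap_failures is made up for illustration, and
treating "- Last output repeated N times -" as N additional copies of the
preceding message is my assumption about the logger's format. Messages
dropped by the kernel's rate limiting ("callbacks suppressed") are not
counted, so the real number is higher.

```python
import re
from collections import Counter

# Matches the kernel message quoted in the log excerpt.
VMAP_FAIL = re.compile(r"vmap allocation for size (\d+) failed")
# Assumed format of the logger's repeat marker.
REPEAT = re.compile(r"- Last output repeated (\d+) times -")


def count_vmap_failures(lines):
    """Count failed vmap allocations per requested size."""
    counts = Counter()
    last_size = None  # size of the immediately preceding failure line, if any
    for line in lines:
        fail = VMAP_FAIL.search(line)
        if fail:
            last_size = int(fail.group(1))
            counts[last_size] += 1
            continue
        repeat = REPEAT.search(line)
        if repeat and last_size is not None:
            # Repeat marker directly after a failure line.
            counts[last_size] += int(repeat.group(1))
            continue
        last_size = None  # any other line breaks the association
    return counts
```

It can be fed an open log file directly, for example
count_vmap_failures(open(path)) with path pointing at the saved log.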
Thomas