Message-ID: <2025101512-overlap-maggot-441c@gregkh>
Date: Wed, 15 Oct 2025 10:45:45 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Wen Yang <wen.yang@...ux.dev>, Jon Hunter <jonathanh@...dia.com>
Cc: stable@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 6.1 0/6] fix invalid sleeping in detect_cache_attributes()

On Wed, Oct 15, 2025 at 10:43:01AM +0200, Greg Kroah-Hartman wrote:
> On Wed, Oct 01, 2025 at 01:27:25AM +0800, Wen Yang wrote:
> > commit 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection
> > in the CPU hotplug path")
> > adds a call to detect_cache_attributes() to populate the cacheinfo
> > before updating the siblings mask. detect_cache_attributes() allocates
> > memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT
> > kernels, on secondary CPUs, this triggers a:
> >   'BUG: sleeping function called from invalid context'
> > as the code is executed with preemption and interrupts disabled:
> > 
> >  | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> >  | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
> >  | preempt_count: 1, expected: 0
> >  | RCU nest depth: 1, expected: 1
> >  | 3 locks held by swapper/111/0:
> >  |  #0:  (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
> >  |  #1:  (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
> >  |  #2:  (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
> >  | irq event stamp: 0
> >  | hardirqs last  enabled at (0):  0x0
> >  | hardirqs last disabled at (0):  copy_process+0x5dc/0x1ab8
> >  | softirqs last  enabled at (0):  copy_process+0x5dc/0x1ab8
> >  | softirqs last disabled at (0):  0x0
> >  | Preemption disabled at:
> >  |  migrate_enable+0x30/0x130
> >  | CPU: 111 PID: 0 Comm: swapper/111 Tainted: G        W          6.0.0-rc4-rt6-[...]
> >  | Call trace:
> >  |  __kmalloc+0xbc/0x1e8
> >  |  detect_cache_attributes+0x2d4/0x5f0
> >  |  update_siblings_masks+0x30/0x368
> >  |  store_cpu_topology+0x78/0xb8
> >  |  secondary_start_kernel+0xd0/0x198
> >  |  __secondary_switched+0xb0/0xb4
> > 
> > 
> > Pierre fixed this issue upstream in 6.3, and the original series is as follows:
> > https://lore.kernel.org/all/167404285593.885445.6219705651301997538.b4-ty@arm.com/
> > 
> > We also encountered the same issue on the 6.1 stable branch and need to backport this series.
> > 
> > Pierre Gondois (6):
> >   cacheinfo: Use RISC-V's init_cache_level() as generic OF
> >     implementation
> >   cacheinfo: Return error code in init_of_cache_level()
> >   cacheinfo: Check 'cache-unified' property to count cache leaves
> >   ACPI: PPTT: Remove acpi_find_cache_levels()
> >   ACPI: PPTT: Update acpi_find_last_cache_level() to
> >     acpi_get_cache_info()
> >   arch_topology: Build cacheinfo from primary CPU
> 
> This series seems to have broken existing systems, as reported here:
> 	https://lore.kernel.org/r/046f08cb-0610-48c9-af24-4804367df177@nvidia.com
> 
> so I'm going to drop it from the queue at this point in time.  Please
> work to resolve this before resubmitting it.

Also note that there was a non-trivial number of follow-on fixes for
patches in this series that I had to backport to the tree as well.
When/if you resubmit this, please include all of those fixes as well.

thanks,

greg k-h
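
One way to look for follow-on fixes before resubmitting a backport is to
scan later commit messages for "Fixes:" trailers that point at the
patches in the series. The following is only a minimal sketch of that
idea, not a tool used in this thread: the function name
find_followup_fixes, the sample commit IDs, and the sample messages are
all hypothetical, and the (sha, message) pairs are assumed to have been
collected separately (for example from git log output).

```python
import re

# Matches a "Fixes: <abbreviated or full sha>" trailer at the start of a line.
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.IGNORECASE | re.MULTILINE)


def find_followup_fixes(commits, backported):
    """Return (sha, referenced_sha) for commits that claim to fix a backported patch.

    commits:    iterable of (sha, full commit message) pairs
    backported: iterable of commit IDs of the patches in the series
    """
    targets = [t.lower() for t in backported]
    hits = []
    for sha, message in commits:
        for ref in FIXES_RE.findall(message):
            ref = ref.lower()
            # Trailers usually carry an abbreviated ID, so compare by prefix
            # in both directions.
            if any(t.startswith(ref) or ref.startswith(t) for t in targets):
                hits.append((sha, ref))
                break  # one hit per commit is enough
    return hits


# Hypothetical data, for illustration only.
sample_backported = ["a" * 12 + "0" * 28]
sample_commits = [
    ("1" * 40, 'cacheinfo: example follow-up\n\nFixes: aaaaaaaaaaaa ("hypothetical patch")\n'),
    ("2" * 40, "unrelated change\n\nNo trailer here.\n"),
    ("3" * 40, 'another fix\n\nFixes: bbbbbbbbbbbb ("not in the series")\n'),
]

followups = find_followup_fixes(sample_commits, sample_backported)
```

A scan like this only catches fixes that carry a "Fixes:" trailer, so
the result would still need checking against the upstream history by
hand.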
