Message-ID: <ZYvjiqX6EsL15moe@perf>
Date: Wed, 27 Dec 2023 17:42:50 +0900
From: Youngmin Nam <youngmin.nam@...sung.com>
To: rafael@...nel.org, len.brown@...el.com, pavel@....cz, gregkh@...uxfoundation.org
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org, d7271.choe@...sung.com,
	janghyuck.kim@...sung.com, hyesoo.yu@...sung.com, youngmin.nam@...sung.com
Subject: [BUG] mutex deadlock of dpm_resume() in low memory situation

Hi all.

I'm reporting an issue which looks like an upstream kernel bug.
We are using 6.1, but I think all versions can hit this issue.

A mutex deadlock occurred in a low-memory situation while the device was running dpm_resume().

Here is the problematic sequence.

#1. The device is in a low-memory situation, as shown below.

[4: binder:569_5:27109] SLUB: Unable to allocate memory on node -1, gfp=0xb20(GFP_ATOMIC|__GFP_ZERO)
[4: binder:569_5:27109] cache: kmalloc-128, object size: 128, buffer size: 128, default order: 0, min order: 0
[4: binder:569_5:27109] node 0: slabs: 1865, objs: 59680, free: 0
[4: binder:569_5:27109] binder:569_5: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[4: binder:569_5:27109] CPU: 4 PID: 27109 Comm: binder:569_5 Tainted: G S C E 6.1.43-android14-11-abS921BXXU1AWLB #1
[4: binder:569_5:27109] Hardware name: Samsung E1S EUR OPENX board based on S5E9945 (DT)
[4: binder:569_5:27109] Call trace:
[4: binder:569_5:27109]  dump_backtrace+0xf4/0x118
[4: binder:569_5:27109]  show_stack+0x18/0x24
[4: binder:569_5:27109]  dump_stack_lvl+0x60/0x7c
[4: binder:569_5:27109]  dump_stack+0x18/0x38
[4: binder:569_5:27109]  warn_alloc+0xf4/0x190
[4: binder:569_5:27109]  __alloc_pages_slowpath+0x10ec/0x12ac
[4: binder:569_5:27109]  __alloc_pages+0x27c/0x2fc
[4: binder:569_5:27109]  new_slab+0x17c/0x4e0
[4: binder:569_5:27109]  ___slab_alloc+0x4e4/0x8a8
[4: binder:569_5:27109]  __slab_alloc+0x34/0x6c
[4: binder:569_5:27109]  __kmem_cache_alloc_node+0x1f4/0x260
[4: binder:569_5:27109]  kmalloc_trace+0x4c/0x144
[4: binder:569_5:27109]  async_schedule_node_domain+0x40/0x1ec
[4: binder:569_5:27109]  async_schedule_node+0x18/0x28
[4: binder:569_5:27109]  dpm_suspend+0xfc/0x48c

#2. Process A runs dpm_resume() and acquires the "dpm_list_mtx" lock, as shown below.

1000 void dpm_resume(pm_message_t state)
1001 {
1002         struct device *dev;
1003         ktime_t starttime = ktime_get();
1004
1005         trace_suspend_resume(TPS("dpm_resume"), state.event, true);
1006         might_sleep();
1007
1008         mutex_lock(&dpm_list_mtx);        <-------- process A acquires the lock
1009         pm_transition = state;
1010         async_error = 0;
1011
1012         list_for_each_entry(dev, &dpm_suspended_list, power.entry)
1013                 dpm_async_fn(dev, async_resume);

#3. Process A then continues through the following call chain:

dpm_async_fn()
  --> async_schedule_dev()
  --> async_schedule_node()
  --> async_schedule_node_domain()

#4. The kzalloc() fails because of the low-memory situation.

165 async_cookie_t async_schedule_node_domain(async_func_t func, void *data,
166                                           int node, struct async_domain *domain)
167 {
168         struct async_entry *entry;
169         unsigned long flags;
170         async_cookie_t newcookie;
171
172         /* allow irq-off callers */
173         entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);   <--- will fail
174
175         /*
176          * If we're out of memory or if there's too much work
177          * pending already, we execute synchronously.
178          */
179         if (!entry || atomic_read(&entry_count) > MAX_WORK) {
180                 kfree(entry);
181                 spin_lock_irqsave(&async_lock, flags);
182                 newcookie = next_cookie++;
183                 spin_unlock_irqrestore(&async_lock, flags);
184
185                 /* low on memory.. run synchronously */
186                 func(data, newcookie);    <--- process A will run func(), which is async_resume()
187                 return newcookie;

#5. Process A then continues through the following call chain:

async_resume()
  --> device_resume()
  --> dpm_wait_for_superior()

#6. Process A tries to acquire the lock it already holds.

278 static bool dpm_wait_for_superior(struct device *dev, bool async)
279 {
280         struct device *parent;
281
282         /*
283          * If the device is resumed asynchronously and the parent's callback
284          * deletes both the device and the parent itself, the parent object may
285          * be freed while this function is running, so avoid that by reference
286          * counting the parent once more unless the device has been deleted
287          * already (in which case return right away).
288          */
289         mutex_lock(&dpm_list_mtx);

So we think this situation can cause a mutex deadlock in the suspend/resume sequence.

Here is process A's (binder:569_5) call stack from the kernel log:

I[4: swapper/4: 0]   pid  uTime       sTime         last_arrival  last_queued  stat  cpu  task_struct       comm          [wait channel]
I[4: swapper/4: 0] 27109      0  1758074533       2396019774802            0  D( 2)    5  ffffff8044bc92c0  binder:569_5  [dpm_wait_for_superior]
I[4: swapper/4: 0] Mutex: dpm_list_mtx+0x0/0x30: owner[0xffffff8044bc92c0 binder:569_5 :27109]
I[4: swapper/4: 0] Call trace:
I[4: swapper/4: 0]  __switch_to+0x174/0x338
I[4: swapper/4: 0]  __schedule+0x5ec/0x9cc
I[4: swapper/4: 0]  schedule+0x7c/0xe8
I[4: swapper/4: 0]  schedule_preempt_disabled+0x24/0x40
I[4: swapper/4: 0]  __mutex_lock+0x408/0xdac
I[4: swapper/4: 0]  __mutex_lock_slowpath+0x14/0x24
I[4: swapper/4: 0]  mutex_lock+0x40/0xec                   <-------- trying to acquire dpm_list_mtx again
I[4: swapper/4: 0]  dpm_wait_for_superior+0x30/0x148
I[4: swapper/4: 0]  device_resume+0x38/0x1e4
I[4: swapper/4: 0]  async_resume+0x24/0xf4
I[4: swapper/4: 0]  async_schedule_node_domain+0xb0/0x1ec
I[4: swapper/4: 0]  async_schedule_node+0x18/0x28
I[4: swapper/4: 0]  dpm_resume+0xbc/0x578                  <------- acquired dpm_list_mtx
I[4: swapper/4: 0]  dpm_resume_end+0x1c/0x38
I[4: swapper/4: 0]  suspend_devices_and_enter+0x83c/0xb2c
I[4: swapper/4: 0]  pm_suspend+0x34c/0x618
I[4: swapper/4: 0]  state_store+0x104/0x144

A minimal userland sketch of the same self-deadlock pattern is included below for reference.

Could you look into this issue?
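For illustration only, here is a minimal userland sketch of the pattern described above:
a thread takes a non-recursive mutex, an "async schedule" helper falls back to running the
callback synchronously when allocation fails, and the callback then tries to take the same
mutex again. This is NOT kernel code; the names list_mtx, schedule_async() and resume_cb()
are hypothetical stand-ins for dpm_list_mtx, async_schedule_node_domain() and async_resume(),
and pthread_mutex_trylock() is used in the callback so the sketch reports the would-be
deadlock instead of hanging.

/* deadlock_sketch.c - build with: cc -pthread deadlock_sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t list_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for async_resume() -> dpm_wait_for_superior(). */
static void resume_cb(void *data)
{
	(void)data;

	/* dpm_wait_for_superior() does mutex_lock(&dpm_list_mtx); with a
	 * blocking lock this would hang forever when the caller already
	 * holds list_mtx, so report it instead. */
	if (pthread_mutex_trylock(&list_mtx) != 0) {
		printf("deadlock: list_mtx is already held by this thread\n");
		return;
	}
	pthread_mutex_unlock(&list_mtx);
}

/* Stand-in for async_schedule_node_domain(): when the entry allocation
 * fails, it runs the callback synchronously in the calling context. */
static void schedule_async(void (*func)(void *), void *data, bool alloc_fails)
{
	if (alloc_fails) {
		/* "low on memory.. run synchronously" */
		func(data);
		return;
	}
	/* Real code would queue func() to run later on an async worker. */
}

int main(void)
{
	/* Stand-in for dpm_resume(): take the list lock, then schedule the
	 * per-device resume work while still holding it. */
	pthread_mutex_lock(&list_mtx);
	schedule_async(resume_cb, NULL, /* alloc_fails = */ true);
	pthread_mutex_unlock(&list_mtx);
	return 0;
}

With alloc_fails set to true, the callback runs in the caller's context while list_mtx is
held, which is the same re-entrant locking situation hit by dpm_resume() above.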