linux-kernel - Re: [PATCH] bcachefs: Mark bch_inode_info as SLAB

Message-ID: <84de6cb1-57bd-42f7-8029-4203820ef0b4@linux.dev>
Date: Fri, 12 Jul 2024 10:24:55 +0800
From: Youling Tang <youling.tang@...ux.dev>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: linux-bcachefs@...r.kernel.org, linux-kernel@...r.kernel.org,
 Youling Tang <tangyouling@...inos.cn>
Subject: Re: [PATCH] bcachefs: Mark bch_inode_info as SLAB_ACCOUNT

Hi, Kent

On 12/07/2024 09:39, Youling Tang wrote:
> On 12/07/2024 08:03, Kent Overstreet wrote:
>> On Wed, Jul 03, 2024 at 03:09:55PM GMT, Youling Tang wrote:
>>> From: Youling Tang <tangyouling@...inos.cn>
>>>
>>> After commit 230e9fc28604 ("slab: add SLAB_ACCOUNT flag"), we need to mark
>>> the inode cache as SLAB_ACCOUNT, similar to commit 5d097056c9a0 ("kmemcg:
>>> account for certain kmem allocations to memcg")
>>>
>>> Signed-off-by: Youling Tang <tangyouling@...inos.cn>
>> Turns out this was never tested with memcg enabled (!).
>>
>> I'm reverting it, please feel free to send me a fixed version.
> Sorry, my oversight.
>
> The following NULL pointer dereference is triggered once the MEMCG
> configuration option is enabled.
> ```
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] SMP
> CPU: 5 PID: 1702 Comm: umount Not tainted 6.10.0-rc7-ktest-00003-g557bd05b0d4c-dirty #12
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> RIP: 0010:list_lru_add+0x83/0x100
> Code: 5f 5d c3 48 8b 45 d0 48 85 c0 74 13 41 80 7c 24 1c 00 48 63 b0 68 06 00 00 74 04 85 f6 79 5e 4d 03 2c 24 49 83 c5 08 4c 89 ea <49> 8b 45 08 49 89 5d 08 48 89 13 48 89 43 08 48 89 18 49 8b 45 10
> RSP: 0018:ffff8881178efd10 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88810ec140f0 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000017 RDI: ffff8881178efcc8
> RBP: ffff8881178efd48 R08: ffff8881009de780 R09: ffffffff822e0de0
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff888102075c80
> R13: 0000000000000000 R14: ffff88810443e6c0 R15: 0000000000000000
> FS:  00007f9ed1840800(0000) GS:ffff888179940000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000008 CR3: 00000001062b9005 CR4: 0000000000370eb0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>
> Call Trace:
>  <TASK>
>  ? show_regs+0x69/0x70
>  ? __die+0x29/0x70
>  ? page_fault_oops+0x14f/0x3c0
>  ? do_user_addr_fault+0x2d0/0x5b0
>  ? default_wake_function+0x1e/0x30
>  ? exc_page_fault+0x6d/0x130
>  ? asm_exc_page_fault+0x2b/0x30
>  ? list_lru_add+0x83/0x100
>  list_lru_add_obj+0x4b/0x60
>  iput+0x1fe/0x220
>  dentry_unlink_inode+0xbd/0x120
>  __dentry_kill+0x78/0x180
>  dput+0xc7/0x170
>  shrink_dcache_for_umount+0xe8/0x120
>  generic_shutdown_super+0x23/0x150
>  bch2_kill_sb+0x1b/0x30
>  deactivate_locked_super+0x34/0xb0
>  deactivate_super+0x44/0x50
>  cleanup_mnt+0x105/0x160
>  __cleanup_mnt+0x16/0x20
>  task_work_run+0x63/0x90
>  syscall_exit_to_user_mode+0x10d/0x110
>  do_syscall_64+0x57/0x100
>  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f9ed1a7a6e7
> Code: 0c 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 09 97 0c 00 f7 d8 64 89 02 b8
> RSP: 002b:00007ffef8a29128 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 000055f4671acad8 RCX: 00007f9ed1a7a6e7
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055f4671b1240
> RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ed1bc6244
> R13: 000055f4671b1240 R14: 000055f4671acde0 R15: 000055f4671ac9d0
>  </TASK>
> ```
The direct cause of the BUG is that list_lru_from_memcg_idx() returns
NULL, so the subsequent access to l->list dereferences a NULL pointer.
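
As a rough user-space illustration of why the faulting address in the
oops is 0x8: list_add_tail() starts by reading head->prev, and the list
head is the first member of the per-memcg list structure. The structure
and function names below (list_lru_one_sketch, first_read_address,
sanity_check) are hypothetical stand-ins, and the layout assumes a
64-bit build where the list head sits at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirror of the kernel's doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical stand-in: a list head followed by an item count. */
struct list_lru_one_sketch {
	struct list_head list;
	long nr_items;
};

/*
 * Address that a tail insertion reads first when handed &l->list:
 * the head's ->prev pointer. Computed arithmetically so nothing is
 * dereferenced, even when l is NULL.
 */
static uintptr_t first_read_address(const struct list_lru_one_sketch *l)
{
	return (uintptr_t)l
		+ offsetof(struct list_lru_one_sketch, list)
		+ offsetof(struct list_head, prev);
}

/* With l == NULL the first read lands one pointer past address 0. */
static void sanity_check(void)
{
	assert(first_read_address(NULL) == sizeof(void *));
}
```

On a 64-bit machine sizeof(void *) is 8, which is consistent with
"address: 0000000000000008" and CR2 in the trace.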

The return value of list_lru_from_memcg_idx() needs to be checked,
similar to commit 5abc1e37afa0 ("mm: list_lru: allocate list_lru_one
only when needed").

Modified as follows:
diff --git a/mm/list_lru.c b/mm/list_lru.c
index 3fd64736bc45..ee7424c3879d 100644
--- a/mm/list_lru.c
+++ b/mm/list_lru.c
@@ -94,6 +94,9 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
         spin_lock(&nlru->lock);
         if (list_empty(item)) {
                 l = list_lru_from_memcg_idx(lru, nid, memcg_kmem_id(memcg));
+               if (!l)
+                       goto out;
+
                 list_add_tail(item, &l->list);
                 /* Set shrinker bit if the first element was added */
                 if (!l->nr_items++)
@@ -102,6 +105,7 @@ bool list_lru_add(struct list_lru *lru, struct list_head *item, int nid,
                 spin_unlock(&nlru->lock);
                 return true;
         }
+out:
         spin_unlock(&nlru->lock);
         return false;
  }
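
The control flow of the change can be sketched in user space roughly as
follows. This is only an illustration of the "check the lookup result
and leave through a single unlock path" pattern, not the kernel code:
all names ending in _sketch, the lock_depth counter standing in for
nlru->lock, and the list helpers are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

struct lru_one_sketch { struct list_head list; long nr_items; };

struct lru_node_sketch {
	int lock_depth;              /* stand-in for nlru->lock */
	struct lru_one_sketch *one;  /* NULL while not yet allocated */
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static bool list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_sketch(struct list_head *item, struct list_head *head)
{
	struct list_head *prev = head->prev;  /* first read: head->prev */

	item->next = head;
	item->prev = prev;
	prev->next = item;
	head->prev = item;
}

/* Hypothetical lookup that may legitimately return NULL. */
static struct lru_one_sketch *lookup_sketch(struct lru_node_sketch *nlru)
{
	return nlru->one;
}

static bool lru_add_sketch(struct lru_node_sketch *nlru, struct list_head *item)
{
	struct lru_one_sketch *l;

	assert(nlru && item);
	nlru->lock_depth++;                   /* "spin_lock" */
	if (list_is_empty(item)) {
		l = lookup_sketch(nlru);
		if (!l)
			goto out;             /* nothing to add to: bail out */

		list_add_tail_sketch(item, &l->list);
		l->nr_items++;
		nlru->lock_depth--;           /* "spin_unlock" */
		return true;
	}
out:
	nlru->lock_depth--;                   /* shared unlock path */
	return false;
}
```

In this sketch a NULL lookup simply leaves the item off the list and
reports false, with the lock released on every path.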


With this change, the ktest run of tests/bcachefs/xfstests.ktest can
proceed with MEMCG and MEMCG_KMEM enabled.

Thanks,
Youling.