linux-kernel - 回复: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ef92dcc88e914c9c8d8dbcc3adbb06bb@honor.com>
Date: Tue, 18 Feb 2025 02:51:00 +0000
From: gaoxu <gaoxu2@...or.com>
To: Barry Song <21cnbao@...il.com>
CC: Andrew Morton <akpm@...ux-foundation.org>, "linux-mm@...ck.org"
	<linux-mm@...ck.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Suren Baghdasaryan <surenb@...gle.com>, Yosry
 Ahmed <yosry.ahmed@...ux.dev>, yipengxiang <yipengxiang@...or.com>
Subject: 回复: [PATCH v3] mm: Fix possible NULL pointer dereference in __swap_duplicate

> 
> On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@...or.com> wrote:
> >
> > Add a NULL check on the return value of swp_swap_info in
> > __swap_duplicate to prevent crashes caused by NULL pointer dereference.
> >
> > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > to CPU cache issues or DDR bit flips. The probability of this issue is
> > very small, and the stack info we encountered is as follows：
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000000000058
> > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> >   ESR = 0x0000000096000005
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> >   FSC = 0x05: level 1 translation fault Data abort info:
> >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages,
> > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > pgd=0000000000000000, p4d=0000000000000000,
> > pud=0000000000000000
> > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace
> > buffer dump for: 0x1609e0 ...
> > pc : swap_duplicate+0x44/0x164
> > lr : copy_page_range+0x508/0x1e78
> > sp : ffffffc0f2a699e0
> > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call
> > trace:
> >  swap_duplicate+0x44/0x164
> >  copy_page_range+0x508/0x1e78
> 
> This is really strange since we already have a swap entry check before calling
> swap_duplicate().
> 
> copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>                 pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct
> *dst_vma,
>                 struct vm_area_struct *src_vma, unsigned long addr, int
> *rss) {
>         unsigned long vm_flags = dst_vma->vm_flags;
>         pte_t orig_pte = ptep_get(src_pte);
>         pte_t pte = orig_pte;
>         struct folio *folio;
>         struct page *page;
>         swp_entry_t entry = pte_to_swp_entry(orig_pte);
> 
>         if (likely(!non_swap_entry(entry))) {
>                 if (swap_duplicate(entry) < 0)
>                         return -EIO;
> ...
> }
> 
> likely the swap_type is larger than MAX_SWAPFILES so we get a NULL?
> 
> static struct swap_info_struct *swap_type_to_swap_info(int type) {
>         if (type >= MAX_SWAPFILES)
>                 return NULL;
> 
>         return READ_ONCE(swap_info[type]); /* rcu_dereference() */ }
> 
> But non_swap_entry() guarantees that swp_type is smaller than
> MAX_SWAPFILES.
> 
> static inline int non_swap_entry(swp_entry_t entry) {
>         return swp_type(entry) >= MAX_SWAPFILES; }
> 
> So another possibility is that we have an overflow of swap_info[] where type is <
> MAX_SWAPFILES but is not a valid existing swapfile?
In the log of this issue, there is a printed entry: get_swap_device:
Bad swap file entry 18000000002d2d2f.
It can be calculated that swp_type(18000000002d2d2f) = 6.
In the Android 15-linux6.6:
system: MAX_SWAPFILES = 28, nr_swapfiles = 1.
Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but greater
than nr_swapfiles, the value of this entry is abnormal.

static unsigned int nr_swapfiles;
static struct swap_info_struct *swap_info[MAX_SWAPFILES];
swap_info is a static array, with its values initialized to 0. 
The size of the array is MAX_SWAPFILES, and the size of valid values in the array is
nr_swapfiles. Therefore, when we validate the validity of swp_type(entry),
we should compare it with nr_swapfiles, not MAX_SWAPFILES.
The code for validating swp_type may need to be modified as follows:
static inline int non_swap_entry(swp_entry_t entry)
{
-	return swp_type(entry) >= MAX_SWAPFILES;
+	return swp_type(entry) >= nr_swapfiles;
}

static struct swap_info_struct *swap_type_to_swap_info(int type)
{
-	if (type >= MAX_SWAPFILES)
+	if (type >= nr_swapfiles)
		return NULL;

	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
}
> 
> I don't see how the current patch contributes to debugging or fixing anything
> related to this dumped stack. Can we dump swp_type() as well?
> 
> >  copy_process+0x1278/0x21cc
> >  kernel_clone+0x90/0x438
> >  __arm64_sys_clone+0x5c/0x8c
> >  invoke_syscall+0x58/0x110
> >  do_el0_svc+0x8c/0xe0
> >  el0_svc+0x38/0x9c
> >  el0t_64_sync_handler+0x44/0xec
> >  el0t_64_sync+0x1a8/0x1ac
> > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace
> > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > exception
> > SMP: stopping secondary CPUs
> >
> > The patch seems to only provide a workaround, but there are no more
> > effective software solutions to handle the bit flips problem. This
> > path will change the issue from a system crash to a process exception,
> > thereby reducing the impact on the entire machine.
> >
> > Signed-off-by: gao xu <gaoxu2@...or.com>
> > ---
> > v1 -> v2:
> > - Add WARN_ON_ONCE.
> > - update the commit info.
> > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > ---
> >
> > mm/swapfile.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c index 7448a3876..a0bfdba94
> > 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry,
> unsigned char usage, int nr)
> >         int err, i;
> >
> >         si = swp_swap_info(entry);
> > +       if (WARN_ON_ONCE(!si))
> 
> I mean, printk something related to swp_type(). This is really strange, but the
> current stack won't help with debugging.
The log can find info related to "get_swap_device: Bad swap file entry xxx"
when an entry encounters an exception. 
Add a print info log like the following:
pr_err("%s%08d\n", Bad swap type, swp_type(entry));
> 
> > +               return -EINVAL;
> >
> >         offset = swp_offset(entry);
> >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset %
> SWAPFILE_CLUSTER);
> > --
> > 2.17.1
> 
> Thanks
> Barry