lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <znbfqcbanvur2et7oy2yakkvjmfsnzvkphc3frohy7jh7hggcq@5e3nvk7hiijc>
Date: Mon, 28 Apr 2025 14:12:40 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Hailong Liu <hailong.liu@...o.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        "surenb@...gle.com" <surenb@...gle.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        黄朝阳 (Zhaoyang Huang) <zhaoyang.huang@...soc.com>,
        "zhangpeng.00@...edance.com" <zhangpeng.00@...edance.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [bug] [stable-6.6.66] [maple_tree] [mmap] mab_mas_cp+0xb0/0x278
 invalid maple_state for spanning

* Hailong Liu <hailong.liu@...o.com> [250428 10:34]:
> Hi:
> 
> After upgrade to kernel-6.6-y we face a panic on vma_merge()
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000078

Thank you for reporting this issue upstream.

There should also be a warning prior to this issue pointing out that
there are no nodes left from the preallocation?

> Mem abort info:
>   ESR = 0x0000000096000006
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x06: level 2 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 39-bit VAs, pgdp=0000000a381d0000
> [0000000000000078] pgd=0800000a381d1003, p4d=0800000a381d1003, pud=0800000a381d1003, pmd=0000000000000000
> Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
> Skip md ftrace buffer dump for: 0x1609e0
> CPU: 7 PID: 11563 Comm: x Tainted: G        W  OE      6.6.56-android15
> Hardware name: Qualcomm Technologies, Inc. Parrot QRD, Alpha-M (DT)
> pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : mab_mas_cp+0xb0/0x278
> lr : mas_spanning_rebalance+0x830/0xeb4
> sp : ffffffc0b2323410
> x29: ffffffc0b2323420 x28: 0000000000000009 x27: 0000000000000001
> x26: ffffff804c0ddb0c x25: 0000000000000001 x24: 0000000000000008
> x23: 000000000000000c x22: 000000000000000c x21: ffffffc0b23236d0
> x20: 0000000000000000 x19: ffffffc0b2323690 x18: ffffffc0ac5a2088
> x17: 0000000000000000 x16: ffffff89b7d68d48 x15: ffffff897cfb6898
> x14: ffffff89ca568000 x13: ffffff88082e74b0 x12: 0000000000000000
> x11: 000000000000000f x10: 0000000000000080 x9 : 000000000000000d
> x8 : 000000000000000d x7 : ffffff89d16690c8 x6 : ffffff87b3847bb8
> x5 : ffffff804c0ddbe8 x4 : 0000000000000000 x3 : ffffffc0b23234c8
> x2 : 0000000000000019 x1 : 000000000000000d x0 : 0000000000000080
> Call trace:
>  mab_mas_cp+0xb0/0x278
>  mas_spanning_rebalance+0x830/0xeb4
>  mas_wr_spanning_store+0x8ac/0xa58
>  mas_wr_store_entry+0x130/0x180
>  mas_store_prealloc+0x98/0x1bc
>  vma_iter_store+0x64/0x74
>  vma_merge+0x5e4/0x73c
>  mmap_region+0x8d8/0xa30
>  do_mmap+0x3e0/0x578
>  vm_mmap_pgoff+0x1a0/0x1f8
>  ksys_mmap_pgoff+0x78/0xf4
>  __arm64_sys_mmap+0x34/0x44
>  invoke_syscall+0x58/0x114
>  el0_svc_common+0x80/0xe0
>  do_el0_svc+0x1c/0x28
>  el0_svc+0x38/0x68
>  el0t_64_sync_handler+0x68/0xbc
>  el0t_64_sync+0x1a8/0x1ac
> 
> the issue introduced by bdc136e2b05f ("mm: resolve faulty mmap_region() error path behaviour")

I'm not entirely sure this is the commit to blame.

> 
> the reason is that call vma_iter_prealloc() twice and the maple_state is invalid. I write a reproducer here
> by cat /proc/maple_test_merge the patch in attachment(maple_tree_debug.patch).

Not quite correct.  The reason is that the vma iterator is adjusted by
the vma_merge() code to span the previous node without resetting the vma
iterator state prior to the second vma_iter_prealloc() call, but then
vma_iter_store() DOES reset the vma iterator prior to the store.

It should be fine to call vma_iter_prealloc() twice.  This code does not
reset the maple state; the reset is in vma_iter_store() via
vma_iter_invalidate().

The vma iterator is not reset during vma_iter_config() to avoid a
re-walk to the exact same place in the tree, for most cases.  This
optimisation would really be the root of the issue.

Looking into this further, the only reason we don't get a corrupt tree
is that the vma_iter_store() code detects the issue and resets the
iterator itself.  Unfortunately, we then run out of preallocated nodes.
There is a warning to detect this issue, but it is only enabled by
CONFIG_DEBUG_VM_MAPLE_TREE, which is heavy handed.

Spanning the next node is already checked in the mas_preallocate() code,
so the fix could go in the vma iterator, the vma code, or even the
underlying maple tree code.

Fixing the maple tree code directly means that we can still get into
this state in the future when people call vma_iter_config() between
vma_iter_prealloc() and vma_iter_store() - which should not be allowed
either.

> 
> the reproducer simulates vma mmap at frist. then mmap_region() vma_merge(), 
> 
> the code from stable-6.6.y
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/tree/mm/mmap.c?h=linux-6.6.y
> 
> the reason show as follows
> maple_tree
>                parent
>     leaf_1(..|c)   leaf_2(d..f|..)
> 
> __mmap_region(addr=d)
>     1. vma_iter_prealloc()
>         mas_state node=leaf_2 alloc=*mt_alloc_one* new_node_a
> 
>     2. mmap_file(file, vma)
>         the vm_flags changed by some driver (ashmem) https://android.googlesource.com/kernel/msm/+/android-6.0.1_r0.74/drivers/staging/android/ashmem.c#312

btw, this is an out of tree driver.

I also hate the optimisation here of trying to merge a second time after
a driver changes its flags.

> 
> vma_merge() (c can merge with d)
>     3. vma_prev()
>         mas_state node=leaf_1 alloc=new_node_a
>     5. vma_iter_config()
>         mas_state node=parent alloc=new_node_a
>     4. vma_iter_prealloc()
>         mas_state node=parent alloc=new_node_a
>     6. vma_iter_store() --> panic
>         use invalid new_node_a for spanning write
> 
> 
> IMO, this issue can be fixed by mmap side maybe conflict with the patch. if
> fix in maple_tree which need to destory the new_node_a

No.  Destroying new_node_a will make a larger regression.  The current
code will either use those nodes or add to them.  Any unused nodes are
freed accordingly.

I think we need to do several things to avoid this situation.

First, the maple tree needs to do a better job detecting when to reset
the maple state.  I don't like this because it's supposed to be the
advanced interface.

Unfortunately it is a bit more work than I would like - but that work is
necessary for the newer code that preallocates less anyways, so we're
going to eat into that savings a bit.  Most of this is cache-hot anyways
though.

Second, we should change the detection of the issue in the vma iterator
store code to a CONFIG_DEBUG_VM so that it is shown to more users.  This
should mean that it will become less likely to get into a situation
where the vma iterator has issues during a store.

Another part of the vma iterator fix would be commenting the iterator
code to point out potential misuses.

Third, we should add more testing for both the maple tree and vma
userspace code for these scenarios.

...

Thanks,
Liam

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ