lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <07272a87-b409-464b-a2e6-283ffc903f11@nokia.com>
Date: Fri, 9 Aug 2024 15:43:56 +0200
From: Stefan Wiehler <stefan.wiehler@...ia.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: Print provenance information for addresses in
 registers

Hi Mark,

Did you have a chance to take a look at below reasoning for this change
and my proposal for a slightly less verbose version of this patch?

>> Regardless of what 32-bit ARM does, I don't think it makes sense to dump this
>> for all the GPRs. It's incredibly verbose and not all that helpful, e.g.
>> testing with the LKDTM WRITE_RO test I get:
>>
>> | # echo WRITE_RO > /sys/kernel/debug/provoke-crash/DIRECT
>> | lkdtm: Performing direct entry WRITE_RO
>> | lkdtm: attempting bad rodata write at ffffadee958c1bf0
>> | Unable to handle kernel write to read-only memory at virtual address ffffadee958c1bf0
>> | Mem abort info:
>> |   ESR = 0x000000009600004e
>> |   EC = 0x25: DABT (current EL), IL = 32 bits
>> |   SET = 0, FnV = 0
>> |   EA = 0, S1PTW = 0
>> |   FSC = 0x0e: level 2 permission fault
>> | Data abort info:
>> |   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
>> |   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
>> |   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> | swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d6d000
>> | [ffffadee958c1bf0] pgd=100000043ffff803, p4d=100000043ffff803, pud=100000043fffe803, pmd=0060000041600f81
>> | Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
>> | Modules linked in:
>> | CPU: 4 PID: 147 Comm: sh Not tainted 6.8.0-rc3-00001-gc66454d88811 #1
>> | Hardware name: linux,dummy-virt (DT)
>> | pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>> | pc : lkdtm_WRITE_RO+0x3c/0x54
>> | lr : lkdtm_WRITE_RO+0x24/0x54
>> | sp : ffff800080913c80
>> | x29: ffff800080913c80 x28: ffff0000c09ca200 x27: 0000000000000000
>> | x26: 0000000000000000 x25: 0000000000000000 x24: 0000aaaaee418930
>> | x23: ffff0000c0645780 x22: ffff800080913df0 x21: ffffadee96c957a0
>> | x20: 0000000000000009 x19: ffffadee958c1000 x18: 0000000000000006
>> | x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080913720
>> | x14: 0000000000000000 x13: ffffadee969310b0 x12: 0000000000000432
>> | x11: 0000000000000166 x10: ffffadee969890b0 x9 : ffffadee969310b0
>> | x8 : 00000000ffffefff x7 : ffffadee969890b0 x6 : 80000000fffff000
>> | x5 : ffff0003fdfb5cc8 x4 : 0000000000000000 x3 : 0000000000000000
>> | x2 : 00000000abcd1234 x1 : 000000000198b861 x0 : ffffadee95e334a0
>> | Call trace:
>> |  lkdtm_WRITE_RO+0x3c/0x54
>> |  lkdtm_do_action+0x1c/0x30
>> |  direct_entry+0xbc/0x144
>> |  full_proxy_write+0x60/0xb4
>> |  vfs_write+0xd0/0x35c
>> |  ksys_write+0x70/0x104
>> |  __arm64_sys_write+0x1c/0x28
>> |  invoke_syscall+0x48/0x110
>> |  el0_svc_common.constprop.0+0x40/0xe0
>> |  do_el0_svc+0x1c/0x28
>> |  el0_svc+0x34/0xb4
>> |  el0t_64_sync_handler+0x120/0x12c
>> |  el0t_64_sync+0x190/0x194
>> | Register x0 information: 0-page vmalloc region starting at 0xffffadee95470000 allocated at paging_init+0x148/0x6f4
>> | Register x1 information: non-paged memory
>> | Register x2 information: non-paged memory
>> | Register x3 information: NULL pointer
>> | Register x4 information: NULL pointer
>> | Register x5 information: non-slab/vmalloc memory
>> | Register x6 information: non-paged memory
>> | Register x7 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
>> | Register x8 information: non-paged memory
>> | Register x9 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
>> | Register x10 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
>> | Register x11 information: non-paged memory
>> | Register x12 information: non-paged memory
>> | Register x13 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
>> | Register x14 information: NULL pointer
>> | Register x15 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
>> | Register x16 information: NULL pointer
>> | Register x17 information: NULL pointer
>> | Register x18 information: non-paged memory
>> | Register x19 information: 0-page vmalloc region starting at 0xffffadee95470000 allocated at paging_init+0x148/0x6f4
>> | Register x20 information: non-paged memory
>> | Register x21 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
>> | Register x22 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
>> | Register x23 information: slab dentry start ffff0000c0645780 pointer offset 0 size 192
>> | Register x24 information: non-paged memory
>> | Register x25 information: NULL pointer
>> | Register x26 information: NULL pointer
>> | Register x27 information: NULL pointer
>> | Register x28 information: slab task_struct start ffff0000c09ca200 pointer offset 0 size 4352
>> | Register x29 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
>> | Code: f2b579a2 900089c0 ca020021 91128000 (f905fa61)
>> | ---[ end trace 0000000000000000 ]---
>>
>> Looking at the dump, the vast majority of those "Register X?? information:"
>> lines are irrelevant, and it's difficult to see the wood for the trees.
>>
>> Why do we need this for (almost) all registers, rather than just the faulting
>> address?
> 
> We often get bug reports of rarely reproducing issues on customer sites
> where we would like to dump as much helpful information as possible.
> 
> Let me show you an example of an ARM32 oops where register provenance
> information proved to be quite useful (abbreviated):
> 
>   Unable to handle kernel NULL pointer dereference at virtual address 0000001c
>   pgd = ea66fa29
>   [0000001c] *pgd=8461ab003, *pmd=8f7d9c003
>   Internal error: Oops: 207 [#1] PREEMPT SMP ARM
>   ...
>   PC is at do_open_execat+0x180/0x228
>   LR is at 0x0
>   pc : [<c03f9640>]    lr : [<00000000>]    psr: 40070013
>   sp : c7c27ee8  ip : 00001020  fp : 0002fde4
>   r10: 0000000b  r9 : 00000001  r8 : 00000000
>   r7 : 00000100  r6 : c66aa0c0  r5 : c15f8000  r4 : 00000000
>   r3 : 000a801d  r2 : 00001020  r1 : 00020020  r0 : c417f990
>   Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
>   Control: 30c5387d  Table: 46572940  DAC: 55555555
>   Register r0 information: slab dentry start c417f990 pointer offset 0 size 36
>   Register r1 information: non-paged memory
>   Register r2 information: non-paged memory
>   Register r3 information: non-paged memory
>   Register r4 information: NULL pointer
>   Register r5 information: slab names_cache start c15f8000 pointer offset 0 size 4096
>   Register r6 information: slab filp start c66aa0c0 pointer offset 0
>   Register r7 information: non-paged memory
>   Register r8 information: NULL pointer
>   Register r9 information: non-paged memory
>   Register r10 information: non-paged memory
>   Register r11 information: non-paged memory
>   Register r12 information: non-paged memory
>   Process ddal (pid: 14765, stack limit = 0xdb737995)
>   Stack: (0xc7c27ee8 to 0xc7c28000)
>   ...
>   [<c03f9640>] (do_open_execat) from [<c03f987c>] (bprm_execve+0x150/0x6b4)
>   [<c03f987c>] (bprm_execve) from [<c03f9f8c>] (do_execveat_common+0x1ac/0x208)
>   [<c03f9f8c>] (do_execveat_common) from [<c03fae08>] (sys_execve+0x38/0x40)
>   [<c03fae08>] (sys_execve) from [<c0200140>] (ret_fast_syscall+0x0/0x40)
>   Exception stack(0xc7c27fa8 to 0xc7c27ff0)
>   ...
> 
> From the disassembly we can see that the null pointer comes from r6:
> 
>   (gdb) disassemble do_open_execat+0x180
>   ...
>      0xc03f9638 <+376>:   ldr     r0, [r6, #12]
>      0xc03f963c <+380>:   ldr     lr, [r0, #40]   ; 0x28
>      0xc03f9640 <+384>:   ldr     r3, [lr, #28]
>   ...
> 
> Thanks to register provenance information we know now that r6 points
> into the filp SLAB cache, and by following the offsets added we can find
> the null pointer entry.
> 
>> In die_kernel_fault() we already print some information regarding the psecific
>> faulting address:
>>
>> | static void die_kernel_fault(const char *msg, unsigned long addr,
>> |                              unsigned long esr, struct pt_regs *regs)
>> | {
>> |         bust_spinlocks(1);
>> |
>> |         pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
>> |                  addr);
>> |
>> |         kasan_non_canonical_hook(addr);
>> |
>> |         mem_abort_decode(esr);
>> |
>> |         show_pte(addr);
>> |         die("Oops", regs, esr);
>> |         bust_spinlocks(0);
>> |         make_task_dead(SIGKILL);
>> | }
>>
>> ... and IMO it'd make more sense to call mem_dump_obj() there, e.g.
>>
>> | static void die_kernel_fault(const char *msg, unsigned long addr,
>> |                              unsigned long esr, struct pt_regs *regs)
>> | {
>> |         bust_spinlocks(1);
>> |
>> |         pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
>> |                  addr);
>> |
>> |         pr_alert("Fauling address:");
>> |         mem_dump_obj((void *)addr);
>> |         kasan_non_canonical_hook(addr);
>> |
>> |         mem_abort_decode(esr);
>> |
>> |         show_pte(addr);
>> |         die("Oops", regs, esr);
>> |         bust_spinlocks(0);
>> |         make_task_dead(SIGKILL);
>> | }
>>
>> ... which produces significantly more legible output, e.g.
>>
>> | # echo WRITE_RO > /sys/kernel/debug/provoke-crash/DIRECT
>> | lkdtm: Performing direct entry WRITE_RO
>> | lkdtm: attempting bad rodata write at ffffa66f5e2c1bf0
>> | Unable to handle kernel write to read-only memory at virtual address ffffa66f5e2c1bf0
>> | Fauling address: 0-page vmalloc region starting at 0xffffa66f5de70000 allocated at paging_init+0x148/0x6f4
>> | Mem abort info:
>> |   ESR = 0x000000009600004e
>> |   EC = 0x25: DABT (current EL), IL = 32 bits
>> |   SET = 0, FnV = 0
>> |   EA = 0, S1PTW = 0
>> |   FSC = 0x0e: level 2 permission fault
>> | Data abort info:
>> |   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
>> |   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
>> |   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> | swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d6d000
>> | [ffffa66f5e2c1bf0] pgd=100000043ffff803, p4d=100000043ffff803, pud=100000043fffe803, pmd=0060000041600f81
>> | Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
>> | Modules linked in:
>> | CPU: 3 PID: 148 Comm: sh Not tainted 6.8.0-rc3-00002-gf95e3bd06270-dirty #2
>> | Hardware name: linux,dummy-virt (DT)
>> | pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>> | pc : lkdtm_WRITE_RO+0x3c/0x54
>> | lr : lkdtm_WRITE_RO+0x24/0x54
>> | sp : ffff80008098bc80
>> | x29: ffff80008098bc80 x28: ffff0000c144a200 x27: 0000000000000000
>> | x26: 0000000000000000 x25: 0000000000000000 x24: 0000aaab235e21d0
>> | x23: ffff0000c06459c0 x22: ffff80008098bdf0 x21: ffffa66f5f6957a0
>> | x20: 0000000000000009 x19: ffffa66f5e2c1000 x18: 0000000000000006
>> | x17: 0000000000000000 x16: 0000000000000000 x15: ffff80008098b720
>> | x14: 0000000000000000 x13: ffffa66f5f3310b0 x12: 0000000000000432
>> | x11: 0000000000000166 x10: ffffa66f5f3890b0 x9 : ffffa66f5f3310b0
>> | x8 : 00000000ffffefff x7 : ffffa66f5f3890b0 x6 : 80000000fffff000
>> | x5 : ffff0003fdf9fcc8 x4 : 0000000000000000 x3 : 0000000000000000
>> | x2 : 00000000abcd1234 x1 : 000000000198b861 x0 : ffffa66f5e8334b0
>> | Call trace:
>> |  lkdtm_WRITE_RO+0x3c/0x54
>> |  lkdtm_do_action+0x1c/0x30
>> |  direct_entry+0xbc/0x144
>> |  full_proxy_write+0x60/0xb4
>> |  vfs_write+0xd0/0x35c
>> |  ksys_write+0x70/0x104
>> |  __arm64_sys_write+0x1c/0x28
>> |  invoke_syscall+0x48/0x110
>> |  el0_svc_common.constprop.0+0x40/0xe0
>> |  do_el0_svc+0x1c/0x28
>> |  el0_svc+0x34/0xb4
>> |  el0t_64_sync_handler+0x120/0x12c
>> |  el0t_64_sync+0x190/0x194
>> | Code: f2b579a2 900089c0 ca020021 9112c000 (f905fa61)
>> | ---[ end trace 0000000000000000 ]---
>>
>> ... and likewise even when we hit addresses without KASAN shadow:
>>
>> | # echo ACCESS_NULL > /sys/kernel/debug/provoke-crash/DIRECT
>> | lkdtm: Performing direct entry ACCESS_NULL
>> | lkdtm: attempting bad read at 0000000000000000
>> | Unable to handle kernel paging request at virtual address dfff800000000000
>> | Fauling address: non-paged memory
>> | KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
>> | Mem abort info:
>> |   ESR = 0x0000000096000005
>> |   EC = 0x25: DABT (current EL), IL = 32 bits
>> |   SET = 0, FnV = 0
>> |   EA = 0, S1PTW = 0
>> |   FSC = 0x05: level 1 translation fault
>> | Data abort info:
>> |   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>> |   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> |   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> | [dfff800000000000] address between user and kernel address ranges
>> | Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
>> | Modules linked in:
>> | CPU: 2 PID: 149 Comm: sh Not tainted 6.8.0-rc3-00002-gf95e3bd06270-dirty #3
>> | Hardware name: linux,dummy-virt (DT)
>> | pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>> | pc : lkdtm_ACCESS_NULL+0x2c/0xa4
>> | lr : lkdtm_ACCESS_NULL+0x24/0xa4
>> | sp : ffff800080d87ab0
>> | x29: ffff800080d87ab0 x28: ffff0000c5296f14 x27: 1fffe00018a52df5
>> | x26: ffff0000c5296fa0 x25: 1fffe00018a52df4 x24: ffff0000c8fcda00
>> | x23: 0000aaaadb0a7040 x22: ffff800080d87d70 x21: ffffcb98b050dae0
>> | x20: 000000000000000c x19: ffffcb98acd53f40 x18: ffff0003fdff56c0
>> | x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> | x14: 0000000000000000 x13: 3030303030303030 x12: ffff7000101b0ee1
>> | x11: 1ffff000101b0ee0 x10: ffff7000101b0ee0 x9 : dfff800000000000
>> | x8 : 00008fffefe4f120 x7 : ffff800080d87707 x6 : 0000000000000001
>> | x5 : ffff800080d87700 x4 : 1fffe000191f9b41 x3 : 0000000000000000
>> | x2 : 0000000000000000 x1 : ffff0000c8fcda00 x0 : dfff800000000000
>> | Call trace:
>> |  lkdtm_ACCESS_NULL+0x2c/0xa4
>> |  lkdtm_do_action+0x3c/0x68
>> |  direct_entry+0x130/0x228
>> |  full_proxy_write+0xd4/0x140
>> |  vfs_write+0x1e4/0x8ec
>> |  ksys_write+0xf4/0x1e0
>> |  __arm64_sys_write+0x6c/0x9c
>> |  invoke_syscall+0x6c/0x254
>> |  el0_svc_common.constprop.0+0xac/0x230
>> |  do_el0_svc+0x40/0x58
>> |  el0_svc+0x38/0x70
>> |  el0t_64_sync_handler+0x120/0x12c
>> |  el0t_64_sync+0x190/0x194
>> | Code: aa1303e0 97adbc11 d2d00000 f2fbffe0 (39c00000)
>> | ---[ end trace 0000000000000000 ]---
> 
> I understand that the proposal is currently too verbose, but printing
> provenance information only for the faulting address would not have
> helped in above example. What do you think about printing only registers
> that hold information useful for debugging, i.e. by excluding null
> pointers and non-paged areas?

Kind regards,

Stefan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ