lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <91c79fa5-ea6f-4fe1-8458-bfc0dc55a20e@nokia.com>
Date: Wed, 20 Mar 2024 15:13:01 +0100
From: Stefan Wiehler <stefan.wiehler@...ia.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: Print provenance information for addresses in
 registers

Hi Mark,

> Regardless of what 32-bit ARM does, I don't think it makes sense to dump this
> for all the GPRs. It's incredibly verbose and not all that helpful, e.g.
> testing with the LKDTM WRITE_RO test I get:
> 
> | # echo WRITE_RO > /sys/kernel/debug/provoke-crash/DIRECT
> | lkdtm: Performing direct entry WRITE_RO
> | lkdtm: attempting bad rodata write at ffffadee958c1bf0
> | Unable to handle kernel write to read-only memory at virtual address ffffadee958c1bf0
> | Mem abort info:
> |   ESR = 0x000000009600004e
> |   EC = 0x25: DABT (current EL), IL = 32 bits
> |   SET = 0, FnV = 0
> |   EA = 0, S1PTW = 0
> |   FSC = 0x0e: level 2 permission fault
> | Data abort info:
> |   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
> |   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> |   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> | swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d6d000
> | [ffffadee958c1bf0] pgd=100000043ffff803, p4d=100000043ffff803, pud=100000043fffe803, pmd=0060000041600f81
> | Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
> | Modules linked in:
> | CPU: 4 PID: 147 Comm: sh Not tainted 6.8.0-rc3-00001-gc66454d88811 #1
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> | pc : lkdtm_WRITE_RO+0x3c/0x54
> | lr : lkdtm_WRITE_RO+0x24/0x54
> | sp : ffff800080913c80
> | x29: ffff800080913c80 x28: ffff0000c09ca200 x27: 0000000000000000
> | x26: 0000000000000000 x25: 0000000000000000 x24: 0000aaaaee418930
> | x23: ffff0000c0645780 x22: ffff800080913df0 x21: ffffadee96c957a0
> | x20: 0000000000000009 x19: ffffadee958c1000 x18: 0000000000000006
> | x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080913720
> | x14: 0000000000000000 x13: ffffadee969310b0 x12: 0000000000000432
> | x11: 0000000000000166 x10: ffffadee969890b0 x9 : ffffadee969310b0
> | x8 : 00000000ffffefff x7 : ffffadee969890b0 x6 : 80000000fffff000
> | x5 : ffff0003fdfb5cc8 x4 : 0000000000000000 x3 : 0000000000000000
> | x2 : 00000000abcd1234 x1 : 000000000198b861 x0 : ffffadee95e334a0
> | Call trace:
> |  lkdtm_WRITE_RO+0x3c/0x54
> |  lkdtm_do_action+0x1c/0x30
> |  direct_entry+0xbc/0x144
> |  full_proxy_write+0x60/0xb4
> |  vfs_write+0xd0/0x35c
> |  ksys_write+0x70/0x104
> |  __arm64_sys_write+0x1c/0x28
> |  invoke_syscall+0x48/0x110
> |  el0_svc_common.constprop.0+0x40/0xe0
> |  do_el0_svc+0x1c/0x28
> |  el0_svc+0x34/0xb4
> |  el0t_64_sync_handler+0x120/0x12c
> |  el0t_64_sync+0x190/0x194
> | Register x0 information: 0-page vmalloc region starting at 0xffffadee95470000 allocated at paging_init+0x148/0x6f4
> | Register x1 information: non-paged memory
> | Register x2 information: non-paged memory
> | Register x3 information: NULL pointer
> | Register x4 information: NULL pointer
> | Register x5 information: non-slab/vmalloc memory
> | Register x6 information: non-paged memory
> | Register x7 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
> | Register x8 information: non-paged memory
> | Register x9 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
> | Register x10 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
> | Register x11 information: non-paged memory
> | Register x12 information: non-paged memory
> | Register x13 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
> | Register x14 information: NULL pointer
> | Register x15 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
> | Register x16 information: NULL pointer
> | Register x17 information: NULL pointer
> | Register x18 information: non-paged memory
> | Register x19 information: 0-page vmalloc region starting at 0xffffadee95470000 allocated at paging_init+0x148/0x6f4
> | Register x20 information: non-paged memory
> | Register x21 information: 0-page vmalloc region starting at 0xffffadee96910000 allocated at paging_init+0x1d8/0x6f4
> | Register x22 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
> | Register x23 information: slab dentry start ffff0000c0645780 pointer offset 0 size 192
> | Register x24 information: non-paged memory
> | Register x25 information: NULL pointer
> | Register x26 information: NULL pointer
> | Register x27 information: NULL pointer
> | Register x28 information: slab task_struct start ffff0000c09ca200 pointer offset 0 size 4352
> | Register x29 information: 4-page vmalloc region starting at 0xffff800080910000 allocated at kernel_clone+0x64/0x360
> | Code: f2b579a2 900089c0 ca020021 91128000 (f905fa61)
> | ---[ end trace 0000000000000000 ]---
> 
> Looking at the dump, the vast majority of those "Register X?? information:"
> lines are irrelevant, and it's difficult to see the wood for the trees.
> 
> Why do we need this for (almost) all registers, rather than just the faulting
> address?

We often get bug reports of rarely reproducing issues on customer sites
where we would like to dump as much helpful information as possible.

Let me show you an example of an ARM32 oops where register provenance
information proved to be quite useful (abbreviated):

  Unable to handle kernel NULL pointer dereference at virtual address 0000001c
  pgd = ea66fa29
  [0000001c] *pgd=8461ab003, *pmd=8f7d9c003
  Internal error: Oops: 207 [#1] PREEMPT SMP ARM
  ...
  PC is at do_open_execat+0x180/0x228
  LR is at 0x0
  pc : [<c03f9640>]    lr : [<00000000>]    psr: 40070013
  sp : c7c27ee8  ip : 00001020  fp : 0002fde4
  r10: 0000000b  r9 : 00000001  r8 : 00000000
  r7 : 00000100  r6 : c66aa0c0  r5 : c15f8000  r4 : 00000000
  r3 : 000a801d  r2 : 00001020  r1 : 00020020  r0 : c417f990
  Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5387d  Table: 46572940  DAC: 55555555
  Register r0 information: slab dentry start c417f990 pointer offset 0 size 36
  Register r1 information: non-paged memory
  Register r2 information: non-paged memory
  Register r3 information: non-paged memory
  Register r4 information: NULL pointer
  Register r5 information: slab names_cache start c15f8000 pointer offset 0 size 4096
  Register r6 information: slab filp start c66aa0c0 pointer offset 0
  Register r7 information: non-paged memory
  Register r8 information: NULL pointer
  Register r9 information: non-paged memory
  Register r10 information: non-paged memory
  Register r11 information: non-paged memory
  Register r12 information: non-paged memory
  Process ddal (pid: 14765, stack limit = 0xdb737995)
  Stack: (0xc7c27ee8 to 0xc7c28000)
  ...
  [<c03f9640>] (do_open_execat) from [<c03f987c>] (bprm_execve+0x150/0x6b4)
  [<c03f987c>] (bprm_execve) from [<c03f9f8c>] (do_execveat_common+0x1ac/0x208)
  [<c03f9f8c>] (do_execveat_common) from [<c03fae08>] (sys_execve+0x38/0x40)
  [<c03fae08>] (sys_execve) from [<c0200140>] (ret_fast_syscall+0x0/0x40)
  Exception stack(0xc7c27fa8 to 0xc7c27ff0)
  ...

>From the disassembly we can see that the null pointer comes from r6:

  (gdb) disassemble do_open_execat+0x180
  ...
     0xc03f9638 <+376>:   ldr     r0, [r6, #12]
     0xc03f963c <+380>:   ldr     lr, [r0, #40]   ; 0x28
     0xc03f9640 <+384>:   ldr     r3, [lr, #28]
  ...

Thanks to register provenance information we know now that r6 points
into the filp SLAB cache, and by following the offsets added we can find
the null pointer entry.

> In die_kernel_fault() we already print some information regarding the psecific
> faulting address:
> 
> | static void die_kernel_fault(const char *msg, unsigned long addr,
> |                              unsigned long esr, struct pt_regs *regs)
> | {
> |         bust_spinlocks(1);
> |
> |         pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
> |                  addr);
> |
> |         kasan_non_canonical_hook(addr);
> |
> |         mem_abort_decode(esr);
> |
> |         show_pte(addr);
> |         die("Oops", regs, esr);
> |         bust_spinlocks(0);
> |         make_task_dead(SIGKILL);
> | }
> 
> ... and IMO it'd make more sense to call mem_dump_obj() there, e.g.
> 
> | static void die_kernel_fault(const char *msg, unsigned long addr,
> |                              unsigned long esr, struct pt_regs *regs)
> | {
> |         bust_spinlocks(1);
> |
> |         pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
> |                  addr);
> |
> |         pr_alert("Fauling address:");
> |         mem_dump_obj((void *)addr);
> |         kasan_non_canonical_hook(addr);
> |
> |         mem_abort_decode(esr);
> |
> |         show_pte(addr);
> |         die("Oops", regs, esr);
> |         bust_spinlocks(0);
> |         make_task_dead(SIGKILL);
> | }
> 
> ... which produces significantly more legible output, e.g.
> 
> | # echo WRITE_RO > /sys/kernel/debug/provoke-crash/DIRECT
> | lkdtm: Performing direct entry WRITE_RO
> | lkdtm: attempting bad rodata write at ffffa66f5e2c1bf0
> | Unable to handle kernel write to read-only memory at virtual address ffffa66f5e2c1bf0
> | Fauling address: 0-page vmalloc region starting at 0xffffa66f5de70000 allocated at paging_init+0x148/0x6f4
> | Mem abort info:
> |   ESR = 0x000000009600004e
> |   EC = 0x25: DABT (current EL), IL = 32 bits
> |   SET = 0, FnV = 0
> |   EA = 0, S1PTW = 0
> |   FSC = 0x0e: level 2 permission fault
> | Data abort info:
> |   ISV = 0, ISS = 0x0000004e, ISS2 = 0x00000000
> |   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> |   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> | swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d6d000
> | [ffffa66f5e2c1bf0] pgd=100000043ffff803, p4d=100000043ffff803, pud=100000043fffe803, pmd=0060000041600f81
> | Internal error: Oops: 000000009600004e [#1] PREEMPT SMP
> | Modules linked in:
> | CPU: 3 PID: 148 Comm: sh Not tainted 6.8.0-rc3-00002-gf95e3bd06270-dirty #2
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> | pc : lkdtm_WRITE_RO+0x3c/0x54
> | lr : lkdtm_WRITE_RO+0x24/0x54
> | sp : ffff80008098bc80
> | x29: ffff80008098bc80 x28: ffff0000c144a200 x27: 0000000000000000
> | x26: 0000000000000000 x25: 0000000000000000 x24: 0000aaab235e21d0
> | x23: ffff0000c06459c0 x22: ffff80008098bdf0 x21: ffffa66f5f6957a0
> | x20: 0000000000000009 x19: ffffa66f5e2c1000 x18: 0000000000000006
> | x17: 0000000000000000 x16: 0000000000000000 x15: ffff80008098b720
> | x14: 0000000000000000 x13: ffffa66f5f3310b0 x12: 0000000000000432
> | x11: 0000000000000166 x10: ffffa66f5f3890b0 x9 : ffffa66f5f3310b0
> | x8 : 00000000ffffefff x7 : ffffa66f5f3890b0 x6 : 80000000fffff000
> | x5 : ffff0003fdf9fcc8 x4 : 0000000000000000 x3 : 0000000000000000
> | x2 : 00000000abcd1234 x1 : 000000000198b861 x0 : ffffa66f5e8334b0
> | Call trace:
> |  lkdtm_WRITE_RO+0x3c/0x54
> |  lkdtm_do_action+0x1c/0x30
> |  direct_entry+0xbc/0x144
> |  full_proxy_write+0x60/0xb4
> |  vfs_write+0xd0/0x35c
> |  ksys_write+0x70/0x104
> |  __arm64_sys_write+0x1c/0x28
> |  invoke_syscall+0x48/0x110
> |  el0_svc_common.constprop.0+0x40/0xe0
> |  do_el0_svc+0x1c/0x28
> |  el0_svc+0x34/0xb4
> |  el0t_64_sync_handler+0x120/0x12c
> |  el0t_64_sync+0x190/0x194
> | Code: f2b579a2 900089c0 ca020021 9112c000 (f905fa61)
> | ---[ end trace 0000000000000000 ]---
> 
> ... and likewise even when we hit addresses without KASAN shadow:
> 
> | # echo ACCESS_NULL > /sys/kernel/debug/provoke-crash/DIRECT
> | lkdtm: Performing direct entry ACCESS_NULL
> | lkdtm: attempting bad read at 0000000000000000
> | Unable to handle kernel paging request at virtual address dfff800000000000
> | Fauling address: non-paged memory
> | KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
> | Mem abort info:
> |   ESR = 0x0000000096000005
> |   EC = 0x25: DABT (current EL), IL = 32 bits
> |   SET = 0, FnV = 0
> |   EA = 0, S1PTW = 0
> |   FSC = 0x05: level 1 translation fault
> | Data abort info:
> |   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> |   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> |   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> | [dfff800000000000] address between user and kernel address ranges
> | Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> | Modules linked in:
> | CPU: 2 PID: 149 Comm: sh Not tainted 6.8.0-rc3-00002-gf95e3bd06270-dirty #3
> | Hardware name: linux,dummy-virt (DT)
> | pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> | pc : lkdtm_ACCESS_NULL+0x2c/0xa4
> | lr : lkdtm_ACCESS_NULL+0x24/0xa4
> | sp : ffff800080d87ab0
> | x29: ffff800080d87ab0 x28: ffff0000c5296f14 x27: 1fffe00018a52df5
> | x26: ffff0000c5296fa0 x25: 1fffe00018a52df4 x24: ffff0000c8fcda00
> | x23: 0000aaaadb0a7040 x22: ffff800080d87d70 x21: ffffcb98b050dae0
> | x20: 000000000000000c x19: ffffcb98acd53f40 x18: ffff0003fdff56c0
> | x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> | x14: 0000000000000000 x13: 3030303030303030 x12: ffff7000101b0ee1
> | x11: 1ffff000101b0ee0 x10: ffff7000101b0ee0 x9 : dfff800000000000
> | x8 : 00008fffefe4f120 x7 : ffff800080d87707 x6 : 0000000000000001
> | x5 : ffff800080d87700 x4 : 1fffe000191f9b41 x3 : 0000000000000000
> | x2 : 0000000000000000 x1 : ffff0000c8fcda00 x0 : dfff800000000000
> | Call trace:
> |  lkdtm_ACCESS_NULL+0x2c/0xa4
> |  lkdtm_do_action+0x3c/0x68
> |  direct_entry+0x130/0x228
> |  full_proxy_write+0xd4/0x140
> |  vfs_write+0x1e4/0x8ec
> |  ksys_write+0xf4/0x1e0
> |  __arm64_sys_write+0x6c/0x9c
> |  invoke_syscall+0x6c/0x254
> |  el0_svc_common.constprop.0+0xac/0x230
> |  do_el0_svc+0x40/0x58
> |  el0_svc+0x38/0x70
> |  el0t_64_sync_handler+0x120/0x12c
> |  el0t_64_sync+0x190/0x194
> | Code: aa1303e0 97adbc11 d2d00000 f2fbffe0 (39c00000)
> | ---[ end trace 0000000000000000 ]---

I understand that the proposal is currently too verbose, but printing
provenance information only for the faulting address would not have
helped in above example. What do you think about printing only registers
that hold information useful for debugging, i.e. by excluding null
pointers and non-paged areas?

> I would also note that X30 can be used as a scratch register in the body of
> functions and might hold an interesting address, whereas x2 *should* always
> point to a frame record on the stack. So I don't understand why we'd both
> dumping x29 but skip x30.

Thanks for pointing this out, I must have misunderstood ARM64 register
usage.

Kind regards,

Stefan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ