Message-ID: <CAMvDr+R7mrp=Ypk-eQwfKKJ79s79P9gxLgR3REfu-tkqeQVGhw@mail.gmail.com>
Date:   Wed, 2 May 2018 18:37:18 -0400
From:   Yichao Yu <yyc1992@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: Internal kernel error on ZC702 Evaluation Kit with Xilinx kernel

Hi everyone,

I've recently been getting internal errors from the kernel on one of the
few boards that I'm maintaining. The error happens every few days and
only seems to happen on one board, not on another board that has the
same kernel but a slightly different userspace/usage pattern/network
connection/external environment. The location where this happens and the
error reported seem to be pretty consistent (see below). I am not
100% sure whether this is a kernel bug, a Xilinx bug, a hardware defect,
etc., but I'm asking here first hoping to get some advice on how I could
debug this in our setup.

Description of our setup:

The system is a ZC702 eval board.
The kernel is https://github.com/Xilinx/linux-xlnx/commit/9c2e29b2c81dbb1efb7ee4944b18e12226b97513
Config file is attached. Cross compiled from an x64 host with gcc 7.2.
Binary files are also available if anyone wants to have a look.
The FPGA customization defines a memory-mapped device, which is accessed
directly from userspace through `/dev/kmem`; judging from the backtrace,
that doesn't seem to be part of the problem.
On the userspace side, it's running Arch Linux ARM, with mostly a zmq
server processing a few requests per second.

What I could see myself:
I mostly know only userspace stuff, including some ARM
assembly/registers. From what I can tell from the register dump, it
seems that someone is trying to call `0xfffffffe` from `0xc04ea488`
(-4), which is `pcf8563_read_block_data+0x80/0x8c`? Assuming I
disassembled the same binary I'm running, the disassembly of the
function is listed below
(obtained with `arm-linux-gnueabihf-objdump -S --start-address=0x.... vmlinux`).
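
As a worked example of the address arithmetic: the oops prints LR as a
symbol plus `offset/size`, so the start of the function is the raw `lr`
value minus the offset. The sketch below is a hypothetical helper (the
name `objdump_range` is made up for illustration) and it assumes the
`vmlinux` being disassembled matches the running kernel. `--stop-address`
is a standard objdump option, added here only to bound the output.

```python
import re

def objdump_range(symbolized, addr):
    """Turn 'sym+0xOFF/0xSIZE' plus the raw address into (name, start, stop)."""
    m = re.fullmatch(r"([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", symbolized)
    if m is None:
        raise ValueError(f"unrecognized symbol format: {symbolized!r}")
    name = m.group(1)
    offset = int(m.group(2), 16)   # distance of addr from the function start
    size = int(m.group(3), 16)     # total size of the function in bytes
    start = addr - offset
    return name, start, start + size

# Values taken from the "LR is at ..." and "lr : [<...>]" lines of the log.
name, start, stop = objdump_range("pcf8563_read_block_data+0x80/0x8c", 0xc04ea488)
print(f"arm-linux-gnueabihf-objdump -S "
      f"--start-address={start:#x} --stop-address={stop:#x} vmlinux")
```

With the values from the log this gives a start of 0xc04ea408, which
agrees with the first address in the disassembly below.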

So the instruction this happened on doesn't look like it would jump to
that random address. I'm guessing it could also be something in
`dev_err` (disassembly also included), although the range where `lr` is
still holding its original value doesn't seem suspicious either...

Any comments on how to debug this further, on whether this is a known
problem, or on whether this is a software or possibly a hardware problem
are welcome.

Thanks.

Yichao Yu

serial port log:

cdns-i2c e0004000.i2c: timeout waiting on completion
rtc-pcf8563 5-0051: pcf8563_read_block_data: read error
Unable to handle kernel paging request at virtual address fffffffe
pgd = c0004000
[fffffffe] *pgd=2fffd861, *pte=00000000, *ppte=00000000
Internal error: Oops - BUG: 80000007 [#1] PREEMPT SMP ARM
Modules linked in: knacs(O) ipv6
CPU: 0 PID: 3520 Comm: kworker/0:2 Tainted: G           O    4.9.0-3-nacs #1
Hardware name: Xilinx Zynq Platform
Workqueue: events rtc_timer_do_work
task: ef381840 task.stack: dfcb8000
PC is at 0xfffffffe
LR is at pcf8563_read_block_data+0x80/0x8c
pc : [<fffffffe>]    lr : [<c04ea488>]    psr: a0080033
sp : dfcb9e38  ip : 00000007  fp : ee9cc640
r10: ee9cc6f4  r9 : 152aad1d  r8 : 4284f200
r7 : ef296800  r6 : 00000000  r5 : ffffffff  r4 : ffffffff
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : fffffffb
Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment none
Control: 18c5387d  Table: 1fcbc04a  DAC: 00000051
Process kworker/0:2 (pid: 3520, stack limit = 0xdfcb8210)
Stack: (0xdfcb9e38 to 0xdfcba000)
9e20:                                                       ffffffff ffffffff
9e40: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
9e60: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff c0132cff
9e80: 0083427e ef6cf400 c0a0dc68 c0a0dc68 4284f200 152aad1d 1f99ec37 0000002d
9ea0: 0000002a 00000000 00000002 00000004 00000076 00000003 00000000 00000000
9ec0: ef296800 c0132cd0 ef00ea00 ef6d8b00 00000000 ef296800 00000000 c017c2d0
9ee0: 5ae90984 00000000 1f99ec37 ef6d2580 ef6d8700 00000000 ef296800 00000000
9f00: ee9cc6f4 00000001 ef296800 c0133cdc c0a02d00 00000000 00000000 ef6d2580
9f20: ef296818 ef6d2598 c0a02d00 00000000 00000000 c0134f4c ef39d300 ef296800
9f40: c0134f18 00000000 ef39d300 ef296800 c0134f18 00000000 00000000 00000000
9f60: 00000000 c01398b8 00000000 00000000 dfcb9f60 ef296800 00000000 00000000
9f80: dfcb9f80 dfcb9f80 00000000 00000000 dfcb9f90 dfcb9f90 dfcb9fac ef39d300
9fa0: c01397cc 00000000 00000000 c0108738 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c04ea488>] (pcf8563_read_block_data) from [<ffffffff>] (0xffffffff)
Code: bad PC value
---[ end trace 5fc36c943949af69 ]---
Unable to handle kernel paging request at virtual address ffffffec
pgd = c0004000
[ffffffec] *pgd=2fffd861, *pte=00000000, *ppte=00000000
Internal error: Oops - BUG: 37 [#2] PREEMPT SMP ARM
Modules linked in: knacs(O) ipv6
CPU: 0 PID: 3520 Comm: kworker/0:2 Tainted: G      D    O    4.9.0-3-nacs #1
Hardware name: Xilinx Zynq Platform
task: ef381840 task.stack: dfcb8000
PC is at kthread_data+0x4/0xc
LR is at wq_worker_sleeping+0x8/0x9c
pc : [<c013a058>]    lr : [<c013533c>]    psr: 20080193
sp : dfcb9c10  ip : ef6d2f20  fp : dfcb9c54
r10: c0142c90  r9 : 00000000  r8 : ef381ba8
r7 : c0a03a60  r6 : c0944a00  r5 : ef381840  r4 : ef6d2a00
r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : ef381840
Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: 18c5387d  Table: 1fcbc04a  DAC: 00000051
Process kworker/0:2 (pid: 3520, stack limit = 0xdfcb8210)
Stack: (0xdfcb9c10 to 0xdfcba000)
9c00:                                     ef6d2a00 c064adf4 00000000 c0120ae8
9c20: 00000011 ee82d540 c0a02040 c093f280 ef09bea8 dfcb9964 ef051740 ef381840
9c40: ef381b20 dfcb9c60 c07c2b00 00000000 dfcb9c5c c0142c90 00000020 c012240c
9c60: dfcb9c60 dfcb9c60 c07c2b08 0000000b 00000020 c010c3d8 dfcb8210 0000000b
9c80: 00000000 60080113 bf000000 00000004 62000000 50206461 61762043 0065756c
9ca0: 00000000 fffffffe dfcb8000 ee9cc6f4 ee9cc640 c0162078 c07c4700 dfcb9ce4
9cc0: c07fa1e4 c01b7a18 000001ff fffffffe dfcb9de8 80000007 00000000 fffffffe
9ce0: dfcb8000 ee9cc6f4 ee9cc640 c011a8ec 80000007 c0115ff0 00000000 dfcb9d0c
9d00: 53425553 45545359 32693d4d 45440063 45434956 32692b3d 2d353a63 31353030
9d20: f0958000 c0a087b4 00000007 c0115c7c fffffffe dfcb9de8 dfcb8000 ee9cc6f4
9d40: ee9cc640 c010136c 00000001 ef151018 00000002 c0a02d00 dfcb9df8 008240b1
9d60: c0a67220 00000001 ee9cc640 c04eca94 00000010 ee953b10 ef32ec00 00000000
9d80: ef151018 00000000 ee9bf200 00000000 ef296800 4284f200 152aad1d ee9cc6f4
9da0: ee9cc640 c0425ae4 008240b1 dfcb9dbc 00000001 c0425b3c c07fa1e4 c081086c
9dc0: ee923400 dfcb9dd8 00000001 c0425cc4 fffffffe a0080033 ffffffff dfcb9e1c
9de0: 4284f200 c010ce24 fffffffb 00000000 00000000 00000000 ffffffff ffffffff
9e00: 00000000 ef296800 4284f200 152aad1d ee9cc6f4 ee9cc640 00000007 dfcb9e38
9e20: c04ea488 fffffffe a0080033 ffffffff 00000051 00000000 ffffffff ffffffff
9e40: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
9e60: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff c0132cff
9e80: 0083427e ef6cf400 c0a0dc68 c0a0dc68 4284f200 152aad1d 1f99ec37 0000002d
9ea0: 0000002a 00000000 00000002 00000004 00000076 00000003 00000000 00000000
9ec0: ef296800 c0132cd0 ef00ea00 ef6d8b00 00000000 ef296800 00000000 c017c2d0
9ee0: 5ae90984 00000000 1f99ec37 ef6d2580 ef6d8700 00000000 ef296800 00000000
9f00: ee9cc6f4 00000001 ef296800 c0133cdc c0a02d00 00000000 00000000 ef6d2580
9f20: ef296818 ef6d2598 c0a02d00 00000000 00000000 c0134f4c ef39d300 ef296800
9f40: c0134f18 00000000 ef39d300 ef296800 c0134f18 00000000 00000000 00000000
9f60: 00000000 c01398b8 00000000 00000000 dfcb9f60 ef296800 00000000 00000000
9f80: dfcb9f80 dfcb9f80 00000001 00010001 dfcb9f90 dfcb9f90 dfcb9fac ef39d300
9fa0: c01397cc 00000000 00000000 c0108738 00000000 00000000 00000000 00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[<c013a058>] (kthread_data) from [<c013533c>] (wq_worker_sleeping+0x8/0x9c)
[<c013533c>] (wq_worker_sleeping) from [<c064adf4>] (__schedule+0x25c/0x47c)
[<c064adf4>] (__schedule) from [<c0142c90>] (do_task_dead+0x8c/0x90)
[<c0142c90>] (do_task_dead) from [<c012240c>] (do_exit+0x650/0x9d0)
[<c012240c>] (do_exit) from [<c010c3d8>] (die+0x22c/0x448)
[<c010c3d8>] (die) from [<c011a8ec>] (__do_kernel_fault.part.0+0x64/0x74)
[<c011a8ec>] (__do_kernel_fault.part.0) from [<c0115ff0>]
(do_page_fault+0x374/0x388)
[<c0115ff0>] (do_page_fault) from [<c010136c>] (do_PrefetchAbort+0x38/0x9c)
[<c010136c>] (do_PrefetchAbort) from [<c010ce24>] (__pabt_svc+0x64/0xa0)
Exception stack(0xdfcb9de8 to 0xdfcb9e30)
9de0:                   fffffffb 00000000 00000000 00000000 ffffffff ffffffff
9e00: 00000000 ef296800 4284f200 152aad1d ee9cc6f4 ee9cc640 00000007 dfcb9e38
9e20: c04ea488 fffffffe a0080033 ffffffff
[<c010ce24>] (__pabt_svc) from [<fffffffe>] (0xfffffffe)
Code: e1a00004 e8bd4070 ea02fddc e5903338 (e5130014)
---[ end trace 5fc36c943949af6a ]---
Fixing recursive fault but reboot is needed!
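
For reference, the `psr` values in the two oopses above can be decoded by
hand to cross-check the "Flags:" lines. The sketch below assumes the
standard ARMv7-A CPSR layout (N/Z/C/V in bits 31..28, I in bit 7, F in
bit 6, T in bit 5, mode in bits 4..0) and ignores the J bit; the function
name `decode_psr` is made up for illustration.

```python
# Processor modes encoded in CPSR bits 4..0 (ARMv7-A).
MODES = {0x10: "USR", 0x11: "FIQ", 0x12: "IRQ", 0x13: "SVC",
         0x17: "ABT", 0x1b: "UND", 0x1f: "SYS"}

def decode_psr(psr):
    """Decode the fields of a 32-bit CPSR value that the oops prints."""
    # An upper-case letter means the condition flag is set, as in the oops.
    flags = "".join(ch.upper() if psr & (1 << bit) else ch
                    for ch, bit in zip("nzcv", (31, 30, 29, 28)))
    return {
        "flags": flags,
        "irqs": "off" if psr & (1 << 7) else "on",   # I bit masks IRQs
        "fiqs": "off" if psr & (1 << 6) else "on",   # F bit masks FIQs
        "isa": "Thumb" if psr & (1 << 5) else "ARM", # T bit (J bit ignored)
        "mode": MODES.get(psr & 0x1f, "unknown"),
    }

first = decode_psr(0xa0080033)   # psr from oops #1
second = decode_psr(0x20080193)  # psr from oops #2
print(first)
print(second)
```

The first value has the T bit set, which matches the "ISA Thumb" in the
first oops even though the kernel code disassembled below is ARM code.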

Disassembly of pcf8563_read_block_data:

c04ea408 <pcf8563_read_block_data>:
c04ea408:       e92d4030        push    {r4, r5, lr}
c04ea40c:       e1a05000        mov     r5, r0
c04ea410:       e1d000b2        ldrh    r0, [r0, #2]
c04ea414:       e24dd024        sub     sp, sp, #36     ; 0x24
c04ea418:       e3a04000        mov     r4, #0
c04ea41c:       e28dc007        add     ip, sp, #7
c04ea420:       e5cd1007        strb    r1, [sp, #7]
c04ea424:       e28d1008        add     r1, sp, #8
c04ea428:       e1cd21b8        strh    r2, [sp, #24]
c04ea42c:       e3a02002        mov     r2, #2
c04ea430:       e1cd00b8        strh    r0, [sp, #8]
c04ea434:       e1cd01b4        strh    r0, [sp, #20]
c04ea438:       e5950018        ldr     r0, [r5, #24]
c04ea43c:       e58d301c        str     r3, [sp, #28]
c04ea440:       e3a03001        mov     r3, #1
c04ea444:       e58d400e        str     r4, [sp, #14]
c04ea448:       e58d400a        str     r4, [sp, #10]
c04ea44c:       e1cd41ba        strh    r4, [sp, #26]
c04ea450:       e1cd30bc        strh    r3, [sp, #12]
c04ea454:       e1cd31b6        strh    r3, [sp, #22]
c04ea458:       e58dc010        str     ip, [sp, #16]
c04ea45c:       eb0009db        bl      c04ecbd0 <i2c_transfer>
c04ea460:       e3500002        cmp     r0, #2
c04ea464:       01a00004        moveq   r0, r4
c04ea468:       1a000001        bne     c04ea474 <pcf8563_read_block_data+0x6c>
c04ea46c:       e28dd024        add     sp, sp, #36     ; 0x24
c04ea470:       e8bd8030        pop     {r4, r5, pc}
c04ea474:       e2850020        add     r0, r5, #32
c04ea478:       e3001658        movw    r1, #1624       ; 0x658
c04ea47c:       e59f200c        ldr     r2, [pc, #12]   ; c04ea490 <pcf8563_read_block_data+0x88>
c04ea480:       e34c1081        movt    r1, #49281      ; 0xc081
c04ea484:       ebfcee00        bl      c0425c8c <dev_err>
c04ea488:       e3e00004        mvn     r0, #4
c04ea48c:       eafffff6        b       c04ea46c <pcf8563_read_block_data+0x64>
c04ea490:       c0749088        .word   0xc0749088
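
As a cross-check of the prologue and epilogue above, the register lists
of the `push` and `pop` can be read straight from the low 16 bits of the
instruction words, and the stack slots the `pop` reads can be computed
from the `sp` in the register dump. This is only a sketch: it assumes
that the `sp` printed in the first oops (dfcb9e38) is the value after the
epilogue has run, and the helper names are made up for illustration.

```python
REG_NAMES = [f"r{i}" for i in range(13)] + ["sp", "lr", "pc"]

def reglist(insn):
    """Registers named in the 16-bit register list of an ARM LDM/STM word."""
    return [REG_NAMES[i] for i in range(16) if insn & (1 << i)]

def popped_slots(sp_after_pop, regs):
    """Address each register was loaded from; lowest register, lowest address."""
    n = len(regs)
    return {reg: sp_after_pop - 4 * (n - i) for i, reg in enumerate(regs)}

pushed = reglist(0xe92d4030)   # word at c04ea408
popped = reglist(0xe8bd8030)   # word at c04ea470
slots = popped_slots(0xdfcb9e38, popped)

print("push", pushed)
print("pop ", popped)
for reg, addr in slots.items():
    print(f"{reg} loaded from {addr:#x}")
```

So the function returns by loading `pc` from the stack slot where `lr`
was saved on entry. The register dump shows r4 = r5 = ffffffff, which at
least does not contradict those two registers having been popped from a
stack region filled with ffffffff like the one shown in the dump.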

Disassembly of dev_err:

c0425c8c <dev_err>:
c0425c8c:       e92d000e        push    {r1, r2, r3}
c0425c90:       e1a01000        mov     r1, r0
c0425c94:       e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
c0425c98:       e24dd010        sub     sp, sp, #16
c0425c9c:       e28d2008        add     r2, sp, #8
c0425ca0:       e3020df4        movw    r0, #11764      ; 0x2df4
c0425ca4:       e59d3014        ldr     r3, [sp, #20]
c0425ca8:       e34c007d        movt    r0, #49277      ; 0xc07d
c0425cac:       e28dc018        add     ip, sp, #24
c0425cb0:       e58dc004        str     ip, [sp, #4]
c0425cb4:       e58d3008        str     r3, [sp, #8]
c0425cb8:       e28d3004        add     r3, sp, #4
c0425cbc:       e58d300c        str     r3, [sp, #12]
c0425cc0:       ebffff8b        bl      c0425af4 <__dev_printk>
c0425cc4:       e28dd010        add     sp, sp, #16
c0425cc8:       e49de004        pop     {lr}            ; (ldr lr, [sp], #4)
c0425ccc:       e28dd00c        add     sp, sp, #12
c0425cd0:       e12fff1e        bx      lr

Download attachment "config" of type "application/octet-stream" (104728 bytes)