Message-ID: <4xggxtkk44m3xcfucbougtuhr66s2pc7tilpnju7xnyw45facb@odsrpqp57kiz>
Date: Tue, 20 Feb 2024 15:56:53 +0100
From: Maxime Ripard <mripard@...nel.org>
To: Christian König <christian.koenig@....com>
Cc: Bagas Sanjaya <bagasdotme@...il.com>,
Erhard Furtner <erhard_f@...lbox.org>, Linux DRI Development <dri-devel@...ts.freedesktop.org>,
Huang Rui <ray.huang@....com>, Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
Thomas Zimmermann <tzimmermann@...e.de>, David Airlie <airlied@...il.com>,
Daniel Vetter <daniel@...ll.ch>, Karolina Stolarek <karolina.stolarek@...el.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>, Andrew Morton <akpm@...ux-foundation.org>,
Zi Yan <ziy@...dia.com>, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Memory Management List <linux-mm@...ck.org>
Subject: Re: Running ttm_device_test leads to list_add corruption. prev->next
should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b.
(prev=ffffa0b1a5c034f0) (kernel 6.7.5)
On Tue, Feb 20, 2024 at 02:28:53PM +0100, Christian König wrote:
> Am 20.02.24 um 10:12 schrieb Bagas Sanjaya:
> > On Mon, Feb 19, 2024 at 11:01:16PM +0100, Erhard Furtner wrote:
> > > Greetings!
> > >
> > > 'modprobe -v ttm-device-test' on my Ryzen 5950X amd64 box and on my Talos II (ppc64) leads to immediate list_add corruption.
> > >
> > > The machines stay useable via VNC but the issue seems to cause memory corruption which shows up later on when PAGE_POISONING is enabled:
> > >
> > > [...]
> > > KTAP version 1
> > > 1..1
> > > KTAP version 1
> > > # Subtest: ttm_device
> > > # module: ttm_device_test
> > > 1..5
> > > ok 1 ttm_device_init_basic
> > > # ttm_device_init_multiple: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_device_test.c:68
> > > Expected list_count_nodes(&ttm_devs[0].device_list) == num_dev, but
> > > list_count_nodes(&ttm_devs[0].device_list) == 4 (0x4)
> > > num_dev == 3 (0x3)
> > > not ok 2 ttm_device_init_multiple
> > > list_add corruption. prev->next should be next (ffffffffc05cd428), but was 6b6b6b6b6b6b6b6b. (prev=ffffa0b1a5c034f0).
> > > ------------[ cut here ]------------
> > > kernel BUG at lib/list_debug.c:32!
> > > invalid opcode: 0000 [#1] SMP NOPTI
> > > CPU: 6 PID: 2129 Comm: kunit_try_catch Tainted: G N 6.7.5-Zen3 #1
> > > Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> > > RIP: 0010:__list_add_valid_or_report+0x67/0x9c
> > > Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
> > > RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
> > > RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
> > > R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
> > > FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
> > > Call Trace:
> > > <TASK>
> > > ? __die_body+0x15/0x65
> > > ? die+0x2f/0x48
> > > ? do_trap+0x76/0x109
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? do_error_trap+0x69/0xa6
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? exc_invalid_op+0x4d/0x71
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? asm_exc_invalid_op+0x1a/0x20
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ? __list_add_valid_or_report+0x67/0x9c
> > > ttm_device_init+0x10e/0x157 [ttm]
> > > ttm_device_kunit_init+0x3d/0x51 [ttm_kunit_helpers]
> > > ttm_device_fini_basic+0x6d/0x1b3 [ttm_device_test]
> > > ? timekeeping_get_ns+0x19/0x3b
> > > ? srso_alias_return_thunk+0x5/0xfbef5
> > > ? ktime_get_ts64+0x40/0x92
> > > kunit_try_run_case+0xaf/0x163 [kunit]
> > > ? kunit_try_catch_throw+0x1b/0x1b [kunit]
> > > ? kunit_try_catch_throw+0x1b/0x1b [kunit]
> > > kunit_generic_run_threadfn_adapter+0x15/0x20 [kunit]
> > > kthread+0xcf/0xd7
> > > ? kthread_complete_and_exit+0x1a/0x1a
> > > ret_from_fork+0x23/0x35
> > > ? kthread_complete_and_exit+0x1a/0x1a
> > > ret_from_fork_asm+0x11/0x20
> > > </TASK>
> > > Modules linked in: ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:__list_add_valid_or_report+0x67/0x9c
> > > Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
> > > RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
> > > RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
> > > R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
> > > FS: 0000000000000000(0000) GS:ffffa0b85eb80000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007ff09c005038 CR3: 000000026ce14000 CR4: 0000000000b50ef0
> > > Key type dns_resolver registered
> > > NFS: Registering the id_resolver key type
> > > Key type id_resolver registered
> > > Key type id_legacy registered
> > > # ttm_device_fini_basic: try timed out
> > > general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#2] SMP NOPTI
> > > CPU: 26 PID: 2119 Comm: modprobe Tainted: G D N 6.7.5-Zen3 #1
> > > Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> > > RIP: 0010:kthread_stop+0x3c/0x78
> > > Code: f0 0f c1 43 28 be 02 00 00 00 85 c0 74 0c 8d 50 01 09 c2 79 0a be 01 00 00 00 e8 f5 31 37 00 48 89 df e8 35 f1 ff ff 48 89 c5 <f0> 80 08 02 48 89 df e8 6a ff ff ff f0 80 4b 02 02 48 89 df e8 f6
> > > RSP: 0018:ffffb23b01fff938 EFLAGS: 00010246
> > > RAX: 6b6b6b6b6b6b6b6b RBX: ffffa0b170ab6040 RCX: 0000000000000000
> > > RDX: 000000006b6b6b6f RSI: 0000000000000002 RDI: 0000000000000000
> > > RBP: 6b6b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b170ab6040
> > > R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> > > FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
> > > Call Trace:
> > > <TASK>
> > > ? __die_body+0x15/0x65
> > > ? die_addr+0x37/0x50
> > > ? exc_general_protection+0x1b6/0x1ec
> > > ? asm_exc_general_protection+0x26/0x30
> > > ? kthread_stop+0x3c/0x78
> > > ? kthread_stop+0x39/0x78
> > > kunit_try_catch_run+0xc9/0x155 [kunit]
> > > kunit_run_case_catch_errors+0x3f/0x93 [kunit]
> > > kunit_run_tests+0x182/0x516 [kunit]
> > > ? kunit_try_run_case_cleanup+0x39/0x39 [kunit]
> > > ? kunit_catch_run_case_cleanup+0x85/0x85 [kunit]
> > > __kunit_test_suites_init+0x64/0x83 [kunit]
> > > kunit_module_notify+0xda/0x177 [kunit]
> > > notifier_call_chain+0x5a/0x92
> > > blocking_notifier_call_chain+0x3e/0x60
> > > do_init_module+0xcb/0x218
> > > init_module_from_file+0x7a/0x99
> > > __do_sys_finit_module+0x162/0x223
> > > do_syscall_64+0x6e/0xd8
> > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > RIP: 0033:0x7f9321f7a479
> > > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 87 89 0c 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007ffe2e350908 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > > RAX: ffffffffffffffda RBX: 00005590b57cef40 RCX: 00007f9321f7a479
> > > RDX: 0000000000000000 RSI: 00005590b5100c7c RDI: 0000000000000007
> > > RBP: 0000000000000000 R08: 00007f9322043b20 R09: 0000000000000000
> > > R10: 0000000000000050 R11: 0000000000000246 R12: 0000000000040000
> > > R13: 00005590b5100c7c R14: 00005590b57cefe0 R15: 0000000000000000
> > > </TASK>
> > > Modules linked in: nfsv4 dns_resolver nfs lockd grace ttm_device_test ttm_kunit_helpers drm_kunit_helpers kunit rfkill dm_crypt nhpoly1305_avx2 nhpoly1305 chacha_generic chacha_x86_64 libchacha adiantum libpoly1305 algif_skcipher input_leds joydev hid_generic usbhid hid amdgpu snd_hda_codec_hdmi amd64_edac snd_hda_intel amdxcp mfd_core snd_intel_dspcfg edac_mce_amd gpu_sched snd_hda_codec video snd_hwdep drm_suballoc_helper snd_hda_core i2c_algo_bit drm_ttm_helper snd_pcm wmi_bmof ttm snd_timer evdev drm_exec snd drm_display_helper soundcore kvm_amd k10temp drm_buddy rapl wmi gpio_amdpt gpio_generic button lz4 lz4_compress lz4_decompress zram sg nct6775 nct6775_core hwmon_vid hwmon loop configfs sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 sha1_generic aesni_intel libaes crypto_simd cryptd xhci_pci xhci_hcd ccp usbcore usb_common sunrpc dm_mod pkcs8_key_parser efivarfs
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:__list_add_valid_or_report+0x67/0x9c
> > > Code: c7 c7 26 ff c4 90 48 89 c6 e8 2f 32 ca ff 0f 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 78 ff c4 90 4c 89 c2 e8 13 32 ca ff <0f> 0b 48 39 d7 74 05 4c 39 c7 75 17 48 89 f1 48 89 c2 48 89 fe 48
> > > RSP: 0018:ffffb23b05d27df8 EFLAGS: 00010246
> > > RAX: 0000000000000075 RBX: 0000000000000000 RCX: 0000000000000000
> > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > RBP: ffffa0b1a5c034f0 R08: 0000000000000000 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0b1843b2628
> > > R13: ffffa0b1b7c1f478 R14: ffffffffc0696480 R15: ffffa0b1a5c11000
> > > FS: 00007f9321e6ec40(0000) GS:ffffa0b85f080000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00005592ea51ef40 CR3: 0000000189590000 CR4: 0000000000b50ef0
> > > =============================================================================
> > > BUG task_struct (Tainted: G D N): Poison overwritten
> > > -----------------------------------------------------------------------------
> > >
> > > 0xffffa0b170ab6068-0xffffa0b170ab6068 @offset=24680. First byte 0x6c instead of 0x6b
> > > Slab 0xffffea8944c2ac00 objects=8 used=8 fp=0x0000000000000000 flags=0x4000000000000840(slab|head|zone=1)
> > > Object 0xffffa0b170ab6040 @offset=24640 fp=0x0000000000000000
> > >
> > > Redzone ffffa0b170ab6000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone ffffa0b170ab6010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone ffffa0b170ab6020: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone ffffa0b170ab6030: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Object ffffa0b170ab6040: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object ffffa0b170ab6050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object ffffa0b170ab6060: 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b 6b 6b 6b kkkkkkkklkkkkkkk
> > > Object ffffa0b170ab6070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > [...]
> > > Object ffffa0b170ab6fb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object ffffa0b170ab6fc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
> > > Redzone ffffa0b170ab6fd0: bb bb bb bb bb bb bb bb ........
> > > Padding ffffa0b170ab6fe0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding ffffa0b170ab6ff0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > CPU: 13 PID: 2 Comm: kthreadd Tainted: G D N 6.7.5-Zen3 #1
> > > Hardware name: To Be Filled By O.E.M. B550M Pro4/B550M Pro4, BIOS P3.40 01/18/2024
> > > Call Trace:
> > > <TASK>
> > > dump_stack_lvl+0x37/0x52
> > > check_bytes_and_report+0xa7/0x107
> > > check_object+0x157/0x253
> > > alloc_debug_processing+0x5d/0x111
> > > ___slab_alloc+0x288/0x561
> > > ? copy_process+0x35f/0x2276
> > > ? kthread_is_per_cpu+0x22/0x22
> > > ret_from_fork+0x23/0x35
> > > ? kthread_is_per_cpu+0x22/0x22
> > > ret_from_fork_asm+0x11/0x20
> > > </TASK>
> > > FIX task_struct: Restoring Poison 0xffffa0b170ab6068-0xffffa0b170ab6068=0x6b
> > > FIX task_struct: Marking all objects used
> > >
> > >
> > > The Talos II ppc64 trace looks a bit different:
> > >
> > > [...]
> > > KTAP version 1
> > > 1..1
> > > KTAP version 1
> > > # Subtest: ttm_pool
> > > # module: ttm_pool_test
> > > 1..8
> > > KTAP version 1
> > > # Subtest: ttm_pool_alloc_basic
> > > ok 1 One page
> > > ok 2 More than one page
> > > ok 3 Above the allocation limit
> > > # ttm_pool_alloc_basic: ASSERTION FAILED at drivers/gpu/drm/ttm/tests/ttm_pool_test.c:162
> > > Expected err == 0, but
> > > err == -12 (0xfffffffffffffff4)
> > > not ok 4 One page, with coherent DMA mappings enabled
> > > list_add corruption. prev->next should be next (c00800000cf64fc0), but was 0000000000000000. (prev=c0002000061a4ad0).
> > > ------------[ cut here ]------------
> > > kernel BUG at lib/list_debug.c:32!
> > > Oops: Exception in kernel mode, sig: 5 [#1]
> > > BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
> > > Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
> > > CPU: 29 PID: 934 Comm: kunit_try_catch Tainted: G TN 6.7.5-gentoo-P9 #1
> > > Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> > > NIP: c000000000864744 LR: c000000000864740 CTR: 0000000000000000
> > > REGS: c000200015333a30 TRAP: 0700 Tainted: G TN (6.7.5-gentoo-P9)
> > > MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000222 XER: 00000000
> > > CFAR: c0000000001d5620 IRQMASK: 0
> > > GPR00: 0000000000000000 c000200015333cd0 c0000000011b4700 0000000000000075
> > > GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR12: 0000000000000000 c0002007fa4d5e00 c000000000182548 c0002000066aa1c0
> > > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR24: 0000000000000000 c0002000061a4010 c00800000cf64fc0 c0002000061a4020
> > > GPR28: c0002000061a4ad0 c00800000cf64fa8 c00800000cf64fa0 c0002000061a4010
> > > NIP [c000000000864744] __list_add_valid_or_report+0xd4/0x120
> > > LR [c000000000864740] __list_add_valid_or_report+0xd0/0x120
> > > Call Trace:
> > > [c000200015333cd0] [c000000000864740] __list_add_valid_or_report+0xd0/0x120 (unreliable)
> > > [c000200015333d30] [c00800000cf5eed8] ttm_pool_type_init+0xa0/0x120 [ttm]
> > > [c000200015333d80] [c00800000cf5efec] ttm_pool_init+0x94/0x170 [ttm]
> > > [c000200015333de0] [c00800000cc6b324] ttm_pool_alloc_basic+0x9c/0x670 [ttm_pool_test]
> > > [c000200015333ea0] [c00800000bddf7f0] kunit_try_run_case+0xb8/0x220 [kunit]
> > > [c000200015333f60] [c00800000bde27c8] kunit_generic_run_threadfn_adapter+0x30/0x50 [kunit]
> > > [c000200015333f90] [c000000000182670] kthread+0x130/0x140
> > > [c000200015333fe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> > > Code: f8010070 4b970ea9 60000000 0fe00000 7c0802a6 3c62fff1 7d064378 7d244b78 38639600 f8010070 4b970e85 60000000 <0fe00000> 7c0802a6 3c62fff1 7ca62b78
> > > ---[ end trace 0000000000000000 ]---
> > >
> > > note: kunit_try_catch[934] exited with irqs disabled
> > > # ttm_pool_alloc_basic: try timed out
> > > BUG: Unable to handle kernel data access at 0x6b6b6b6b6b6b6b6b
> > > Faulting instruction address: 0xc000000000181ae4
> > > Oops: Kernel access of bad area, sig: 11 [#2]
> > > BE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=32 NUMA PowerNV
> > > Modules linked in: ttm_pool_test ttm_kunit_helpers drm_kunit_helpers kunit snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore cfg80211 rfkill input_leds evdev hid_generic usbhid hid radeon xts xhci_pci ctr xhci_hcd drm_suballoc_helper i2c_algo_bit drm_ttm_helper cbc ttm aes_generic ofpart usbcore libaes powernv_flash drm_display_helper at24 vmx_crypto gf128mul mtd backlight usb_common regmap_i2c opal_prd ibmpowernv lz4 lz4_compress lz4_decompress zram pkcs8_key_parser powernv_cpufreq loop dm_mod configfs
> > > CPU: 17 PID: 921 Comm: modprobe Tainted: G D TN 6.7.5-gentoo-P9 #1
> > > Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> > > NIP: c000000000181ae4 LR: c00800000bde2a54 CTR: c000000000181a80
> > > REGS: c0002000153871b0 TRAP: 0380 Tainted: G D TN (6.7.5-gentoo-P9)
> > > MSR: 900000000280b032 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI> CR: 44422282 XER: 00000000
> > > CFAR: c00800000bde53ec IRQMASK: 0
> > > GPR00: c00800000bde2a54 c000200015387450 c0000000011b4700 c0000000b1e34d00
> > > GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR08: 0000000000000000 0000000000000000 000000006b6b6b6c c00800000bde53d8
> > > GPR12: c000000000181a80 c0002007fa4dd600 0000000020000000 0000000020000000
> > > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR20: 0000000000000002 0000000020000000 c0000000023d78f8 c0000000023d78a8
> > > GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR28: c0002000153876c0 6b6b6b6b6b6b6b6b c0000000b1e34d00 c0000000b1e34eb8
> > > NIP [c000000000181ae4] kthread_stop+0x64/0x1c0
> > > LR [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
> > > Call Trace:
> > > [c000200015387450] [c0000000001d5934] vprintk+0x84/0xc0 (unreliable)
> > > [c000200015387490] [c00800000bde2a54] kunit_try_catch_run+0x26c/0x2c0 [kunit]
> > > [c000200015387540] [c00800000bde4f14] kunit_run_case_catch_errors+0x60/0xf0 [kunit]
> > > [c0002000153875a0] [c00800000bddf448] kunit_run_tests+0x560/0x680 [kunit]
> > > [c0002000153878d0] [c00800000bddf614] __kunit_test_suites_init+0xac/0x160 [kunit]
> > > [c000200015387970] [c00800000bde349c] kunit_exec_run_tests+0x44/0xb0 [kunit]
> > > [c0002000153879f0] [c00800000bddecbc] kunit_module_notify+0x4d4/0x590 [kunit]
> > > [c000200015387a90] [c0000000001842f0] notifier_call_chain+0xa0/0x190
> > > [c000200015387b30] [c00000000018480c] blocking_notifier_call_chain+0x5c/0xb0
> > > [c000200015387b70] [c00000000020cf64] do_init_module+0x234/0x330
> > > [c000200015387bf0] [c00000000021054c] init_module_from_file+0x9c/0xf0
> > > [c000200015387cc0] [c000000000210740] sys_finit_module+0x190/0x420
> > > [c000200015387d80] [c00000000002b808] system_call_exception+0x1b8/0x3a0
> > > [c000200015387e50] [c00000000000c270] system_call_vectored_common+0xf0/0x280
> > > --- interrupt: 3000 at 0x3fff9eb3d7c8
> > > NIP: 00003fff9eb3d7c8 LR: 0000000000000000 CTR: 0000000000000000
> > > REGS: c000200015387e80 TRAP: 3000 Tainted: G D TN (6.7.5-gentoo-P9)
> > > MSR: 900000000280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 48422244 XER: 00000000
> > > IRQMASK: 0
> > > GPR00: 0000000000000161 00003fffc80d3ab0 00003fff9ec37100 0000000000000007
> > > GPR04: 0000000134f6df90 0000000000000000 000000000000001f 0000000000000045
> > > GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > GPR12: 0000000000000000 00003fff9ef7fbe0 0000000020000000 0000000020000000
> > > GPR16: 0000000000000000 0000000000000000 0000000000000020 0000000020000000
> > > GPR20: 0000000161994850 0000000020000000 0000000000000000 0000000000000000
> > > GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000161993f90
> > > GPR28: 0000000134f6df90 0000000000040000 0000000000000000 0000000161993cc0
> > > NIP [00003fff9eb3d7c8] 0x3fff9eb3d7c8
> > > LR [0000000000000000] 0x0
> > > --- interrupt: 3000
> > > Code: 40c2fff4 2c090000 41820164 39490001 7d494b78 2c090000 418000f4 813e01a8 6d290020 79295fe2 0b090000 ebbe0738 <7d20e8a8> 61290002 7d20e9ad 40c2fff4
> > > ---[ end trace 0000000000000000 ]---
> > >
> > > note: modprobe[921] exited with irqs disabled
> > > =============================================================================
> > > BUG task_struct (Tainted: G D TN): Poison overwritten
> > > -----------------------------------------------------------------------------
> > >
> > > 0xc0000000b1e34ebb-0xc0000000b1e34ebb @offset=20155. First byte 0x6c instead of 0x6b
> > > Slab 0xc00c000002c78c00 objects=5 used=4 fp=0xc0000000b1e33380 flags=0x7ffc0000000840(slab|head|node=0|zone=0|lastcpupid=0x1fff)
> > > Object 0xc0000000b1e34d00 @offset=19712 fp=0xc0000000b1e33380
> > >
> > > Redzone c0000000b1e34c80: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34c90: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34ca0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cb0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cc0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cd0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34ce0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Redzone c0000000b1e34cf0: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
> > > Object c0000000b1e34d00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34d90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34da0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34db0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34dc0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34dd0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34de0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34df0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e00: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e10: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e20: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e30: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e40: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e50: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e60: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e70: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e80: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34e90: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34ea0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Object c0000000b1e34eb0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6c 6b 6b 6b 6b kkkkkkkkkkklkkkk
> > > Object c0000000b1e34ec0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > [...]
> > > Object c0000000b1e35cf0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
> > > Redzone c0000000b1e36580: bb bb bb bb bb bb bb bb ........
> > > Padding c0000000b1e36590: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365a0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365b0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365c0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365d0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365e0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > Padding c0000000b1e365f0: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > CPU: 28 PID: 2 Comm: kthreadd Tainted: G D TN 6.7.5-gentoo-P9 #1
> > > Hardware name: T2P9D01 REV 1.01 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
> > > Call Trace:
> > > [c00000000593b890] [c000000000e8ecf8] dump_stack_lvl+0x6c/0xb0 (unreliable)
> > > [c00000000593b8c0] [c00000000041dad0] print_trailer+0x1e0/0x22c
> > > [c00000000593b940] [c0000000004155f4] check_bytes_and_report+0x224/0x240
> > > [c00000000593b9f0] [c00000000041596c] check_object+0x35c/0x4a0
> > > [c00000000593ba40] [c0000000004168dc] alloc_debug_processing+0xdc/0x270
> > > [c00000000593bac0] [c000000000416c8c] get_partial_node.part.0+0x21c/0x460
> > > [c00000000593bb80] [c000000000417148] ___slab_alloc+0x278/0xb20
> > > [c00000000593bc90] [c000000000417b3c] kmem_cache_alloc_node+0x14c/0x630
> > > [c00000000593bd20] [c000000000140618] copy_process+0x408/0x3270
> > > [c00000000593be00] [c0000000001435f4] kernel_clone+0xc4/0x5b0
> > > [c00000000593be80] [c000000000143dc4] kernel_thread+0x84/0xc0
> > > [c00000000593bf40] [c0000000001829bc] kthreadd+0x1ec/0x290
> > > [c00000000593bfe0] [c00000000000d030] start_kernel_thread+0x14/0x18
> > > FIX task_struct: Restoring Poison 0xc0000000b1e34ebb-0xc0000000b1e34ebb=0x6b
> > > FIX task_struct: Marking all objects used
> > >
> > >
> > > Full dmesg and kernel .config of both machines attached.
> > >
> > > Regards,
> > > Erhard
> > > [ 0.000000] Linux version 6.7.5-Zen3 (root@...ah) (gcc (Gentoo 13.2.1_p20240113-r1 p12) 13.2.1 20240113, GNU ld (Gentoo 2.41 p5) 2.41.0) #1 SMP Mon Feb 19 12:44:46 -00 2024
> > Is this a vanilla kernel (i.e. no patches applied)? Can you also check current
> > mainline (v6.8-rc5)?
> >
> > Confused...
>
> Oh, that is most likely kind of expected behavior.
>
> This kunit test is not meant to be run on real hardware, but rather just as
> a stand-alone kunit test within user mode linux. I was assuming that it
> doesn't even compile on bare metal.
>
> We should probably either double-check the kconfig options to prevent
> compiling it or modify the test so that it can run on real hardware as well.

I think any cross-compiled kunit run will be impossible to differentiate
from running on real hardware. We should just make it work there.

Maxime
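
For readers following the report, here is a minimal userspace sketch of the kind of check that fires in the trace above. It is not the kernel's implementation. The names `list_node`, `list_add_valid`, `list_add_tail_checked` and `simulate_free` are hypothetical and exist only for this illustration. The sketch assumes the failure mode suggested by the log: an object is freed (and filled with the 0x6b poison byte) while it is still linked into a list, and the next insertion then finds that prev->next no longer points at the list head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_node {
	struct list_node *next;
	struct list_node *prev;
};

/* Byte pattern that slab poisoning writes into freed objects. */
#define POISON_FREE 0x6b

static void list_init(struct list_node *head)
{
	assert(head != NULL);
	head->next = head;
	head->prev = head;
}

/* Check both neighbours before linking; return false on inconsistency. */
static bool list_add_valid(const struct list_node *new_node,
			   const struct list_node *prev,
			   const struct list_node *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption: next->prev is not prev\n");
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption: prev->next is not next\n");
		return false;
	}
	if (new_node == prev || new_node == next) {
		fprintf(stderr, "list_add double add\n");
		return false;
	}
	return true;
}

/* Append new_node before head, but only if the list looks consistent. */
static bool list_add_tail_checked(struct list_node *new_node,
				  struct list_node *head)
{
	struct list_node *prev = head->prev;

	if (!list_add_valid(new_node, prev, head))
		return false;

	new_node->next = head;
	new_node->prev = prev;
	prev->next = new_node;
	head->prev = new_node;
	return true;
}

/*
 * Pretend the object was freed without being unlinked: overwrite it with
 * the poison byte, as a debug-enabled allocator would. The memory itself
 * stays valid here, so the sketch has no real use-after-free.
 */
static void simulate_free(struct list_node *node)
{
	memset(node, POISON_FREE, sizeof(*node));
}
```

Adding a node `a` to an initialised head, calling `simulate_free(&a)`, and then adding a second node `b` makes `list_add_tail_checked()` return false, because `a.next` now holds the poison pattern instead of the head's address. In the report the kernel reaches the equivalent condition and stops with the BUG at lib/list_debug.c:32 rather than returning an error.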