Date:   Fri, 7 Jul 2023 01:01:44 +0000
From:   "Chen, Guchun" <Guchun.Chen@....com>
To:     Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>,
        amd-gfx list <amd-gfx@...ts.freedesktop.org>,
        "Koenig, Christian" <Christian.Koenig@....com>,
        "Deucher, Alexander" <Alexander.Deucher@....com>,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: RE: [regression][6.5] KASAN: slab-out-of-bounds in
 amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX

[Public]

Hi Mike,

Yes, we are aware of this problem and are working on it. It is caused by recent code that stores xcp_id in the amdgpu BO for memory usage accounting and similar purposes. However, not all VMs are attached to that, as in the case of amdgpu_mes_self_test.
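
For readers following the thread, the failure mode can be illustrated with a small user-space sketch. It assumes, and nothing in this thread confirms it, that the xcp_id lookup derives an enclosing structure from the VM pointer with a container_of-style computation. All names below (demo_vm, demo_fpriv, demo_fpriv_of and so on) are hypothetical stand-ins, not the real amdgpu definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu structures. */
struct demo_vm {
    char state[64];
};

struct demo_fpriv {
    struct demo_vm vm;  /* VM embedded in per-file private data */
    uint32_t xcp_id;    /* stored after the embedded VM */
};

/* Same idea as the kernel's container_of(), written out for user space. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing structure. Only valid if vm really is embedded. */
struct demo_fpriv *demo_fpriv_of(struct demo_vm *vm)
{
    return demo_container_of(vm, struct demo_fpriv, vm);
}

/* Offset of xcp_id relative to the start of the VM. */
size_t demo_xcp_id_offset_from_vm(void)
{
    return offsetof(struct demo_fpriv, xcp_id) -
           offsetof(struct demo_fpriv, vm);
}

/*
 * Returns 1 if a 4-byte read of xcp_id, reached through the VM pointer,
 * stays inside an allocation of alloc_size bytes that starts at the VM.
 */
int demo_xcp_id_read_in_bounds(size_t alloc_size)
{
    return demo_xcp_id_offset_from_vm() + sizeof(uint32_t) <= alloc_size;
}
```

With these stand-ins, demo_xcp_id_read_in_bounds(sizeof(struct demo_fpriv)) returns 1, while demo_xcp_id_read_in_bounds(sizeof(struct demo_vm)) returns 0. The second case models a VM that is allocated on its own: the computed field address lies past the end of the allocation, which is the kind of access KASAN reports as slab-out-of-bounds.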

Regards,
Guchun

> -----Original Message-----
> From: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
> Sent: Friday, July 7, 2023 5:34 AM
> To: amd-gfx list <amd-gfx@...ts.freedesktop.org>; Koenig, Christian
> <Christian.Koenig@....com>; Deucher, Alexander
> <Alexander.Deucher@....com>; Chen, Guchun <Guchun.Chen@....com>;
> Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
> Subject: [regression][6.5] KASAN: slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670 [amdgpu] on Radeon 7900XTX
>
> Hi,
> On the Radeon 7900XTX, the issue "slab-out-of-bounds in
> amdgpu_vm_pt_create+0x555/0x670" appeared between commits 3a8a670eeeaa
> and e55e5df193d2.
> Graphics cards with the 6800M and 6900XT chips are unaffected.
>
> [   12.562762] ==================================================================
> [   12.562775] BUG: KASAN: slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563173] Read of size 4 at addr ffff8881347a8dc8 by task (udev-worker)/660
>
> [   12.563183] CPU: 0 PID: 660 Comm: (udev-worker) Tainted: G        W    L    -------  --- 6.5.0-0.rc0.20230630gite55e5df193d2.5.fc39.x86_64+debug #1
> [   12.563192] Hardware name: Micro-Star International Co., Ltd. MS-7D73/MPG B650I EDGE WIFI (MS-7D73), BIOS 1.30 05/24/2023
> [   12.563199] Call Trace:
> [   12.563203]  <TASK>
> [   12.563206]  dump_stack_lvl+0x76/0xd0
> [   12.563213]  print_report+0xcf/0x670
> [   12.563220]  ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563433]  kasan_report+0xa6/0xe0
> [   12.563436]  ? amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563637]  amdgpu_vm_pt_create+0x555/0x670 [amdgpu]
> [   12.563835]  ? __pfx_amdgpu_vm_pt_create+0x10/0x10 [amdgpu]
> [   12.564030]  ? __module_address+0x95/0x240
> [   12.564035]  ? lockdep_init_map_type+0x1a5/0x840
> [   12.564040]  ? __raw_spin_lock_init+0x3f/0x110
> [   12.564044]  amdgpu_vm_init+0x749/0x10c0 [amdgpu]
> [   12.564240]  ? __pfx_amdgpu_vm_init+0x10/0x10 [amdgpu]
> [   12.564441]  amdgpu_mes_self_test+0x16e/0x9e0 [amdgpu]
> [   12.564661]  ? lock_acquire+0x1a6/0x4f0
> [   12.564664]  ? __pfx_amdgpu_mes_self_test+0x10/0x10 [amdgpu]
> [   12.564871]  ? local_clock_noinstr+0xd/0xc0
> [   12.564876]  ? find_held_lock+0x34/0x120
> [   12.564882]  ? _raw_spin_unlock_irqrestore+0x4f/0x80
> [   12.564886]  ? amdgpu_irq_update+0x1b2/0x2c0 [amdgpu]
> [   12.565094]  mes_v11_0_late_init+0xb8/0xe0 [amdgpu]
> [   12.565304]  amdgpu_device_ip_late_init+0x100/0x7b0 [amdgpu]
> [   12.565509]  amdgpu_device_init+0x7569/0x8660 [amdgpu]
> [   12.565721]  ? __pfx_amdgpu_device_init+0x10/0x10 [amdgpu]
> [   12.565920]  ? __pfx_pci_bus_read_config_word+0x10/0x10
> [   12.565925]  ? do_pci_enable_device+0x22d/0x2a0
> [   12.565928]  ? pci_wait_for_pending+0xa1/0x110
> [   12.565933]  amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
> [   12.566131]  amdgpu_pci_probe+0x287/0x9e0 [amdgpu]
> [   12.566337]  ? __pfx_amdgpu_pci_probe+0x10/0x10 [amdgpu]
> [   12.566536]  local_pci_probe+0xda/0x190
> [   12.566540]  pci_device_probe+0x23a/0x770
> [   12.566544]  ? kernfs_add_one+0x326/0x490
> [   12.566548]  ? kernfs_get.part.0+0x4c/0x70
> [   12.566552]  ? __pfx_pci_device_probe+0x10/0x10
> [   12.566555]  ? kernfs_create_link+0x16b/0x230
> [   12.566559]  ? kernfs_put+0x1c/0x40
> [   12.566562]  ? sysfs_do_create_link_sd+0x8e/0x100
> [   12.566566]  really_probe+0x3df/0xb80
> [   12.566570]  __driver_probe_device+0x18c/0x450
> [   12.566573]  driver_probe_device+0x4a/0x120
> [   12.566576]  __driver_attach+0x1e5/0x4a0
> [   12.566579]  ? __pfx___driver_attach+0x10/0x10
> [   12.566582]  bus_for_each_dev+0x106/0x190
> [   12.566586]  ? __pfx_bus_for_each_dev+0x10/0x10
> [   12.566591]  bus_add_driver+0x2a1/0x570
> [   12.566594]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [   12.566794]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [   12.566993]  driver_register+0x134/0x460
> [   12.566996]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [   12.567193]  do_one_initcall+0xd2/0x430
> [   12.567197]  ? __pfx_do_one_initcall+0x10/0x10
> [   12.567202]  ? kasan_unpoison+0x44/0x70
> [   12.567206]  do_init_module+0x238/0x770
> [   12.567210]  load_module+0x5581/0x6f10
> [   12.567216]  ? __pfx_load_module+0x10/0x10
> [   12.567220]  ? find_held_lock+0x34/0x120
> [   12.567223]  ? local_clock_noinstr+0xd/0xc0
> [   12.567227]  ? __pfx___might_resched+0x10/0x10
> [   12.567232]  ? __do_sys_init_module+0x1f2/0x220
> [   12.567235]  __do_sys_init_module+0x1f2/0x220
> [   12.567238]  ? __pfx___do_sys_init_module+0x10/0x10
> [   12.567243]  do_syscall_64+0x5d/0x90
> [   12.567247]  ? asm_exc_page_fault+0x26/0x30
> [   12.567251]  ? lockdep_hardirqs_on+0x81/0x110
> [   12.567255]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [   12.567258] RIP: 0033:0x7fdb4e92b5de
> [   12.567267] Code: 48 8b 0d 55 08 12 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 22 08 12 00 f7 d8 64 89 01 48
> [   12.567274] RSP: 002b:00007ffe9ef35008 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> [   12.567279] RAX: ffffffffffffffda RBX: 000055d8c8acb440 RCX: 00007fdb4e92b5de
> [   12.567282] RDX: 000055d8c8af3840 RSI: 0000000003c829ee RDI: 00007fdb46c16010
> [   12.567285] RBP: 00007ffe9ef350c0 R08: 000055d8c8ad5bd0 R09: ffffffdcab967160
> [   12.567289] R10: 000055dd95219e95 R11: 0000000000000246 R12: 000055d8c8af3840
> [   12.567292] R13: 0000000000020000 R14: 000055d8c8af0d30 R15: 000055d8c8af2740
> [   12.567297]  </TASK>
>
> [   12.567300] Allocated by task 660:
> [   12.567302]  kasan_save_stack+0x33/0x60
> [   12.567306]  kasan_set_track+0x25/0x30
> [   12.567309]  __kasan_kmalloc+0x8f/0xa0
> [   12.567312]  amdgpu_mes_self_test+0x157/0x9e0 [amdgpu]
> [   12.567529]  mes_v11_0_late_init+0xb8/0xe0 [amdgpu]
> [   12.567738]  amdgpu_device_ip_late_init+0x100/0x7b0 [amdgpu]
> [   12.567942]  amdgpu_device_init+0x7569/0x8660 [amdgpu]
> [   12.568142]  amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
> [   12.568343]  amdgpu_pci_probe+0x287/0x9e0 [amdgpu]
> [   12.568543]  local_pci_probe+0xda/0x190
> [   12.568546]  pci_device_probe+0x23a/0x770
> [   12.568550]  really_probe+0x3df/0xb80
> [   12.568552]  __driver_probe_device+0x18c/0x450
> [   12.568555]  driver_probe_device+0x4a/0x120
> [   12.568557]  __driver_attach+0x1e5/0x4a0
> [   12.568560]  bus_for_each_dev+0x106/0x190
> [   12.568563]  bus_add_driver+0x2a1/0x570
> [   12.568566]  driver_register+0x134/0x460
> [   12.568569]  do_one_initcall+0xd2/0x430
> [   12.568572]  do_init_module+0x238/0x770
> [   12.568574]  load_module+0x5581/0x6f10
> [   12.568577]  __do_sys_init_module+0x1f2/0x220
> [   12.568580]  do_syscall_64+0x5d/0x90
> [   12.568582]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
>
> [   12.568587] The buggy address belongs to the object at ffff8881347a8000
>                 which belongs to the cache kmalloc-4k of size 4096
> [   12.568593] The buggy address is located 608 bytes to the right of
>                 allocated 2920-byte region [ffff8881347a8000, ffff8881347a8b68)
>
> [   12.568600] The buggy address belongs to the physical page:
> [   12.568602] page:000000001bdef670 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1347a8
> [   12.568607] head:000000001bdef670 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> [   12.568611] flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> [   12.568616] page_type: 0xffffffff()
> [   12.568619] raw: 0017ffffc0010200 ffff88810004d040 dead000000000122 0000000000000000
> [   12.568622] raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
> [   12.568626] page dumped because: kasan: bad access detected
>
> [   12.568630] Memory state around the buggy address:
> [   12.568632]  ffff8881347a8c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   12.568635]  ffff8881347a8d00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   12.568639] >ffff8881347a8d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   12.568642]                                               ^
> [   12.568644]  ffff8881347a8e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   12.568648]  ffff8881347a8e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   12.568651] ==================================================================
>
> I spent six days bisecting this issue, but the result was unsatisfactory:
> on most commits the video card did not switch to graphics mode, and
> instead of "slab-out-of-bounds in amdgpu_vm_pt_create+0x555/0x670" I got
> the error
> "KASAN: null-ptr-deref in range [0x00000000000003f0-0x00000000000003f7]".
> Because of this, all of those commits were marked as "skip".
>
> The bisect results are in the attached file
> "bisect-log-slab-out-of-bounds-in-amdgpu _vm_pt_create.txt". The kernel
> logs for each bisect step are packed in the archive
> "dmesg-slab-out-of-bounds-in-amdgpu_vm_pt_create.zip".
>
> How else can I help here?
>
> --
> Best Regards,
> Mike Gavrilov.
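
The offsets in the KASAN report quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates the three addresses printed in the log; the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Addresses copied from the KASAN report in this thread. */
#define OBJ_START  UINT64_C(0xffff8881347a8000) /* start of the slab object */
#define REGION_END UINT64_C(0xffff8881347a8b68) /* end of allocated region  */
#define BAD_ADDR   UINT64_C(0xffff8881347a8dc8) /* address of faulting read */

/* Size of the allocated region: 0xb68 = 2920 bytes. */
uint64_t region_size(void)
{
    return REGION_END - OBJ_START;
}

/* Distance of the faulting read past the region's end: 0x260 = 608 bytes. */
uint64_t bytes_past_end(void)
{
    return BAD_ADDR - REGION_END;
}

/* Offset of the faulting read from the object start: 0xdc8 = 3528 bytes. */
uint64_t offset_in_object(void)
{
    return BAD_ADDR - OBJ_START;
}
```

The values agree with the report: a 2920-byte region, a read 608 bytes to its right, at offset 3528 from the object start. That offset is still below 4096, so the read stays inside the kmalloc-4k object but lands in its redzone, which matches the fc shadow bytes in the memory state dump.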
