lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2b7fe203-82f5-2726-cd64-01c7421560d3@amd.com>
Date:   Wed, 15 Mar 2023 20:51:49 +0100
From:   Christian König <christian.koenig@....com>
To:     Matthew Auld <matthew.william.auld@...il.com>,
        Christian König <ckoenig.leichtzumerken@...il.com>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@....com>,
        intel-gfx@...ts.freedesktop.org,
        LKML <linux-kernel@...r.kernel.org>,
        dri-devel@...ts.freedesktop.org, amd-gfx@...ts.freedesktop.org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Felix Kuehling <Felix.Kuehling@....com>,
        linux-media@...r.kernel.org
Subject: Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()



Am 15.03.23 um 20:15 schrieb Matthew Auld:
> On Wed, 15 Mar 2023 at 18:41, Christian König
> <ckoenig.leichtzumerken@...il.com> wrote:
>> Am 08.03.23 um 13:43 schrieb Steven Rostedt:
>>> On Wed, 8 Mar 2023 07:17:38 +0100
>>> Christian König <christian.koenig@....com> wrote:
>>>
>>>> What test case/environment do you run to trigger this?
>>> I'm running a 32bit x86 qemu instance. Attached is the config.
>>>
>>> The libvirt xml file is here: https://rostedt.org/vm-images/tracetest-32.xml
>>> and the VM image itself is here: https://rostedt.org/vm-images/tracetest-32.qcow2.bz2
>> I've started to download that, but it will take about an hour. So I
>> tried to avoid that for now.
>>
>> But looks like there isn't any other way to reproduce this, the code
>> seems to work with both amdgpu and radeon.
>>
>> My suspicion is that we just have a reference count issue in qxl or ttm
>> which was never noticed because it didn't caused any problems (except
>> for a minor memory corruption).
> Why does ttm_bo_cleanup_refs() do a bo_put() at the end?

Yeah, that's the bug. I own you a beer!

In the old model we had an extra reference while the BOs where on the 
deleted list and I've forgot to remove this put here.

Steven please try the attached patch.

Thanks,
Christian.


>   It doesn't
> make sense to me. Say if the BO is in the process of being delay freed
> (bo->deleted = true), and we just did the kref_init() in
> ttm_bo_release(), it might drop that ref hitting ttm_bo_release() yet
> again, this time doing the actual bo->destroy(), which frees the
> object. The worker then fires at some later point calling
> ttm_bo_delayed_delete(), but the BO has already been freed.
>
>> Now you get a rain of warnings because we try to grab the lock in the
>> delete worker.
>>
>> Christian.
>>
>>> It happened again in another test (it's not 100% reproducible).
>>>
>>> [   23.234838] ------------[ cut here ]------------
>>> [   23.236391] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
>>> [   23.236429] WARNING: CPU: 0 PID: 61 at kernel/locking/mutex.c:582 __ww_mutex_lock.constprop.0+0x566/0xfec
>>> [   23.240990] Modules linked in:
>>> [   23.242368] CPU: 0 PID: 61 Comm: kworker/0:1H Not tainted 6.3.0-rc1-test-00001-ga98bd42762ed-dirty #972
>>> [   23.245106] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
>>> [   23.247900] Workqueue: ttm ttm_bo_delayed_delete
>>> [   23.249642] EIP: __ww_mutex_lock.constprop.0+0x566/0xfec
>>> [   23.251563] Code: e8 2b 5a 95 ff 85 c0 0f 84 25 fb ff ff 8b 0d 18 71 3b c8 85 c9 0f 85 17 fb ff ff 68 c0 58 07 c8 68 07 77 05 c8 e8 e6 0a 40 ff <0f> 0b 58 5a e9 ff fa ff ff e8 f8 59 95 ff 85 c0 74 0e 8b 0d 18 71
>>> [   23.256901] EAX: 00000028 EBX: 00000000 ECX: c1847dd8 EDX: 00000002
>>> [   23.258849] ESI: 00000000 EDI: c12958bc EBP: c1847f00 ESP: c1847eac
>>> [   23.260786] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
>>> [   23.262840] CR0: 80050033 CR2: ffbff000 CR3: 0850e000 CR4: 00150ef0
>>> [   23.264781] Call Trace:
>>> [   23.265899]  ? lock_is_held_type+0xbe/0x10c
>>> [   23.267434]  ? ttm_bo_delayed_delete+0x30/0x94
>>> [   23.268971]  ww_mutex_lock+0x32/0x94
>>> [   23.270327]  ttm_bo_delayed_delete+0x30/0x94
>>> [   23.271818]  process_one_work+0x21a/0x538
>>> [   23.273242]  worker_thread+0x146/0x398
>>> [   23.274616]  kthread+0xea/0x10c
>>> [   23.275859]  ? process_one_work+0x538/0x538
>>> [   23.277312]  ? kthread_complete_and_exit+0x1c/0x1c
>>> [   23.278899]  ret_from_fork+0x1c/0x28
>>> [   23.280223] irq event stamp: 33
>>> [   23.281440] hardirqs last  enabled at (33): [<c7d28089>] _raw_spin_unlock_irqrestore+0x2d/0x58
>>> [   23.283860] hardirqs last disabled at (32): [<c71d4ba5>] kvfree_call_rcu+0x155/0x2ec
>>> [   23.286066] softirqs last  enabled at (0): [<c71217fd>] copy_process+0x989/0x2368
>>> [   23.288220] softirqs last disabled at (0): [<00000000>] 0x0
>>> [   23.289952] ---[ end trace 0000000000000000 ]---
>>> [   23.291501] ------------[ cut here ]------------
>>> [   23.293027] refcount_t: underflow; use-after-free.
>>> [   23.294644] WARNING: CPU: 0 PID: 61 at lib/refcount.c:28 refcount_warn_saturate+0xb6/0xfc
>>> [   23.296959] Modules linked in:
>>> [   23.298168] CPU: 0 PID: 61 Comm: kworker/0:1H Tainted: G        W          6.3.0-rc1-test-00001-ga98bd42762ed-dirty #972
>>> [   23.301073] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
>>> [   23.303642] Workqueue: ttm ttm_bo_delayed_delete
>>> [   23.305190] EIP: refcount_warn_saturate+0xb6/0xfc
>>> [   23.306767] Code: 68 70 e1 0c c8 e8 f6 d6 a9 ff 0f 0b 58 c9 c3 90 80 3d 8a 78 38 c8 00 75 8a c6 05 8a 78 38 c8 01 68 9c e1 0c c8 e8 d6 d6 a9 ff <0f> 0b 59 c9 c3 80 3d 88 78 38 c8 00 0f 85 67 ff ff ff c6 05 88 78
>>> [   23.311935] EAX: 00000026 EBX: c1295950 ECX: c1847e40 EDX: 00000002
>>> [   23.313884] ESI: c12958bc EDI: f7591100 EBP: c1847f18 ESP: c1847f14
>>> [   23.315840] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
>>> [   23.317887] CR0: 80050033 CR2: ffbff000 CR3: 0850e000 CR4: 00150ef0
>>> [   23.319859] Call Trace:
>>> [   23.320978]  ttm_bo_delayed_delete+0x8c/0x94
>>> [   23.322492]  process_one_work+0x21a/0x538
>>> [   23.323959]  worker_thread+0x146/0x398
>>> [   23.325353]  kthread+0xea/0x10c
>>> [   23.326609]  ? process_one_work+0x538/0x538
>>> [   23.328081]  ? kthread_complete_and_exit+0x1c/0x1c
>>> [   23.329683]  ret_from_fork+0x1c/0x28
>>> [   23.331011] irq event stamp: 33
>>> [   23.332251] hardirqs last  enabled at (33): [<c7d28089>] _raw_spin_unlock_irqrestore+0x2d/0x58
>>> [   23.334334] hardirqs last disabled at (32): [<c71d4ba5>] kvfree_call_rcu+0x155/0x2ec
>>> [   23.336176] softirqs last  enabled at (0): [<c71217fd>] copy_process+0x989/0x2368
>>> [   23.337904] softirqs last disabled at (0): [<00000000>] 0x0
>>> [   23.339313] ---[ end trace 0000000000000000 ]---
>>>
>>> -- Steve

View attachment "0001-drm-ttm-drop-extra-ttm_bo_put-in-ttm_bo_cleanup_refs.patch" of type "text/x-patch" (982 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ