lists.openwall.net - Open Source and information security mailing list archives
Date: Mon, 18 Feb 2013 09:37:20 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Ben Skeggs <bskeggs@...hat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@...onical.com>,
	linux-kernel@...r.kernel.org
Subject: WARNING: at kernel/mutex.c:386 __mutex_lock_common()

Greetings,

I got the below warning, and the first bad commit is:

commit 11eb5a2ce9655ee552f705f2da6b8f96f466c0d4
Author: Maarten Lankhorst <maarten.lankhorst@...onical.com>
Date:   Wed Nov 7 11:53:51 2012 +0100

    mutex: add support for reservation style locks

GPUs do operations that commonly involve many buffers.  Those buffers can
be shared across contexts/processes, exist in different memory domains
(for example VRAM vs system memory), and so on.  And with PRIME / dmabuf,
they can even be shared across devices.  So there are a handful of
situations where the driver needs to wait for buffers to become ready.

If you think about this in terms of waiting on a buffer mutex for it to
become available, this presents a problem because there is no way to
guarantee that buffers appear in an execbuf/batch in the same order in
all contexts.  That is directly under control of userspace, and a result
of the sequence of GL calls that an application makes.  Which results in
the potential for deadlock.  The problem gets more complex when you
consider that the kernel may need to migrate the buffer(s) into VRAM
before the GPU operates on the buffer(s), which may in turn require
evicting some other buffers (and you don't want to evict other buffers
which are already queued up to the GPU), but for a simplified
understanding of the problem you can ignore this.

The algorithm that TTM came up with for dealing with this problem is
quite simple.  For each group of buffers (execbuf) that need to be
locked, the caller would be assigned a unique reservation_id, from a
global counter.
In case of deadlock in the process of locking all the buffers associated
with an execbuf, the one with the lowest reservation_id wins, and the one
with the higher reservation_id unlocks all of the buffers that it has
already locked, and then tries again.

How it is used:
---------------

A very simplified version:

    int lock_execbuf(execbuf)
    {
        struct buf *res_buf = NULL;

        /* acquiring locks, before queuing up to GPU: */
        seqno = assign_global_seqno();

    retry:
        for (buf in execbuf->buffers) {
            if (buf == res_buf) {
                res_buf = NULL;
                continue;
            }
            ret = mutex_reserve_lock(&buf->lock, seqno);
            if (ret < 0)
                goto err;
        }

        /* now everything is good to go, submit job to GPU: */
        ...

        return 0;

    err:
        for (all buf2 before buf in execbuf->buffers)
            mutex_unreserve_unlock(&buf2->lock);
        if (res_buf)
            mutex_unreserve_unlock(&res_buf->lock);

        if (ret == -EAGAIN) {
            /* we lost out in a seqno race, lock and retry.. */
            mutex_reserve_lock_slow(&buf->lock, seqno);
            res_buf = buf;
            goto retry;
        }

        return ret;
    }

    int unlock_execbuf(execbuf)
    {
        /* when GPU is finished: */
        for (buf in execbuf->buffers)
            mutex_unreserve_unlock(&buf->lock);
    }

Functions:
----------

mutex_reserve_lock, and mutex_reserve_lock_interruptible:
    Lock a buffer with a reservation_id set.  reservation_id must not be
    set to 0, since this is a special value that means no reservation_id.

    Normally if reservation_id is not set, or is older than the
    reservation_id that's currently set on the mutex, the behavior will
    be to wait normally.  However, if the reservation_id is newer than
    the current reservation_id, -EAGAIN will be returned.

    These functions will return -EDEADLK instead of -EAGAIN if the
    reservation_id is the same as the one the mutex is already locked
    with, since in that case you presumably attempted to lock the same
    lock twice.

mutex_reserve_lock_slow and mutex_reserve_lock_intr_slow:
    Similar to mutex_reserve_lock, except it won't backoff with -EAGAIN.
    This is useful when mutex_reserve_lock failed with -EAGAIN, and you
    unreserved all buffers so no deadlock can occur.

mutex_unreserve_unlock:
    Unlock a buffer reserved with one of the mutex_reserve_*lock* calls.

Missing at the moment, maybe TODO?
  * Check if lockdep warns if you unlock a lock that other locks were
    nested to.

      spin_lock(m);
      spin_lock_nest_lock(a, m);
      spin_unlock(m);
      spin_unlock(a);

    It would be nice if this gave a splat on spin_unlock(m); I'm not
    100% sure it does right now, though..

  * In the *_slow calls, maybe add a check to ensure no other locks of
    the same lockdep class are nested in the lock we have to nest to?
    This is to make sure that mutex_unreserve_unlock has been called on
    all the other locks.

Design:

I chose to have ticket_mutex encapsulate struct mutex, so the extra
memory usage and the atomic set on init only happen when you
deliberately create a ticket lock.  Since the mutexes are mostly meant
to protect buffer object serialization in TTM, not much contention is
expected.

I could be slightly smarter with wakeups, but that would be at the
expense of adding a field to struct mutex_waiter.  That would add
overhead to all cases where normal mutexes are used, and ticket mutexes
are less performance sensitive anyway, since they only protect buffer
objects.  As a result I chose not to do this.

I needed this in kernel/mutex.c because of the extensions to
__lock_common, which are hopefully optimized away for all normal paths.

It is not illegal to use mutex_lock and mutex_unlock on ticket mutexes.
This will work, as long as you don't mix lock/unlock calls.  This is
useful if you want to lock only a single buffer.

All the mutex_reserve calls are nested inside another lock.  The idea is
that the seqno ticket you use functions as a lockdep class you have
locked too.  This prevents lockdep false positives when locking 2
objects of the same class, and it's allowed because they're nested to
the seqno ticket.
There are extensive tests for this in the patch that introduces locking
tests for reservation objects and reservation tickets.

Changes since RFC patch v1:
 - Updated to use atomic_long instead of atomic, since the
   reservation_id was a long.
 - added mutex_reserve_lock_slow and mutex_reserve_lock_intr_slow
 - removed mutex_locked_set_reservation_id (or w/e it was called)

Changes since RFC patch v2:
 - remove use of __mutex_lock_retval_arg, add warnings when using the
   wrong combination of mutex_(,reserve_)lock/unlock.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@...onical.com>

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at /c/kernel-tests/src/tip/kernel/mutex.c:386 __mutex_lock_common+0x5a9/0x870()
[    0.000000] Hardware name: Bochs
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper/0 Not tainted 3.8.0-rc7-00071-g11eb5a2 #180
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81047102>] warn_slowpath_common+0xb2/0x120
[    0.000000]  [<ffffffff81047195>] warn_slowpath_null+0x25/0x30
[    0.000000]  [<ffffffff814ebdd9>] __mutex_lock_common+0x5a9/0x870
[    0.000000]  [<ffffffff81a66ef8>] ? rcu_cpu_notify+0xa8/0x451
[    0.000000]  [<ffffffff810bcb8f>] ? trace_hardirqs_off_caller+0xaf/0x120
[    0.000000]  [<ffffffff81a66ef8>] ? rcu_cpu_notify+0xa8/0x451
[    0.000000]  [<ffffffff810be38c>] ? lockdep_init_map+0xfc/0x230
[    0.000000]  [<ffffffff814ec621>] mutex_lock_nested+0x61/0x80
[    0.000000]  [<ffffffff810bcc1d>] ? trace_hardirqs_off+0x1d/0x30
[    0.000000]  [<ffffffff81a66ef8>] rcu_cpu_notify+0xa8/0x451
[    0.000000]  [<ffffffff814ecc31>] ? mutex_unlock+0x11/0x20
[    0.000000]  [<ffffffff81a3f952>] rcu_init+0x3b3/0x408
[    0.000000]  [<ffffffff81a21e0c>] start_kernel+0x34a/0x744
[    0.000000]  [<ffffffff81a216e6>] ? repair_env_string+0x81/0x81
[    0.000000]  [<ffffffff81a21120>] ? early_idt_handlers+0x120/0x120
[    0.000000]  [<ffffffff81a212fd>] x86_64_start_reservations+0x185/0x190
[    0.000000]  [<ffffffff81a214a8>] x86_64_start_kernel+0x1a0/0x1b6
[    0.000000] ---[ end trace 8e966724b1809892 ]---

git bisect start 11eb5a2ce9655ee552f705f2da6b8f96f466c0d4 836dc9e3fbbab0c30aa6e664417225f5c1fb1c39 --
git bisect good 569104e21addbec215daa2d1d2c7b2a274ebcc27  # 1000  2013-02-15 00:30:51  drm/nouveau: share fence structures between nv10+ and nv50 implementations
git bisect good dacf4c29d7e3b2e0903fdef6da8296fe51132dca  # 1000  2013-02-15 02:49:28  drm/nouveau/fifo/nvc0: improve interrupt handler somewhat
git bisect good 51fb70f904cf004272bd0a729959e4da76cb50dc  # 1000  2013-02-15 05:08:37  drm/nouveau/gpio: pass number of on-die gpio lines to base
git bisect good 38bc1653a04eeea372a4249ef58d554ce38bd0f5  # 1000  2013-02-15 07:27:46  drm/nv84-/fence: abstract class emit/sync functions to virt+sequence
git bisect good ef80d98bdcaebca98ec9790ee584f903557ca149  # 1000  2013-02-15 09:47:41  drm/nouveau/fence: make internal hooks part of the context
git bisect good 5fdd378c20bb56824d57905591f1ec93a02513b9  # 1000  2013-02-15 12:06:18  arch: make __mutex_fastpath_lock_retval return whether fastpath succeeded or not.
git bisect good 5fdd378c20bb56824d57905591f1ec93a02513b9  # 3001  2013-02-15 18:58:19  arch: make __mutex_fastpath_lock_retval return whether fastpath succeeded or not.
git bisect good 075198d07b7b64b50afb54853dc390ded9962ee4  # 3000  2013-02-16 02:12:51  Revert "mutex: add support for reservation style locks"
git bisect good be03210f1bbdde34b397e86c115511667210fe98  # 3000  2013-02-18 05:35:58  Add linux-next specific files for 20130215

Thanks,
Fengguang

View attachment "dmesg-kvm-xbm-3830-2013-02-14-19-00-45-3.8.0-rc7-00071-g11eb5a2-180" of type "text/plain" (29906 bytes)
View attachment "11eb5a2-bisect.log" of type "text/plain" (30335 bytes)
View attachment ".config-bisect" of type "text/plain" (81978 bytes)