lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250513100730.12664-1-byungchul@sk.com>
Date: Tue, 13 May 2025 19:06:47 +0900
From: Byungchul Park <byungchul@...com>
To: linux-kernel@...r.kernel.org
Cc: kernel_team@...ynix.com,
	torvalds@...ux-foundation.org,
	damien.lemoal@...nsource.wdc.com,
	linux-ide@...r.kernel.org,
	adilger.kernel@...ger.ca,
	linux-ext4@...r.kernel.org,
	mingo@...hat.com,
	peterz@...radead.org,
	will@...nel.org,
	tglx@...utronix.de,
	rostedt@...dmis.org,
	joel@...lfernandes.org,
	sashal@...nel.org,
	daniel.vetter@...ll.ch,
	duyuyang@...il.com,
	johannes.berg@...el.com,
	tj@...nel.org,
	tytso@....edu,
	willy@...radead.org,
	david@...morbit.com,
	amir73il@...il.com,
	gregkh@...uxfoundation.org,
	kernel-team@....com,
	linux-mm@...ck.org,
	akpm@...ux-foundation.org,
	mhocko@...nel.org,
	minchan@...nel.org,
	hannes@...xchg.org,
	vdavydov.dev@...il.com,
	sj@...nel.org,
	jglisse@...hat.com,
	dennis@...nel.org,
	cl@...ux.com,
	penberg@...nel.org,
	rientjes@...gle.com,
	vbabka@...e.cz,
	ngupta@...are.org,
	linux-block@...r.kernel.org,
	josef@...icpanda.com,
	linux-fsdevel@...r.kernel.org,
	jack@...e.cz,
	jlayton@...nel.org,
	dan.j.williams@...el.com,
	hch@...radead.org,
	djwong@...nel.org,
	dri-devel@...ts.freedesktop.org,
	rodrigosiqueiramelo@...il.com,
	melissa.srw@...il.com,
	hamohammed.sa@...il.com,
	harry.yoo@...cle.com,
	chris.p.wilson@...el.com,
	gwan-gyeong.mun@...el.com,
	max.byungchul.park@...il.com,
	boqun.feng@...il.com,
	longman@...hat.com,
	yskelg@...il.com,
	yunseong.kim@...csson.com,
	yeoreum.yun@....com,
	netdev@...r.kernel.org,
	matthew.brost@...el.com,
	her0gyugyu@...il.com
Subject: [PATCH v15 00/43] DEPT(DEPendency Tracker)

I'm happy to see that dept reported a real problem in practice. See:

   https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
   https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/

I added a document describing dept, that would help you understand what
dept is and how dept works. You can use dept just with CONFIG_DEPT on
and by checking dmesg in runtime.

There are still false positives here and there and some of those are
already in progress to suppress and the efforts are essencial until it
gets more stable as lockdep experienced.

It's worth noting that EXPERIMENTAL in Kconfig is tagged.

---

Hi Linus and folks,

I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock acquisition order to try to cover
all synchonization machanisms.

Benefits:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with PG_locked.
	3. Works with swait/wakeup.
	4. Works with waitqueue.
	5. Works with wait_bit.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstand false positives thanks to 7.
	9. Easy to tag any wait/event.

Future works:

	0. To make it more stable.
	1. To separates dept from lockdep.
	2. To improves performance in terms of time and space.
	3. To use dept as a dependency engine for lockdep.
	4. To add any missing tags of wait/event in the kernel.
	5. To deduplicate memory space for stack traces.

How to interpret reports:
(See the document in this patchset for more detail.)

	[S] the start of the event context
	[W] the wait disturbing the event from being triggered
	[E] the event that cannot be reachable

Thanks.

	Byungchul

---

Changes from v14:
	1. Rebase on the current latest, v6.15-rc6.
	2. Refactor dept code.
	3. With multi event sites for a single wait, even if an event
	   forms a circular dependency, the event can be recovered by
	   other event(or wake up) paths.  Even though informing the
	   circular dependency is worthy but it should be suppressed
	   once informing it, if it doesn't lead an actual deadlock.  So
	   introduce APIs to annotate the relationship between event
	   site and recover site, that are, event_site() and
	   dept_recover_event().
	4. wait_for_completion() worked with dept map embedded in struct
	   completion.  However, it generates a few false positves since
	   all the waits using the instance of struct completion, share
	   the map and key.  To avoid the false positves, make it not to
	   share the map and key but each wait_for_completion() caller
	   have its own key by default.  Of course, external maps also
	   can be used if needed.
	5. Fix a bug about hardirq on/off tracing.
	6. Implement basic unit test for dept.
	7. Add more supports for dma fence synchronization.
	8. Add emergency stop of dept e.g. on panic().
	9. Fix false positives by mmu_notifier_invalidate_*().
	10. Fix recursive call bug by DEPT_WARN_*() and DEPT_STOP().
	11. Fix trivial bugs in DEPT_WARN_*() and DEPT_STOP().
	12. Fix a bug that a spin lock, dept_pool_spin, is used in
	    both contexts of irq disabled and enabled without irq
	    disabled.
	13. Suppress reports with classes, any of that already have
	    been reported, even though they have different chains but
	    being barely meaningful.
	14. Print stacktrace of the wait that an event is now waking up,
	    not only stacktrace of the event.
	15. Make dept aware of lockdep_cmp_fn() that is used to avoid
	    false positives in lockdep so that dept can also avoid them.
	16. Do do_event() only if there are no ecxts have been
	    delimited.
	17. Fix a bug that was not synchronized for stage_m in struct
	    dept_task, using a spin lock, dept_task()->stage_lock.
	18. Fix a bug that dept didn't handle the case that multiple
	    ttwus for a single waiter can be called at the same time
	    e.i. a race issue.
	19. Distinguish each kernel context from others, not only by
	    system call but also by user oriented fault so that dept can
	    work with more accuracy information about kernel context.
	    That helps to avoid a few false positives.
	20. Limit dept's working to x86_64 and arm64.

Changes from v13:

	1. Rebase on the current latest version, v6.9-rc7.
	2. Add 'dept' documentation describing dept APIs.

Changes from v12:

	1. Refine the whole document for dept.
	2. Add 'Interpret dept report' section in the document, using a
	   deadlock report obtained in practice. Hope this version of
	   document helps guys understand dept better.

	   https://lore.kernel.org/lkml/6383cde5-cf4b-facf-6e07-1378a485657d@I-love.SAKURA.ne.jp/#t
	   https://lore.kernel.org/lkml/1674268856-31807-1-git-send-email-byungchul.park@lge.com/

Changes from v11:

	1. Add 'dept' documentation describing the concept of dept.
	2. Rewrite the commit messages of the following commits for
	   using weaker lockdep annotation, for better description.

	   fs/jbd2: Use a weaker annotation in journal handling
	   cpu/hotplug: Use a weaker annotation in AP thread

	   (feedbacked by Thomas Gleixner)

Changes from v10:

	1. Fix noinstr warning when building kernel source.
	2. dept has been reporting some false positives due to the folio
	   lock's unfairness. Reflect it and make dept work based on
	   dept annotaions instead of just wait and wake up primitives.
	3. Remove the support for PG_writeback while working on 2. I
	   will add the support later if needed.
	4. dept didn't print stacktrace for [S] if the participant of a
	   deadlock is not lock mechanism but general wait and event.
	   However, it made hard to interpret the report in that case.
	   So add support to print stacktrace of the requestor who asked
	   the event context to run - usually a waiter of the event does
	   it just before going to wait state.
	5. Give up tracking raw_local_irq_{disable,enable}() since it
	   totally messed up dept's irq tracking. So make it work in the
	   same way as lockdep does. I will consider it once any false
	   positives by those are observed again.
	6. Change the manual rwsem_acquire_read(->j_trans_commit_map)
	   annotation in fs/jbd2/transaction.c to the try version so
	   that it works as much as it exactly needs.
	7. Remove unnecessary 'inline' keyword in dept.c and add
	   '__maybe_unused' to a needed place.

Changes from v9:

	1. Fix a bug. SDT tracking didn't work well because of my big
	   mistake that I should've used waiter's map to indentify its
	   class but it had been working with waker's one. FYI,
	   PG_locked and PG_writeback weren't affected. They still
	   worked well. (reported by YoungJun)
	
Changes from v8:

	1. Fix build error by adding EXPORT_SYMBOL(PG_locked_map) and
	   EXPORT_SYMBOL(PG_writeback_map) for kernel module build -
	   appologize for that. (reported by kernel test robot)
	2. Fix build error by removing header file's circular dependency
	   that was caused by "atomic.h", "kernel.h" and "irqflags.h",
	   which I introduced - appolgize for that. (reported by kernel
	   test robot)

Changes from v7:

	1. Fix a bug that cannot track rwlock dependency properly,
	   introduced in v7. (reported by Boqun and lockdep selftest)
	2. Track wait/event of PG_{locked,writeback} more aggressively
	   assuming that when a bit of PG_{locked,writeback} is cleared
	   there might be waits on the bit. (reported by Linus, Hillf
	   and syzbot)
	3. Fix and clean bad style code e.i. unnecessarily introduced
	   a randome pattern and so on. (pointed out by Linux)
	4. Clean code for applying dept to wait_for_completion().

Changes from v6:

	1. Tie to task scheduler code to track sleep and try_to_wake_up()
	   assuming sleeps cause waits, try_to_wake_up()s would be the
	   events that those are waiting for, of course with proper dept
	   annotations, sdt_might_sleep_weak(), sdt_might_sleep_strong()
	   and so on. For these cases, class is classified at sleep
	   entrance rather than the synchronization initialization code.
	   Which would extremely reduce false alarms.
	2. Remove the dept associated instance in each page struct for
	   tracking dependencies by PG_locked and PG_writeback thanks to
	   the 1. work above.
	3. Introduce CONFIG_dept_AGGRESIVE_TIMEOUT_WAIT to suppress
	   reports that waits with timeout set are involved, for those
	   who don't like verbose reporting.
	4. Add a mechanism to refill the internal memory pools on
	   running out so that dept could keep working as long as free
	   memory is available in the system.
	5. Re-enable tracking hashed-waitqueue wait. That's going to no
	   longer generate false positives because class is classified
	   at sleep entrance rather than the waitqueue initailization.
	6. Refactor to make it easier to port onto each new version of
	   the kernel.
	7. Apply dept to dma fence.
	8. Do trivial optimizaitions.

Changes from v5:

	1. Use just pr_warn_once() rather than WARN_ONCE() on the lack
	   of internal resources because WARN_*() printing stacktrace is
	   too much for informing the lack. (feedback from Ted, Hyeonggon)
	2. Fix trivial bugs like missing initializing a struct before
	   using it.
	3. Assign a different class per task when handling onstack
	   variables for waitqueue or the like. Which makes dept
	   distinguish between onstack variables of different tasks so
	   as to prevent false positives. (reported by Hyeonggon)
	4. Make dept aware of even raw_local_irq_*() to prevent false
	   positives. (reported by Hyeonggon)
	5. Don't consider dependencies between the events that might be
	   triggered within __schedule() and the waits that requires
	    __schedule(), real ones. (reported by Hyeonggon)
	6. Unstage the staged wait that has prepare_to_wait_event()'ed
	   *and* yet to get to __schedule(), if we encounter __schedule()
	   in-between for another sleep, which is possible if e.g. a
	   mutex_lock() exists in 'condition' of ___wait_event().
	7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
	   on the hardirq and softirq entrance tracing to make dept more
	   portable for now.

Changes from v4:

	1. Fix some bugs that produce false alarms.
	2. Distinguish each syscall context from another *for arm64*.
	3. Make it not warn it but just print it in case dept ring
	   buffer gets exhausted. (feedback from Hyeonggon)
	4. Explicitely describe "EXPERIMENTAL" and "dept might produce
	   false positive reports" in Kconfig. (feedback from Ted)

Changes from v3:

	1. dept shouldn't create dependencies between different depths
	   of a class that were indicated by *_lock_nested(). dept
	   normally doesn't but it does once another lock class comes
	   in. So fixed it. (feedback from Hyeonggon)
	2. dept considered a wait as a real wait once getting to
	   __schedule() even if it has been set to TASK_RUNNING by wake
	   up sources in advance. Fixed it so that dept doesn't consider
	   the case as a real wait. (feedback from Jan Kara)
	3. Stop tracking dependencies with a map once the event
	   associated with the map has been handled. dept will start to
	   work with the map again, on the next sleep.

Changes from v2:

	1. Disable dept on bit_wait_table[] in sched/wait_bit.c
	   reporting a lot of false positives, which is my fault.
	   Wait/event for bit_wait_table[] should've been tagged in a
	   higher layer for better work, which is a future work.
	   (feedback from Jan Kara)
	2. Disable dept on crypto_larval's completion to prevent a false
	   positive.

Changes from v1:

	1. Fix coding style and typo. (feedback from Steven)
	2. Distinguish each work context from another in workqueue.
	3. Skip checking lock acquisition with nest_lock, which is about
	   correct lock usage that should be checked by lockdep.

Changes from RFC(v0):

	1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
	   (feedback from Linus and Matthew)
	2. Use try version at lockdep_acquire_cpus_lock() annotation.
	3. Distinguish each syscall context from another.

Byungchul Park (43):
  llist: move llist_{head,node} definition to types.h
  dept: implement DEPT(DEPendency Tracker)
  dept: add single event dependency tracker APIs
  dept: add lock dependency tracker APIs
  dept: tie to lockdep and IRQ tracing
  dept: add proc knobs to show stats and dependency graph
  dept: distinguish each kernel context from another
  x86_64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to x86_64
  arm64, dept: add support CONFIG_ARCH_HAS_DEPT_SUPPORT to arm64
  dept: distinguish each work from another
  dept: add a mechanism to refill the internal memory pools on running
    out
  dept: record the latest one out of consecutive waits of the same class
  dept: apply sdt_might_sleep_{start,end}() to
    wait_for_completion()/complete()
  dept: apply sdt_might_sleep_{start,end}() to swait
  dept: apply sdt_might_sleep_{start,end}() to waitqueue wait
  dept: apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait
  dept: apply sdt_might_sleep_{start,end}() to dma fence
  dept: track timeout waits separately with a new Kconfig
  dept: apply timeout consideration to wait_for_completion()/complete()
  dept: apply timeout consideration to swait
  dept: apply timeout consideration to waitqueue wait
  dept: apply timeout consideration to hashed-waitqueue wait
  dept: apply timeout consideration to dma fence wait
  dept: make dept able to work with an external wgen
  dept: track PG_locked with dept
  dept: print staged wait's stacktrace on report
  locking/lockdep: prevent various lockdep assertions when
    lockdep_off()'ed
  dept: suppress reports with classes that have been already reported
  dept: add documentation for dept
  cpu/hotplug: use a weaker annotation in AP thread
  fs/jbd2: use a weaker annotation in journal handling
  dept: assign dept map to mmu notifier invalidation synchronization
  dept: assign unique dept_key to each distinct dma fence caller
  dept: make dept aware of lockdep_set_lock_cmp_fn() annotation
  dept: make dept stop from working on debug_locks_off()
  i2c: rename wait_for_completion callback to wait_for_completion_cb
  dept: assign unique dept_key to each distinct wait_for_completion()
    caller
  completion, dept: introduce init_completion_dmap() API
  dept: introduce a new type of dependency tracking between multi event
    sites
  dept: add module support for struct dept_event_site and
    dept_event_site_dep
  dept: introduce event_site() to disable event tracking if it's
    recoverable
  dept: implement a basic unit test for dept
  dept: call dept_hardirqs_off() in local_irq_*() regardless of irq
    state

 Documentation/dependency/dept.txt     |  735 ++++++
 Documentation/dependency/dept_api.txt |  117 +
 arch/arm64/Kconfig                    |    1 +
 arch/arm64/kernel/syscall.c           |    7 +
 arch/arm64/mm/fault.c                 |    7 +
 arch/x86/Kconfig                      |    1 +
 arch/x86/entry/syscall_64.c           |    7 +
 arch/x86/mm/fault.c                   |    7 +
 drivers/dma-buf/dma-fence.c           |   17 +-
 drivers/i2c/algos/i2c-algo-pca.c      |    2 +-
 drivers/i2c/busses/i2c-pca-isa.c      |    2 +-
 drivers/i2c/busses/i2c-pca-platform.c |    2 +-
 fs/jbd2/transaction.c                 |    2 +-
 include/asm-generic/vmlinux.lds.h     |   13 +-
 include/linux/completion.h            |  124 +-
 include/linux/dept.h                  |  625 +++++
 include/linux/dept_ldt.h              |   77 +
 include/linux/dept_sdt.h              |   67 +
 include/linux/dept_unit_test.h        |   67 +
 include/linux/dma-fence.h             |   74 +-
 include/linux/hardirq.h               |    3 +
 include/linux/i2c-algo-pca.h          |    2 +-
 include/linux/irqflags.h              |   21 +-
 include/linux/llist.h                 |    8 -
 include/linux/local_lock_internal.h   |    1 +
 include/linux/lockdep.h               |  105 +-
 include/linux/lockdep_types.h         |    3 +
 include/linux/mm_types.h              |    2 +
 include/linux/mmu_notifier.h          |   26 +
 include/linux/module.h                |    5 +
 include/linux/mutex.h                 |    1 +
 include/linux/page-flags.h            |  125 +-
 include/linux/pagemap.h               |    7 +-
 include/linux/percpu-rwsem.h          |    2 +-
 include/linux/rtmutex.h               |    1 +
 include/linux/rwlock_types.h          |    1 +
 include/linux/rwsem.h                 |    1 +
 include/linux/sched.h                 |  120 +-
 include/linux/seqlock.h               |    2 +-
 include/linux/spinlock_types_raw.h    |    3 +
 include/linux/srcu.h                  |    2 +-
 include/linux/swait.h                 |    3 +
 include/linux/types.h                 |    8 +
 include/linux/wait.h                  |    3 +
 include/linux/wait_bit.h              |    3 +
 init/init_task.c                      |    2 +
 init/main.c                           |    2 +
 kernel/Makefile                       |    1 +
 kernel/cpu.c                          |    2 +-
 kernel/dependency/Makefile            |    5 +
 kernel/dependency/dept.c              | 3510 +++++++++++++++++++++++++
 kernel/dependency/dept_hash.h         |   10 +
 kernel/dependency/dept_internal.h     |   64 +
 kernel/dependency/dept_object.h       |   13 +
 kernel/dependency/dept_proc.c         |   93 +
 kernel/dependency/dept_unit_test.c    |  173 ++
 kernel/exit.c                         |    1 +
 kernel/fork.c                         |    2 +
 kernel/locking/lockdep.c              |   33 +
 kernel/module/main.c                  |   19 +
 kernel/sched/completion.c             |   62 +-
 kernel/sched/core.c                   |    8 +
 kernel/workqueue.c                    |    3 +
 lib/Kconfig.debug                     |   51 +
 lib/debug_locks.c                     |    2 +
 lib/locking-selftest.c                |    2 +
 mm/filemap.c                          |   26 +
 mm/mm_init.c                          |    2 +
 mm/mmu_notifier.c                     |   31 +-
 69 files changed, 6404 insertions(+), 125 deletions(-)
 create mode 100644 Documentation/dependency/dept.txt
 create mode 100644 Documentation/dependency/dept_api.txt
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_ldt.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 include/linux/dept_unit_test.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_object.h
 create mode 100644 kernel/dependency/dept_proc.c
 create mode 100644 kernel/dependency/dept_unit_test.c


base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
-- 
2.17.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ