Message-ID: <489d941f-c4e8-4d1f-92ee-02074c713dd1@gmail.com>
Date: Sun, 24 Nov 2024 22:34:02 +0900
From: Yunseong Kim <yskelg@...il.com>
To: Byungchul Park <byungchul@...com>
Cc: LKML <linux-kernel@...r.kernel.org>, kernel_team@...ynix.com,
torvalds@...ux-foundation.org, damien.lemoal@...nsource.wdc.com,
linux-ide@...r.kernel.org, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, mingo@...hat.com, peterz@...radead.org,
will@...nel.org, tglx@...utronix.de, rostedt@...dmis.org,
joel@...lfernandes.org, sashal@...nel.org, daniel.vetter@...ll.ch,
duyuyang@...il.com, johannes.berg@...el.com, tj@...nel.org, tytso@....edu,
willy@...radead.org, david@...morbit.com, amir73il@...il.com,
gregkh@...uxfoundation.org, kernel-team@....com, linux-mm@...ck.org,
akpm@...ux-foundation.org, mhocko@...nel.org, minchan@...nel.org,
hannes@...xchg.org, vdavydov.dev@...il.com, sj@...nel.org,
jglisse@...hat.com, dennis@...nel.org, cl@...ux.com, penberg@...nel.org,
rientjes@...gle.com, vbabka@...e.cz, ngupta@...are.org,
linux-block@...r.kernel.org, josef@...icpanda.com,
linux-fsdevel@...r.kernel.org, jack@...e.cz, jlayton@...nel.org,
dan.j.williams@...el.com, hch@...radead.org, djwong@...nel.org,
dri-devel@...ts.freedesktop.org, rodrigosiqueiramelo@...il.com,
melissa.srw@...il.com, hamohammed.sa@...il.com, 42.hyeyoo@...il.com,
chris.p.wilson@...el.com, gwan-gyeong.mun@...el.com,
max.byungchul.park@...il.com, boqun.feng@...il.com, longman@...hat.com,
hdanton@...a.com, her0gyugyu@...il.com, Yeoreum Yun <yeoreum.yun@....com>
Subject: Re: [PATCH v14 2/28] dept: Implement Dept(Dependency Tracker)
Hi Byungchul,
Thank you for the great feature. Currently, DEPT has a bug in the
'dept_key_destroy()' function that must be fixed to ensure proper
operation in the upstream Linux kernel.
On 5/8/24 6:46 PM, Byungchul Park wrote:
> CURRENT STATUS
> --------------
> Lockdep tracks acquisition order of locks in order to detect deadlock,
> and IRQ and IRQ enable/disable state as well to take accidental
> acquisitions into account.
>
> Lockdep should be turned off once it detects and reports a deadlock
> since the data structure and algorithm are not reusable after detection
> because of the complex design.
>
> PROBLEM
> -------
> *Waits* and their *events* that never reach eventually cause deadlock.
> However, Lockdep is only interested in lock acquisition order, forcing
> to emulate lock acquisition even for just waits and events that have
> nothing to do with real lock.
>
> Even worse, no one likes Lockdep's false positive detection because that
> prevents further one that might be more valuable. That's why all the
> kernel developers are sensitive to Lockdep's false positive.
>
> Besides those, by tracking acquisition order, it cannot correctly deal
> with read lock and cross-event e.g. wait_for_completion()/complete() for
> deadlock detection. Lockdep is no longer a good tool for that purpose.
>
> SOLUTION
> --------
> Again, *waits* and their *events* that never reach eventually cause
> deadlock. The new solution, Dept(DEPendency Tracker), focuses on waits
> and events themselves. Dept tracks waits and events and reports it if
> any event would never be reachable.
>
> Dept does:
> . Works with read lock in the right way.
> . Works with any wait and event i.e. cross-event.
> . Continues to work even after reporting multiple times.
> . Provides simple and intuitive APIs.
> . Does exactly what dependency checker should do.
>
> Q & A
> -----
> Q. Is this the first try ever to address the problem?
> A. No. Cross-release feature (b09be676e0ff2 locking/lockdep: Implement
> the 'crossrelease' feature) addressed it 2 years ago that was a
> Lockdep extension and merged but reverted shortly because:
>
> Cross-release started to report valuable hidden problems but started
> to give false positive reports as well. For sure, no one
> likes Lockdep's false positive reports since it makes Lockdep stop,
> preventing reporting further real problems.
>
> Q. Why was Dept not developed as an extension of Lockdep?
> A. Lockdep definitely includes all the efforts great developers have
> made for a long time so as to be quite stable enough. But I had to
> design and implement newly because of the following:
>
> 1) Lockdep was designed to track lock acquisition order. The APIs and
> implementation do not fit on wait-event model.
> 2) Lockdep is turned off on detection including false positive. Which
> is terrible and prevents developing any extension for stronger
> detection.
>
> Q. Do you intend to totally replace Lockdep?
> A. No. Lockdep also checks if lock usage is correct. Of course, the
> dependency check routine should be replaced but the other functions
> should be still there.
>
> Q. Do you mean the dependency check routine should be replaced right
> away?
> A. No. I admit Lockdep is stable enough thanks to great efforts kernel
> developers have made. Lockdep and Dept, both should be in the kernel
> until Dept gets considered stable.
>
> Q. Stronger detection capability would give more false positive report.
> Which was a big problem when cross-release was introduced. Is it ok
> with Dept?
> A. It's ok. Dept allows multiple reporting thanks to simple and quite
> generalized design. Of course, false positive reports should be fixed
> anyway but it's no longer as critical a problem as it was.
>
> Signed-off-by: Byungchul Park <byungchul@...com>
If a module that DEPT has checked for dependencies is loaded and later
unloaded, the kernel panics when it reuses the corresponding memory area
for other purposes. This issue must be fixed as a priority before DEPT
can be used. I confirmed the problem by testing this patch on an Ubuntu
kernel.
> +void dept_key_destroy(struct dept_key *k)
> +{
> +	struct dept_task *dt = dept_task();
> +	unsigned long flags;
> +	int sub_id;
> +
> +	if (unlikely(!dept_working()))
> +		return;
> +
> +	if (dt->recursive == 1 && dt->task_exit) {
> +		/*
> +		 * Need to allow to go ahead in this case where
> +		 * ->recursive has been set to 1 by dept_off() in
> +		 * dept_task_exit() and ->task_exit has been set to
> +		 * true in dept_task_exit().
> +		 */
> +	} else if (dt->recursive) {
> +		DEPT_STOP("Key destroying fails.\n");
> +		return;
> +	}
> +
> +	flags = dept_enter();
> +
> +	/*
> +	 * dept_key_destroy() should not fail.
> +	 *
> +	 * FIXME: Should be fixed if dept_key_destroy() causes deadlock
> +	 * with dept_lock().
> +	 */
> +	while (unlikely(!dept_lock()))
> +		cpu_relax();
> +
> +	for (sub_id = 0; sub_id < DEPT_MAX_SUBCLASSES; sub_id++) {
> +		struct dept_class *c;
> +
> +		c = lookup_class((unsigned long)k->base + sub_id);
> +		if (!c)
> +			continue;
> +
> +		hash_del_class(c);
> +		disconnect_class(c);
> +		list_del(&c->all_node);
> +		invalidate_class(c);
> +
> +		/*
> +		 * Actual deletion will happen on the rcu callback
> +		 * that has been added in disconnect_class().
> +		 */
> +		del_class(c);
> +	}
> +
> +	dept_unlock();
> +	dept_exit(flags);
> +
> +	/*
> +	 * Wait until even lockless hash_lookup_class() for the class
> +	 * returns NULL.
> +	 */
> +	might_sleep();
> +	synchronize_rcu();
> +}
> +EXPORT_SYMBOL_GPL(dept_key_destroy);
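To make the hazard concrete, here is a minimal userspace sketch of why
classes keyed by the address of a key have to be purged before that
memory is reused. It is only a toy model, not DEPT code: toy_key,
toy_class, toy_register(), toy_lookup() and toy_key_destroy() are
hypothetical names, the linear table stands in for the real hash, and
there is no locking or RCU. It assumes, as the quoted loop suggests,
that a class is identified by (unsigned long)k->base + sub_id. The exact
root cause of the panic is not shown here; the sketch only illustrates
one way a stale entry can outlive its owner.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBCLASSES 8	/* stands in for DEPT_MAX_SUBCLASSES */
#define TABLE_SIZE     64

/* Toy stand-in for struct dept_key: only its address range matters. */
struct toy_key {
	char base[MAX_SUBCLASSES];
};

/* Toy stand-in for struct dept_class, identified by an address. */
struct toy_class {
	unsigned long id;	/* (unsigned long)key->base + sub_id */
	const char *name;	/* in a module, this would be module memory */
	int in_use;
};

static struct toy_class table[TABLE_SIZE];

/* Find the class registered for this address, or NULL. */
static struct toy_class *toy_lookup(unsigned long id)
{
	for (size_t i = 0; i < TABLE_SIZE; i++)
		if (table[i].in_use && table[i].id == id)
			return &table[i];
	return NULL;
}

/* Register a class for (key, sub_id); returns NULL if the table is full. */
static struct toy_class *toy_register(struct toy_key *k, int sub_id,
				      const char *name)
{
	unsigned long id = (unsigned long)k->base + sub_id;
	struct toy_class *c = toy_lookup(id);

	if (c)
		return c;	/* address already known: old class is reused */

	for (size_t i = 0; i < TABLE_SIZE; i++) {
		if (!table[i].in_use) {
			table[i] = (struct toy_class){ id, name, 1 };
			return &table[i];
		}
	}
	return NULL;
}

/* Mirrors the quoted loop: drop every subclass derived from the key. */
static void toy_key_destroy(struct toy_key *k)
{
	for (int sub_id = 0; sub_id < MAX_SUBCLASSES; sub_id++) {
		struct toy_class *c;

		c = toy_lookup((unsigned long)k->base + sub_id);
		if (c)
			c->in_use = 0;
	}
}
```

If one toy_key slot is registered by a first owner and then handed to a
second owner without calling toy_key_destroy(), toy_register() returns
the first owner's class, including its name pointer. In a real module
that pointer would refer to memory that no longer belongs to the module.
Calling toy_key_destroy() before the slot is reused makes the lookup
return NULL, so the second owner gets a fresh class.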
Best regards,
Yunseong Kim