[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c6d8e729-6537-1a6a-43ff-255e8fbcec7d@gmail.com>
Date: Tue, 28 Dec 2021 10:31:26 +0800
From: Jia-Ju Bai <baijiaju1990@...il.com>
To: Theodore Ts'o <tytso@....edu>
Cc: Matthew Wilcox <willy@...radead.org>, viro@...iv.linux.org.uk,
Jens Axboe <axboe@...nel.dk>, hch@...radead.org,
linux-fsdevel@...r.kernel.org, linux-block@...r.kernel.org,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] fs: super: possible ABBA deadlocks in
do_thaw_all_callback() and freeze_bdev()
On 2021/12/27 21:43, Theodore Ts'o wrote:
> On Mon, Dec 27, 2021 at 05:32:09PM +0800, Jia-Ju Bai wrote:
>> Thanks for your reply and suggestions.
>> I will try to trigger this possible deadlock by enabling lockdep and using
>> the workloads that you suggested.
>> In my opinion, static analysis can conveniently cover some code that is hard
>> to be covered at runtime, and thus it is useful to detecting some
>> infrequently-triggered bugs.
>> However, it is true that static analysis sometimes has many false positives,
>> which is unsatisfactory :(
>> I am trying some works to relieve this problem in kernel-code analysis.
>> I can understand that the related code is not frequently executed, but I
>> think that finding and fixing bugs should be always useful in practice :)
> The thing about the sysrq commands is that they are almost always used
> in emergency situations when the system administrator with physical
> access to the console sends a sysrq command (e.g., by sending a BREAK
> to the serial console). This is usually done when the system has
> *already* locked up for some reason, such as getting livelocked due to
> an out of memory condition, or maybe even a deadlock. So if sysrq-j
> could potentially cause a deadlock, so what? Sysrq-j would only be
> used when the system was in a really bad state due to a bug in any
> case. In over 10 years of kernel development, I can't remember a
> single time when I've needed to use sysrq-j.
>
> So it might be that the better way to handle this would be to make
> sure all of the emergency sysrq code in fs/super.c is under the
> CONFIG_MAGIC_SYSRQ #ifdef --- and then do the static analysis without
> CONFIG_MAGIC_SYSRQ defined.
Thanks for the explanation.
In fact, I did not know the sysrq commands, before finding this bug and
seeing your explanation.
>
> As I said, I agree it's a bug, and if I had infinite resources, I'd
> certainly ask an engineer to completely rework the emergency sysrq-j
> code path to address the potential ABBA deadlock. The problem is I do
> *not* have infinite resources, which means I have to prioritize which
> bugs get attention, and how much time engineers on my team spend
> working on new features or performance enhacements that can justify
> their salaries and ensure that they get good performance ratings ---
> since leadership, technical difficulty and business impact is how
> engineers get judged at my company.
I can understand the priority of bug fixing, with the consideration of
resources and time.
My static analysis tool just provides a small message that there is a
possible bug :)
>
> Unfortunately, judging business impact is one of those things that is
> unfair to expect a static analyzer to do.
Thanks for your understanding :)
Before seeing your explanation, I have no idea of business impact.
But it is indeed practical to consider business impact and resource
assignment in kernel development.
> And after all, if we have
> infinite resources, why should an OS bother with a VM? We can just
> pin all process text/data segments in memory, if money (and DRAM
> availability in the supply chain) is no object. :-)
Haha, interesting idea :)
Thanks a lot,
Jia-Ju Bai
Powered by blists - more mailing lists