[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wipA4605yvnmjW7T9EvARPRCGLARty8UUzRGxic1SXqvg@mail.gmail.com>
Date: Fri, 20 Oct 2023 10:23:57 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Andy Shevchenko <andriy.shevchenko@...el.com>,
Baokun Li <libaokun1@...wei.com>
Cc: Josh Poimboeuf <jpoimboe@...nel.org>, Jan Kara <jack@...e.cz>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Kees Cook <keescook@...omium.org>,
Ferry Toth <ftoth@...londelft.nl>,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [GIT PULL] ext2, quota, and udf fixes for 6.6-rc1
On Fri, 20 Oct 2023 at 07:52, Andy Shevchenko
<andriy.shevchenko@...el.com> wrote:
>
> # first bad commit: [e64db1c50eb5d3be2187b56d32ec39e56b739845] quota: factor out dquot_write_dquot()
Ok, so commit 024128477809 ("quota: factor out dquot_write_dquot()") pre-rebase.
Which honestly seems entirely innocuous, and the only change seems to
be a slight massaging of the return value checking, in that it did a
"if (err)" ine one place before, now it does "if (err < 0)".
And the whole "now it always warns about errors", which used to happen
only in dqput() before.
Neither seems to be very relevant, which just reinforces that yes,
this looks like a timing thing.
> On top of the above I have tried the following:
> 1) dropping inline, replacing it to __always_inline -- no help;
> 2) commenting out the error message -- helps!
>
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -632,8 +632,10 @@ static inline int dquot_write_dquot(struct dquot *dquot)
> {
> int ret = dquot->dq_sb->dq_op->write_dquot(dquot);
> if (ret < 0) {
> +#if 0
> quota_error(dquot->dq_sb, "Can't write quota structure "
> "(error %d). Quota may get out of sync!", ret);
> +#endif
> /* Clear dirty bit anyway to avoid infinite loop. */
> clear_dquot_dirty(dquot);
> }
The only thing quota_error() does is the varags handling and a printk,
so yeah, all that #if 0" would do even if the error triggers (and it
presumably doesn't) is to change code generation around that point,
and change timing.
But what *is* interesting is that that commit that triggers it is
before all the other list-handling changes, so the fact that this was
triggered by that merge and that one commit, *all* that really
happened to trigger your boot failure is literally this:
git log 1500e7e0726e^..024128477809
(that "1500e7e0726e^" is the pre-merge state). So it's not that the
problem was introduced by one of the other list-handling changes and
then 024128477809 just happened to change the timing. No, it's
literally that one commit that moves code around, and that one
quota_error() printout that makes the problem show for you.
So it really looks like the bug is pre-existing. Or actually a
compiler problem that is introduced by the added call that changes
code generation, but honestly, that is a very unlikely thing.
That said - while unlikely, mind just sending me the *failing* copy of
the fs/quota/dquot.o object file, and I'll take a look at the code
around that call. I've looked at enough code generation issues that
it's worth trying..
Linus
Powered by blists - more mailing lists