Message-ID: <20230623043623.GA851@sol.localdomain>
Date: Thu, 22 Jun 2023 21:36:23 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Eric Sandeen <sandeen@...deen.net>
Cc: Dave Chinner <david@...morbit.com>,
syzbot <syzbot+9d0b0d54a8bd799f6ae4@...kaller.appspotmail.com>,
dchinner@...hat.com, djwong@...nel.org, hch@....de,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-xfs@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [xfs?] WARNING: Reset corrupted AGFL on AG NUM. NUM
blocks leaked. Please unmount and run xfs_repair.
On Thu, Jun 22, 2023 at 10:09:55PM -0500, Eric Sandeen wrote:
> > Grepping for "WARNING:" is how other kernel testing systems find WARN_ON's in
> > the log too. For example, see _check_dmesg() in common/rc in xfstests.
> > xfstests fails tests if "WARNING:" is logged. You might be aware of this, as
> > you reviewed and applied xfstests commit 47e5d7d2bb17 which added the code.
> >
> > I understand it's frustrating that Dmitry's attempt to do something about this
> > problem was incomplete. I don't think it is helpful to then send a reflexive,
> > adversarial response that shifts the blame for this longstanding problem with
> > the kernel logs entirely onto syzbot and even Dmitry personally. That just
> > causes confusion about the problem that needs to be solved.
> >
> > Anyway, either everything that parses the kernel logs needs to be smarter about
> > identifying real WARN_ON's, or all instances of "WARNING:" need to be eliminated
> > from the log (with existing code, coding style guidelines, and checkpatch
> > updated as you mentioned). I think I'm leaning towards the position that fake
> > "WARNING:"s should be eliminated. It does seem like a hack, but it makes the
> > "obvious" log pattern matching that everyone tends to write work as expected...
> >
> > If you don't want to help, fine, but at least please try not to be obstructive.
>
> I didn't read Dave's reply as "obstructive." There's been a trend lately of
> ever-growing hordes of people (with machines behind them) generating
> ever-more work for a very small and fixed number of developers who are
> burning out. It's not sustainable. The work-generators need to help make
> things better, or the whole system is going to break.
>
> Dave being frustrated that he has to deal with "bug reports" about a printk
> phrase is valid, IMHO. There are many straws breaking the camel's back these
> days.
>
> You had asked for a constructive suggestion.
>
> My specific suggestion is that the people who decided that printk("WARNING")
> merits must-fix syzbot reports should submit patches to any subsystem they
> plan to test, to replace printk("WARNING") with something that will not
> trigger syzbot reports. Don't spread that pain onto every subsystem
> developer who already has to deal with legitimate and pressing work. Or,
> work out some other reliable way to discern WARN_ON from WARNING.
>
> And add it to checkpatch etc, as Dave suggested.
>
> This falls into the "help us help you" category. Early on, syzbot
> filesystem reports presented filesystems only as a giant array of hex in a C
> file, leaving it to the poor developer to work out how to use standard
> filesystem tools to analyze the input. Now we get standard images. That's an
> improvement, with some effort on the syzbot side that saves time and effort
> for every filesystem developer forever more. Find more ways to make these
> reports more relevant, more accurate, and more efficient to triage.
>
> That's my constructive suggestion.
>
I went ahead and filed an issue against syzkaller for this:
https://github.com/google/syzkaller/issues/3980

I still would like to emphasize that other testing systems such as xfstests do
the same "dmesg | grep WARNING:" thing and therefore have the same problem, at
least in principle. (Whether a test actually finds anything depends on the code
covered, of course.) Again, I'm mentioning this not to try to absolve syzkaller
of responsibility, but rather because it's important that everyone agrees on the
problem here, and ideally its solution too. If people continue operating under
the mistaken belief that this is a syzkaller-specific issue, it might be hard to
get kernel patches merged to fix it, especially if those patches involve changes
to checkpatch.pl, CodingStyle, and several dozen different kernel subsystems.
Or, the syzkaller people might go off on their own and find and implement some
way to parse the log reliably, without the other testing systems being fixed...
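
To make the "smarter parsing" option concrete, here is a rough sketch of how a
log checker might separate a WARN_ON-style report from a message that merely
contains "WARNING:". It is not what syzkaller or xfstests actually do. It
assumes that a genuine WARN_ON header has the form
"WARNING: CPU: <n> PID: <n> at <location>"; the function name, the regex, and
the sample log lines are all hypothetical.

```python
import re

# Assumption: a genuine WARN()/WARN_ON() report header looks like
#   WARNING: CPU: 0 PID: 1234 at fs/example.c:42 example_func+0x10/0x20
WARN_ON_HEADER = re.compile(r"WARNING: CPU: \d+ PID: \d+ at \S+")


def classify_warnings(log_text):
    """Split lines containing "WARNING:" into (likely WARN_ON, everything else)."""
    likely_warn_on, other = [], []
    for line in log_text.splitlines():
        if "WARNING:" not in line:
            continue  # the plain grep would not have matched this line either
        if WARN_ON_HEADER.search(line):
            likely_warn_on.append(line)
        else:
            other.append(line)
    return likely_warn_on, other


if __name__ == "__main__":
    # Hypothetical dmesg excerpt, made up for illustration only.
    sample = "\n".join([
        "[   12.345678] XFS (loop0): WARNING: Reset corrupted AGFL on AG 0. "
        "1 blocks leaked. Please unmount and run xfs_repair.",
        "[   13.000000] WARNING: CPU: 0 PID: 1234 at fs/example.c:42 "
        "example_func+0x10/0x20",
        "[   14.000000] EXT4-fs (loop1): mounted filesystem",
    ])
    likely_warn_on, other = classify_warnings(sample)
    print("likely WARN_ON:", len(likely_warn_on))
    print("other WARNING: lines:", len(other))
```

Even this is not enough on its own: other kernel reports, such as lockdep's
"WARNING: possible circular locking dependency detected", also begin with
"WARNING:" without the CPU/PID form, so a real parser would need a longer list
of known patterns. That is part of why this approach is harder than it looks.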
- Eric