Message-ID: <20200414192840.4yp3zqbe2tgtesve@xps.therub.org>
Date: Tue, 14 Apr 2020 14:28:40 -0500
From: Dan Rue <dan.rue@...aro.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Qian Cai <cai@....pw>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Xu <peterx@...hat.com>,
LKML <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>, Jens Axboe <axboe@...nel.dk>,
Christoph Lameter <cl@...ux.com>,
Johannes Weiner <hannes@...xchg.org>,
syzkaller <syzkaller@...glegroups.com>
Subject: Re: [PATCH 0/2] mm: Two small fixes for recent syzbot reports
On Tue, Apr 14, 2020 at 01:12:50PM +0200, Dmitry Vyukov wrote:
> On Tue, Apr 14, 2020 at 12:06 AM Qian Cai <cai@....pw> wrote:
> > Well, there are other CI's beyond syzbot.
> > On the other hand, this makes me worry who is testing on linux-next every day.
>
> How do these use-after-free's and locking bugs get past the
> unit-testing systems (which syzbot is not) and remain unnoticed for so
> long?...
> syzbot uses the dumbest VMs (GCE), so everything it triggers during
> boot should be triggerable pretty much everywhere.
> It seems to be an action point for the testing systems. "Boot to ssh"
> is not the best criteria. Again if there is a LOCKDEP error, we are
> not catching any more LOCKDEP errors during subsequent testing. If
> there is a use-after-free, that's a serious error on its own and KASAN
> produces only 1 error by default as well. And as far as I understand,
> lots of kernel testing systems don't even enable KASAN, which is very
> wrong.
> I've talked to +Dan Rue re this few days ago. Hopefully LKFT will
> start catching these as part of unit testing. Which should help with
> syzbot testing as well.
LKFT has recently added testing with KASAN enabled and improved the
kernel log parsing to catch more of this class of errors while
performing our regular functional testing.
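As an illustration only (this is not LKFT's actual parser), the kind of
log scan described above could be sketched as follows. The function name
`scan_kernel_log` and the two patterns are assumptions made for this
example; a real parser would need a much longer and more careful list of
signatures.

```python
import re

# Hypothetical, deliberately short pattern list. "BUG: KASAN:" prefixes
# KASAN reports; lockdep reports such as
# "WARNING: possible circular locking dependency detected" follow the
# "WARNING: possible ... detected" shape assumed here.
PATTERNS = {
    "kasan": re.compile(r"BUG: KASAN: (?P<detail>.*)"),
    "lockdep": re.compile(r"WARNING: possible (?P<detail>.*) detected"),
}


def scan_kernel_log(text):
    """Return a list of (line_number, kind, detail) for matching lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                detail = match.group("detail").strip()
                findings.append((lineno, kind, detail))
                break  # report each log line at most once
    return findings
```

A test job could fail the run whenever `scan_kernel_log` returns a
non-empty list, instead of treating "boot to ssh" as a pass.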
Incidentally, -next was also broken for us from March 25 through April 5
due to a perf build failure[0], which eventually made its way into the
v5.6 release and, I believe, the first two 5.6.x stable releases.
For -next, LKFT's gap is primarily reporting. We build and run over 30k
tests on every daily -next release, but we report issues manually when
we see them because triage is still a manual effort. We're working on
better automated reporting. If anyone is interested in watching LKFT's
-next results more closely (warning, it's a bit noisy), please let me
know. Watching the results at https://lkft.linaro.org provides some
overall health indications, but again, it gets pretty difficult to
separate signal from noise once you start drilling down without
sufficient context about the system.
Dan
[0] https://lore.kernel.org/stable/CA+G9fYsZjmf34pQT1DeLN_DDwvxCWEkbzBfF0q2VERHb25dfZQ@mail.gmail.com/
--
Linaro LKFT
https://lkft.linaro.org