linux-kernel - Re: [RFC] syzbot process

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <CACT4Y+aXLGYw2iaocMyf-zhWao96gRw+98caemaebt-9s2DMfw@mail.gmail.com>
Date:   Thu, 28 Dec 2017 11:41:04 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Eric Biggers <ebiggers3@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        syzkaller <syzkaller@...glegroups.com>,
        Eric Dumazet <edumazet@...gle.com>,
        Kostya Serebryany <kcc@...gle.com>,
        Alexander Potapenko <glider@...gle.com>,
        andreyknvl <andreyknvl@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        David Miller <davem@...emloft.net>,
        Willem de Bruijn <willemb@...gle.com>,
        Guenter Roeck <groeck@...gle.com>,
        Stephan Mueller <smueller@...onox.de>
Subject: Re: [RFC] syzbot process

On Fri, Dec 22, 2017 at 4:32 AM, Eric Biggers <ebiggers3@...il.com> wrote:
> On Thu, Dec 21, 2017 at 01:52:40PM +0100, Dmitry Vyukov wrote:
>> However, the cost is that it needs to understand the statuses of bugs:
>> most importantly, which commit fixes which bug. It also supports
>> marking a bug as "invalid", e.g. one that happened once but was most
>> likely caused by an earlier silent memory corruption, and marking
>> bugs as duplicates of other bugs, i.e. bugs with the same root cause
>> that will be fixed when the target bug is fixed. These simple rules
>> are outlined in the footer of each report and explained in more
>> detail at the referenced link:
>>
>> ----------------------------------
>> This bug is generated by a dumb bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for details.
>> Direct all questions to syzkaller@...glegroups.com.
>> Please credit me with: Reported-by: syzbot <syzkaller@...glegroups.com>
>> syzbot will keep track of this bug report.
>> Once a fix for this bug is merged into any tree, reply to this email with:
>> #syz fix: exact-commit-title
>> If you want to test a patch for this bug, please reply with:
>> #syz test: git://repo/address.git branch
>> and provide the patch inline or as an attachment.
>> To mark this as a duplicate of another syzbot report, please reply with:
>> #syz dup: exact-subject-of-another-report
>> If it's a one-off invalid bug report, please reply with:
>> #syz invalid
>> Note: if the crash happens again, it will cause creation of a new bug  report.
>> Note: all commands must start from beginning of the line in the email body.
>> ----------------------------------
>>
>> Status tracking allows syzbot to (1) keep track of still-unfixed bugs
>> (more than half of them actually get lost in the LKML archives if
>> nobody keeps track of them), (2) be able to report similar-looking
>> crashes as new bugs at all in the future, and (3) be able to test fixes.
>>
>> The problem is that these rules are mostly not followed.
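The footer's rules amount to a small line-oriented protocol: a command is recognized only when `#syz` starts a line of the email body. The Python sketch below illustrates that rule under stated assumptions; it is not syzbot's actual parser, and the names `Command` and `parse_commands` are hypothetical.

```python
import re
from typing import NamedTuple


class Command(NamedTuple):
    name: str  # e.g. "fix", "test", "dup", "invalid"
    args: str  # text after the first colon, stripped; "" for "#syz invalid"


# "#syz" must be the very first thing on a line (re.MULTILINE makes ^ match
# at every line start), so quoted or indented lines like "> #syz invalid"
# are ignored.
_CMD_RE = re.compile(r"^#syz[ \t]+([a-z-]+)(?::[ \t]*(.*))?$", re.MULTILINE)


def parse_commands(body: str) -> list[Command]:
    """Return every #syz command in an email body, in order of appearance."""
    body = body.replace("\r\n", "\n")  # tolerate CRLF line endings
    return [
        Command(m.group(1), (m.group(2) or "").strip())
        for m in _CMD_RE.finditer(body)
    ]


body = (
    "Thanks, this is fixed now.\n"
    "> #syz invalid\n"
    "#syz fix: net: fix use-after-free in foo\n"
)
print(parse_commands(body))
# [Command(name='fix', args='net: fix use-after-free in foo')]
```

Splitting on the first colon only keeps colons inside commit titles and report subjects (such as "net: ..." or "KASAN: ...") intact in the arguments.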
>
> As others mentioned, allowing a bug ID in the fix's commit message,
> perhaps in the Reported-by line that syzbot already suggests including,
> would make things a bit easier.
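One way the suggestion above could work is to carry the bug ID in the Reported-by trailer itself, so that a tool can match a merged commit to a bug without a separate `#syz fix:` reply. The sketch assumes, purely for illustration, a trailer of the form `Reported-by: syzbot+<id>@example.org` with a hexadecimal ID; that address format, the domain, and the function name are hypothetical.

```python
import re

# Hypothetical trailer format: the bug ID is carried in the "+tag" part of
# the reporter address, e.g. "Reported-by: syzbot+0123abcd@example.org".
# Both the format and the domain are assumptions made for this sketch.
_REPORTED_BY_RE = re.compile(
    r"^Reported-by:[ \t]*.*?\bsyzbot\+([0-9a-f]+)@", re.MULTILINE | re.IGNORECASE
)


def bug_ids_from_commit(message: str) -> list[str]:
    """Return the syzbot bug IDs named in Reported-by trailers of a commit message."""
    return _REPORTED_BY_RE.findall(message)


msg = """net: fix use-after-free in foo

Free the buffer only after the last user has dropped it.

Reported-by: syzbot+0123abcd@example.org
Signed-off-by: A Developer <dev@example.org>
"""
print(bug_ids_from_commit(msg))  # ['0123abcd']
```

The appeal of this design is that developers already copy the Reported-by line into the fix, so the bug-to-commit link comes for free; the trade-off is that the ID must be copied exactly.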
>
> But I think the larger problem is that people in the community don't have any
> visibility into the statuses of the bugs, so they don't have any motivation to
> manage the statuses.
>
> Are you planning to make a dashboard app publicly available for upstream kernel
> bugs being tracked by syzbot?  I think it would be very useful for the
> community, especially for finding more details about a bug, e.g. when it was
> last seen, how often it has been seen, and whether it has been seen in multiple
> trees.  Also for finding duplicates which may not have been sent to the correct
> mailing list.

Hi Eric,

Good question. I would very much like to open up the UI, and I hope to do
it in the near future, but we need to do some additional work to make it
possible. The good news is that the information is already accumulating,
and we can do pings, etc.

> syzbot also should be sending out reminders for bugs that are still open if the
> crash is still occurring, and even more so if there is a reproducer.

Agreed. The reasons why this hasn't happened yet are:
1. syzbot is being built up while it is running. I am overwhelmed with
hundreds of bugs and am also doing lots of work that may not be directly
visible but is important (e.g. improving the quality of generated
reproducers, increasing the percentage of cases where reproducers are
created, improving the bug title extraction logic, implementing patch
testing on request, and now this new Reported-by-based process).
2. Just sending an email for each open bug every week is simple, but I am
afraid it won't be warmly welcomed. The open questions are: How
frequently should syzbot ping? Should having a reproducer or not affect
this? What should it do if the crash stopped happening? Stopped
happening for how long? And what if it happened just a few times, so we
can't really conclude whether it still happens or not (we've seen very
bad races manifest this way)? How should this interact with the
following point?

> However, if the crash isn't still occurring, then I expect it will become
> necessary to automatically invalidate the bug after some time, lest the list of
> bugs grow without bound due to bugs that have already been fixed that no one has
> time to debug to figure out exactly when/what the fix was, especially if there
> is no reproducer.  Or perhaps the bug was only in linux-next and only existed
> due to a buggy patch which was dropped or modified before it reached mainline,
> so there is no "fix" commit.

Good point. I think we will need to do this in some form in the future.
Again, there are open questions:
 - what is the precise formula behind "isn't still occurring"?
 - should we only close "no repro" bugs?
 - should we re-test bugs that have a reproducer? (re-testing is not 100%
precise, so we would lose some real, subtle bugs this way)
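As an illustration of what a precise formula behind "isn't still occurring" might look like, here is one hypothetical heuristic. It only auto-closes bugs without a reproducer (one of the options listed above), and it requires the quiet period to be long relative to the bug's own crash rate, so rarely seen crashes, such as the races mentioned earlier, are not closed prematurely. The thresholds are arbitrary placeholders, and the `Bug` fields and `looks_stale` name are assumptions, not syzbot's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Arbitrary placeholder thresholds; the email deliberately leaves these open.
MIN_QUIET = timedelta(days=30)      # never close sooner than this
ONE_OFF_QUIET = timedelta(days=90)  # quiet period for a bug seen only once
QUIET_FACTOR = 10                   # "quiet for 10x the usual gap between crashes"


@dataclass
class Bug:
    first_seen: datetime
    last_seen: datetime
    num_crashes: int
    has_repro: bool


def looks_stale(bug: Bug, now: datetime) -> bool:
    """Hypothetical test for "isn't still occurring" (no-repro bugs only)."""
    if bug.has_repro:
        # Bugs with a reproducer are left open; they could be re-tested instead.
        return False
    quiet = now - bug.last_seen
    if bug.num_crashes < 2:
        # A single crash gives no crash rate to compare against.
        return quiet >= ONE_OFF_QUIET
    # Average time between consecutive crashes while the bug was active.
    avg_gap = (bug.last_seen - bug.first_seen) / (bug.num_crashes - 1)
    return quiet >= max(MIN_QUIET, QUIET_FACTOR * avg_gap)


now = datetime(2018, 3, 1)
frequent = Bug(datetime(2017, 11, 1), datetime(2017, 12, 1), num_crashes=31, has_repro=False)
rare = Bug(datetime(2017, 6, 1), datetime(2017, 12, 1), num_crashes=3, has_repro=False)
print(looks_stale(frequent, now))  # True: ~1 crash/day, then 90 quiet days
print(looks_stale(rare, now))      # False: crashes ~3 months apart, so 90 quiet days is normal
```

Scaling the required quiet period by the observed crash rate is one way to reduce the risk of closing rare but real bugs; a fixed cutoff would be simpler but would treat a daily crash and a quarterly one alike.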

Thanks