linux-kernel - RFC: upstreaming fuzzing coverage support


Message-ID: <CACT4Y+Y7qjRCnwUZc60ZOvXyQTrV+o7mToEUgh5hzKVFc48PnA@mail.gmail.com>
Date:	Fri, 4 Dec 2015 23:11:31 +0100
From:	Dmitry Vyukov <dvyukov@...gle.com>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	syzkaller <syzkaller@...glegroups.com>,
	Kostya Serebryany <kcc@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	Quentin Casasnovas <quentin.casasnovas@...cle.com>,
	Andrey Ryabinin <ryabinin.a.a@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Drysdale <drysdale@...gle.com>
Subject: RFC: upstreaming fuzzing coverage support

Hello,

You may have seen some of the bugs found by syzkaller that were
reported recently:
https://github.com/google/syzkaller/wiki/Found-Bugs
Syzkaller is a coverage-guided syscall fuzzer. It relies on special
code coverage instrumentation and kernel support to extract code
coverage for individual syscalls. In exchange for this additional
complexity, it is more effective at finding bugs than fuzzers that
generate inputs blindly at random.
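
To illustrate what "coverage-guided" means in general, here is a
minimal user-space sketch of the usual decision: keep an input only if
its coverage contains something not seen before. This is not
syzkaller's code. The names seen_set, merge_new_coverage and SEEN_BITS
are hypothetical, and the modulo-indexed bitmap is a lossy stand-in for
a real set of PCs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_BITS 4096 /* hypothetical bitmap size, in bits */

/* Hypothetical set of already-covered PCs; distinct PCs may collide. */
struct seen_set {
	uint8_t bits[SEEN_BITS / 8];
};

/*
 * Merge the PCs of one execution into the set.
 * Returns true if at least one PC was not in the set before,
 * i.e. the input that produced this trace is worth keeping.
 */
static bool merge_new_coverage(struct seen_set *s,
			       const uintptr_t *pcs, size_t n)
{
	bool fresh = false;

	assert(s != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t slot = (size_t)(pcs[i] % SEEN_BITS);
		uint8_t mask = (uint8_t)(1u << (slot % 8));

		if (!(s->bits[slot / 8] & mask)) {
			s->bits[slot / 8] |= mask;
			fresh = true;
		}
	}
	return fresh;
}
```

A fuzzer would call merge_new_coverage() after each executed program
and add the program to its corpus only when the call returns true. A
blind fuzzer has no such signal.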

The GCC part was upstreamed today:
https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/sancov.c?revision=231296&view=markup

I would also like to upstream the kernel part. The kernel part is
essentially a debugfs file that lets user space instruct the kernel to
collect per-task coverage, and then extract and reset it.

Why not gcov? There are several reasons. (1) gcov cannot collect
coverage per task, which is crucial for this case (the fuzzer needs
coverage to be a stable function of the input). (2) A typical fuzzer
loop looks as follows: reset coverage, execute a bit of code, fetch
coverage, repeat. "A bit of code" is meant literally, because many
syscalls return immediately with EINVAL. Gcov is fast at collecting
coverage, but it is extremely slow at the reset and fetch steps (both
are basically O(number of basic blocks or edges), which is ~2M for a
beefy kernel). (3) The dedicated fuzzing coverage also exposes
information in a format more suitable for fuzzers, and can expose more
of it (e.g. a direct execution trace).
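
The cost argument in (2) can be sketched in user space. The code below
is not the kernel patch and does not show its interface. cover_trace,
cover_reset, cover_hit, cover_fetch, fake_syscall and TRACE_CAP are all
hypothetical names. It only assumes that coverage is kept as a per-task
trace of PCs, so that reset is O(1) and fetch is proportional to the
number of PCs actually hit, not to the size of the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAP 1024 /* hypothetical per-task buffer capacity */

/* Hypothetical per-task coverage state: an append-only trace of PCs. */
struct cover_trace {
	size_t count;             /* number of valid entries in pcs[] */
	uintptr_t pcs[TRACE_CAP];
};

/* Reset is O(1): the old trace is dropped by zeroing the counter. */
static void cover_reset(struct cover_trace *t)
{
	t->count = 0;
}

/* What an instrumentation callback might do at each block or edge. */
static void cover_hit(struct cover_trace *t, uintptr_t pc)
{
	if (t->count < TRACE_CAP)
		t->pcs[t->count++] = pc;
}

/* Fetch is O(count): only the PCs that were hit are copied out. */
static size_t cover_fetch(const struct cover_trace *t,
			  uintptr_t *out, size_t cap)
{
	size_t n = t->count < cap ? t->count : cap;

	assert(out != NULL);
	for (size_t i = 0; i < n; i++)
		out[i] = t->pcs[i];
	return n;
}

/* Stand-in for a syscall that bails out early on a bad argument. */
static int fake_syscall(struct cover_trace *t, int arg)
{
	cover_hit(t, 0x1000);
	if (arg < 0)
		return -22; /* -EINVAL */
	cover_hit(t, 0x1010);
	cover_hit(t, 0x1020);
	return 0;
}
```

One loop iteration is then cover_reset(), fake_syscall(), cover_fetch().
A call that fails with EINVAL leaves a single entry to fetch, whereas a
counter-array scheme would still have to scan and clear every counter.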

Besides generating sequences of syscalls, this coverage can also be
used to generate blob inputs to the kernel, for example data coming
from the network, USB, or radio, or complex syscall inputs such as
crypto, bpf, or kdbus. This area has not been explored yet.

The current patch lives here:
https://github.com/dvyukov/linux/commit/a8175057d14fa8ff8cc4589edf55a6855d9afdf4
It needs some cleanup for the Makefile part and documentation, but you
can get the general idea.

The user-space part that uses it lives here (functions starting with cover_):
https://github.com/google/syzkaller/blob/master/executor/executor.cc

I would like to hear your thoughts on the general idea, the
kernel/user interface, the implementation, etc.
Which tree should such functionality go into? mm?

Thank you