linux-kernel - Re: INFO: rcu detected stall in sys

Message-ID: <20190313234040.GH10169@gmail.com>
Date:   Wed, 13 Mar 2019 16:40:41 -0700
From:   Eric Biggers <ebiggers@...nel.org>
To:     Dmitry Vyukov <dvyukov@...gle.com>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        syzbot <syzbot+1505c80c74256c6118a5@...kaller.appspotmail.com>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Al Viro <viro@...iv.linux.org.uk>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: INFO: rcu detected stall in sys_sendfile64 (2)

On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > Also, humans can sometimes find simpler C reproducers from syzbot-provided
> > reproducers. It would be nice if syzbot could accept and use a user-defined C
> > reproducer for testing.
> 
> It would be more useful to accept patches that make syzkaller create
> better reproducers from these people. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen OSes (incl. some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it became clear
> very early that it won't work. Then, see above: while that human is
> sleeping or on a weekend/vacation, syzbot will already have bisected
> its own reproducer. Adding a manual reproducer later won't help in
> any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all manual work.
> 

Well, that's very tough, and not many people are familiar with the syzkaller
codebase, let alone have time to contribute.  But having simplified a lot of
the syzkaller reproducers manually, the main things I do are:

- Replace bare system calls with proper C library calls.  For example:

	#include <sys/syscall.h>

	syscall(__NR_socket, 0xa, 6, 0);

    becomes:

	#include <sys/socket.h>

	socket(AF_INET6, SOCK_DCCP, 0);

- Do the same for structs.  Use the appropriate C header rather than filling in
  each struct manually.  For example:

	*(uint16_t*)0x20000000 = 0xa;
	*(uint16_t*)0x20000002 = htobe16(0x4e20);
	*(uint32_t*)0x20000004 = 0;
	*(uint8_t*)0x20000008 = 0;
	*(uint8_t*)0x20000009 = 0;
	*(uint8_t*)0x2000000a = 0;
	*(uint8_t*)0x2000000b = 0;
	*(uint8_t*)0x2000000c = 0;
	*(uint8_t*)0x2000000d = 0;
	*(uint8_t*)0x2000000e = 0;
	*(uint8_t*)0x2000000f = 0;
	*(uint8_t*)0x20000010 = 0;
	*(uint8_t*)0x20000011 = 0;
	*(uint8_t*)0x20000012 = 0;
	*(uint8_t*)0x20000013 = 0;
	*(uint8_t*)0x20000014 = 0;
	*(uint8_t*)0x20000015 = 0;
	*(uint8_t*)0x20000016 = 0;
	*(uint8_t*)0x20000017 = 0;
	*(uint32_t*)0x20000018 = 0;

    becomes:

	struct sockaddr_in6 addr = { .sin6_family = AF_INET6, .sin6_port = htobe16(0x4e20) };

- Put arguments on the stack rather than in a mmap'd region, if possible.

- Simplify any calls to the helper functions that syzkaller emits, e.g.
  syz_open_dev(), syz_kvm_setup_vcpu(), or the networking setup stuff.  Usually
  the reproducer needs a small subset of the functionality to work.

- For multithreaded reproducers, try to incrementally simplify the threading
  strategy.  For example, reduce the number of threads by combining operations.
  Also try running the operations in loops.  Also, using fork() can often result
  in a simpler reproducer than pthreads.

- Instead of using the 'r[]' array to hold all integer return values, give them
  appropriate names.

- Remove duplicate #includes.

- Considering the actual kernel code and the bug, if possible find a different
  way to trigger the same bug that's simpler or more reliable.  If the problem
  is obvious it may be possible to jump right to this step from the beginning.

Some gotchas:

- fault-nth injections are fragile, since the number of memory allocations in a
  particular system call varies by kernel config and kernel version.
  Incrementing n starting from 1 is more reliable.

- Some of the perf_event_open() reproducers are fragile because they hardcode a
  trace event ID, which can change in every kernel version.  Reading the trace
  event ID from /sys/kernel/debug/tracing/events/ is more reliable.

- Reproducers using the KVM API sometimes only work on certain processors (e.g.
  Intel but not AMD) or even depend on the host kernel.

- Reproducers that access the local filesystem sometimes assume that it's ext4.
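
For the first two gotchas, small helpers along the following lines can replace
the hardcoded values.  This is only a sketch: read_trace_event_id() and
set_fail_nth() are hypothetical names, the tracefs path passed in is up to the
caller, and set_fail_nth() assumes a kernel built with fault injection that
exposes /proc/thread-self/fail-nth.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read a trace event ID from an "id" file such as
 * /sys/kernel/debug/tracing/events/<subsystem>/<event>/id.
 * Returns the ID, or -1 if the file is missing or malformed.
 */
long read_trace_event_id(const char *id_path)
{
	long id = -1;
	FILE *f = fopen(id_path, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &id) != 1)
		id = -1;
	fclose(f);
	return id;
}

/*
 * Ask the kernel to fail the n-th fault-injection site hit by the calling
 * thread.  Returns 0 on success, -1 if fail-nth is unavailable.
 */
int set_fail_nth(int n)
{
	char buf[16];
	int len = snprintf(buf, sizeof(buf), "%d", n);
	int fd = open("/proc/thread-self/fail-nth", O_WRONLY);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (write(fd, buf, len) == len)
		ret = 0;
	close(fd);
	return ret;
}
```

The value returned by read_trace_event_id() would go into
perf_event_attr.config with type set to PERF_TYPE_TRACEPOINT.  For fault
injection, the reproducer would call set_fail_nth(n) right before the system
call of interest, for n = 1, 2, 3, ... up to some cap chosen by the author,
rather than hardcoding a single n.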