Date: Fri, 5 Jul 2019 15:18:06 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: "Theodore Ts'o" <tytso@....edu>, syzbot <syzbot+4bfbbf28a2e50ab07368@...kaller.appspotmail.com>,
	Andreas Dilger <adilger.kernel@...ger.ca>, David Miller <davem@...emloft.net>,
	eladr@...lanox.com, Ido Schimmel <idosch@...lanox.com>, Jiri Pirko <jiri@...lanox.com>,
	John Stultz <john.stultz@...aro.org>, linux-ext4@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>, netdev <netdev@...r.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@...glegroups.com>, Thomas Gleixner <tglx@...utronix.de>
Cc: syzkaller <syzkaller@...glegroups.com>
Subject: Re: INFO: rcu detected stall in ext4_write_checks

On Wed, Jun 26, 2019 at 8:43 PM Theodore Ts'o <tytso@....edu> wrote:
>
> On Wed, Jun 26, 2019 at 10:27:08AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: abf02e29 Merge tag 'pm-5.2-rc6' of git://git.kernel.org/pu..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1435aaf6a00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=e5c77f8090a3b96b
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4bfbbf28a2e50ab07368
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11234c41a00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15d7f026a00000
> >
> > The bug was bisected to:
> >
> > commit 0c81ea5db25986fb2a704105db454a790c59709c
> > Author: Elad Raz <eladr@...lanox.com>
> > Date: Fri Oct 28 19:35:58 2016 +0000
> >
> >     mlxsw: core: Add port type (Eth/IB) set API
>
> Um, so this doesn't pass the laugh test.
>
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10393a89a00000
>
> It looks like the automated bisection machinery got confused by two
> failures getting triggered by the same repro; the symptoms changed
> over time. Initially, the failure was:
>
> crashed: INFO: rcu detected stall in {sys_sendfile64,ext4_file_write_iter}
>
> Later, the failure changed to something completely different, and much
> earlier (before the test was even started):
>
> run #5: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor216456474" "root@...128.15.205:./syz-executor216456474"]: exit status 1
> Connection timed out during banner exchange
> lost connection
>
> Looks like an opportunity to improve the bisection engine?

Hi Ted,

Yes, these infrastructure errors plague bisections episodically. That's
https://github.com/google/syzkaller/issues/1250

They did not confuse the bisection explicitly, since it understands that
these are infrastructure failures rather than a kernel crash. For example,
here you can see that it correctly identified this run as OK and started
the bisection on the v4.9..v4.10 range despite the 2 scp failures:

testing release v4.9
testing commit 69973b830859bc6529a7a0468ba0d80ee5117826 with gcc (GCC) 5.5.0
run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" ...]: exit status 1
Connection timed out during banner exchange
run #1: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" ....]: exit status 1
Connection timed out during banner exchange
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect start v4.10 v4.9

Though, of course, they may confuse the bisection indirectly by reducing
the number of tests per commit.

So far I haven't been able to gather any significant information about
these failures. We gather console logs, but on these runs they are empty.
It's easy to blame everything on GCE, but I don't have any information
that would point either way. These failures just appear randomly in
production, usually in batches...
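To make the distinction concrete, here is a rough sketch in Go of how a
per-commit verdict could be computed when some runs fail for infrastructure
reasons. This is not the actual syzkaller bisection code; all type and
function names below are made up for illustration. The idea is that
infrastructure failures are simply excluded from the verdict, so a commit is
judged only on runs where the kernel booted and the repro actually ran, and
the indirect cost is just fewer usable runs per commit.

// Illustrative sketch only -- not the real syzkaller bisection code.
package main

import "fmt"

type runResult int

const (
	runOK runResult = iota
	runCrashed
	runInfraFailure // e.g. scp to the VM timed out before the test started
)

// commitVerdict ignores infrastructure failures entirely: the commit is
// judged only on runs that booted and executed the repro. Fewer usable
// runs means a weaker verdict, which is how infra failures can still
// hurt the bisection indirectly.
func commitVerdict(runs []runResult) (good bool, usable int) {
	crashed := 0
	for _, r := range runs {
		switch r {
		case runInfraFailure:
			continue // don't count against the commit
		case runCrashed:
			crashed++
			usable++
		case runOK:
			usable++
		}
	}
	return crashed == 0, usable
}

func main() {
	// Mirrors the v4.9 log above: 2 scp failures followed by 8 OK runs.
	runs := []runResult{runInfraFailure, runInfraFailure,
		runOK, runOK, runOK, runOK, runOK, runOK, runOK, runOK}
	good, usable := commitVerdict(runs)
	fmt.Printf("good=%v based on %d usable runs\n", good, usable)
}

With the input above this prints "good=true based on 8 usable runs", which
matches how the real bisection treated the v4.9 test despite the scp errors.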