linux-kernel - Re: INFO: rcu detected stall in ext4_file_write_iter

Message-ID: <20190227215755.GD10828@mit.edu>
Date:   Wed, 27 Feb 2019 16:57:55 -0500
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Dmitry Vyukov <dvyukov@...gle.com>
CC:     syzbot <syzbot+7d19c5fe6a3f1161abb7@...kaller.appspotmail.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        <linux-ext4@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: INFO: rcu detected stall in ext4_file_write_iter

On Wed, Feb 27, 2019 at 10:58:50AM +0100, Dmitry Vyukov wrote:
> Peter, Ingo, do you have any updates on the
> perf_event_open/sched_setattr stalls? This bug causes assorted hangs
> throughout the kernel and so is nasty.
> 
> syzkaller tries to remove all syscalls from reproducers one-by-one.
> Somehow without sched_setattr the hang did not reproduce (a bunch of
> repros have perf_event_open+sched_setattr so somehow they seem to be
> related)

FWIW, at least for me, the repro.c with sched_setattr commented out
(see the repro.c attached to a message[1] earlier in the thread) was
reproducing reliably on a 2 CPU, 2 GB memory KVM using the ext4.git
tree (dev branch, 5.0-rc3 plus ext4 commits for the next merge
window) with a Debian stable-based VM[2].

[1] https://groups.google.com/d/msg/syzkaller-bugs/ByPpM3WZw1s/li7SsaEyAgAJ
[2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/root_fs.img.amd64

> But even with perfect repros machines still won't be
> able to tell in all cases that even though the hang happened in ext4
> code, the root cause is actually another scheduler-related system
> call. So thanks for looking into this.

To be clear, there was *not* a scheduler-related system call in the
repro.c I was playing with (see [1]); just perf_event_open(2) and
sendfile(2).

Cheers,

                                        - Ted