linux-kernel - Re: [syzbot] [xfs?] INFO: task hung in xfs_ail_push_all

Message-ID: <CANp29Y6Rv2vUg463F3SYTsSNDr=Hmnarbz377tS=Hash7pT4xw@mail.gmail.com>
Date: Fri, 18 Oct 2024 12:13:33 +0200
From: Aleksandr Nogikh <nogikh@...gle.com>
To: Dave Chinner <david@...morbit.com>
Cc: syzbot <syzbot+611be8174be36ca5dbc9@...kaller.appspotmail.com>, cem@...nel.org, 
	chandan.babu@...cle.com, djwong@...nel.org, linux-kernel@...r.kernel.org, 
	linux-xfs@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [xfs?] INFO: task hung in xfs_ail_push_all_sync (2)

Hi Dave,

On Thu, Oct 17, 2024 at 2:53 AM 'Dave Chinner' via syzkaller-bugs
<syzkaller-bugs@...glegroups.com> wrote:
>
> On Wed, Oct 16, 2024 at 04:22:27PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    09f6b0c8904b Merge tag 'linux_kselftest-fixes-6.12-rc3' of..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14af3fd0580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=7cd9e7e4a8a0a15b
> > dashboard link: https://syzkaller.appspot.com/bug?extid=611be8174be36ca5dbc9
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16c7705f980000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14d2fb27980000
>

It's better to just leave the issue open until syzbot actually stops
triggering it. Otherwise, after every "#syz invalid", the crash will
eventually be seen again and re-sent to the mailing lists.

In the other email you mentioned
"/sys/fs/xfs/<dev>/error/metadata/EIO/max_retries" as the only way to
prevent this hang. Must max_retries be set every time after xfs is
mounted? Or is it possible to somehow preconfigure it once at VM boot
and then no longer worry about it during fuzzing?
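
For illustration only, here is a minimal sketch of what setting that
knob after a mount could look like. It assumes the knob has to be
written once per mounted filesystem, which is exactly the open
question above. The helper name and the sysfs_root parameter are
hypothetical; only the sysfs path comes from the other email. The
value 0 is meant to request "fail immediately" rather than "retry
forever".

```python
from pathlib import Path


def set_xfs_eio_max_retries(dev: str, retries: int = 0,
                            sysfs_root: str = "/sys/fs/xfs") -> Path:
    """Write `retries` to <sysfs_root>/<dev>/error/metadata/EIO/max_retries.

    Hypothetical helper: `dev` is the device name as it appears under
    /sys/fs/xfs (e.g. "loop0"). `sysfs_root` is a parameter only so the
    function can be exercised against a scratch directory.
    """
    knob = Path(sysfs_root) / dev / "error" / "metadata" / "EIO" / "max_retries"
    # Assumption: 0 makes XFS give up on the first EIO for metadata
    # writeback instead of retrying until the filesystem is shut down.
    knob.write_text(f"{retries}\n")
    return knob


# Example (needs root and a mounted XFS filesystem on loop0):
#   set_xfs_eio_max_retries("loop0", 0)
```

If the setting does not persist across mounts, a call like this would
have to run after every mount of an XFS image during fuzzing.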

> I explained this last time syzbot triggered this: this is a syzbot
> configuration problem, not a filesystem bug.
>
> [   96.418071][ T5112] XFS (loop0): Mounting V5 Filesystem c496e05e-540d-4c72-b591-04d79d8b4eeb
> [   96.593743][ T5112] XFS (loop0): Ending clean mount
> [   96.791357][ T5112] loop0: detected capacity change from 32768 to 0
> [   96.814808][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.814808][ T5127] loop0: rw=4097, sector=2, nr_sectors = 1 limit=0
> [   96.851235][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.851235][ T5127] loop0: rw=4097, sector=24, nr_sectors = 8 limit=0
> [   96.860284][    T9] XFS (loop0): metadata I/O error in "xfs_buf_ioerror_alert_ratelimited+0x7b/0x1e0" at daddr 0x2 len 1 error 5
> [   96.886045][    T9] kworker/0:1: attempt to access beyond end of device
> [   96.886045][    T9] loop0: rw=4097, sector=2, nr_sectors = 1 limit=0
> [   96.900489][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.900489][ T5127] loop0: rw=4097, sector=32, nr_sectors = 8 limit=0
> [   96.932892][    T9] kworker/0:1: attempt to access beyond end of device
> [   96.932892][    T9] loop0: rw=4097, sector=24, nr_sectors = 8 limit=0
> [   96.940364][ T5127] xfsaild/loop0: attempt to access beyond end of device
> [   96.940364][ T5127] loop0: rw=4097, sector=8832, nr_sectors = 64 limit=0
> .....
>
> And so it goes until something tries to freeze the filesystem and
> gets stuck waiting for writeback of metadata that is not making
> progress because XFS defaults to -retry metadata write errors
> forever- until the filesystem is shut down.
>
> If the user expects an XFS filesystem to fail fast when they
> accidentally shrink the block device under a mounted filesystem, then
> they need to configure XFS to fail metadata IO fast. Otherwise
> metadata will remain dirty and be retried until the filesystem is
> shut down or the error behaviour is reconfigured.
>
> Please fix your syzbot configurations and/or tests that screw with
> the block device under filesystems to configure XFS filesystems to
> fail fast so that these tests no longer generate unwanted noise.
>
> #syz invalid
>
> -Dave.
> --
> Dave Chinner
> david@...morbit.com
>

-- 
Aleksandr