Message-ID: <CAGRrVHxhK9Bsk42m5D2LEiVzoWxbK7Z5=FGynLcbqsX-5iWT0g@mail.gmail.com>
Date:   Thu, 14 Mar 2019 14:03:08 -0600
From:   Ross Zwisler <zwisler@...gle.com>
To:     linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
        Jan Kara <jack@...e.com>, Jens Axboe <axboe@...nel.dk>,
        linux-block@...r.kernel.org
Cc:     Ross Zwisler <zwisler@...nel.org>
Subject: question about writeback

Hi,

I'm trying to understand a failure I'm seeing with both v4.14 and
v4.19 based kernels, and I was hoping you could point me in the right
direction.

What seems to be happening is that under heavy I/O we get into a
situation where for a given inode/mapping we eventually reach a steady
state where one task is continuously dirtying pages and marking them
for writeback via ext4_writepages(), and another task is continuously
completing I/Os via ext4_end_bio() and clearing the
PAGECACHE_TAG_WRITEBACK flags.  So, we are making forward progress as
far as I/O is concerned.

The problem is that another task calls filemap_fdatawait_range(), and
that call never returns because it always finds pages that are tagged
for writeback.  I've added some prints to __filemap_fdatawait_range(),
and the total number of pages tagged for writeback seems pretty
constant.  It goes up and down a bit, but does not seem to move
towards 0.  If we halt I/O the system eventually recovers, but if we
keep I/O going we can block the task waiting in
__filemap_fdatawait_range() long enough for the system to reboot due
to what it perceives as a hung task.
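
To make the steady state concrete, here is a toy user-space model of it.
It is only a sketch: it is not kernel code and does not mirror how
__filemap_fdatawait_range() is implemented. The function name, the
per-tick rates, and the halt_io_at parameter are all made up for
illustration. It assumes a waiter that can return only once no page in
the mapping is tagged for writeback.

```python
def ticks_until_drained(initial_tagged, dirtied_per_tick, completed_per_tick,
                        max_ticks, halt_io_at=None):
    """Toy model: return the tick at which no pages remain tagged for
    writeback, or None if that never happens within max_ticks.

    All parameters are hypothetical knobs, not kernel tunables.
    """
    tagged = initial_tagged
    for tick in range(1, max_ticks + 1):
        # The writer keeps tagging new pages until I/O is halted (if ever).
        if halt_io_at is None or tick < halt_io_at:
            tagged += dirtied_per_tick
        # The completion path clears the tag on up to completed_per_tick pages.
        tagged -= min(tagged, completed_per_tick)
        # The waiter can only return once nothing is tagged.
        if tagged == 0:
            return tick
    return None


# Sustained load: tagging keeps pace with completion, so the count never
# reaches zero and the waiter never returns.
stuck = ticks_until_drained(100, 10, 10, max_ticks=10_000)

# Same load, but I/O is halted at tick 50: the backlog drains and the
# waiter eventually returns.
recovered = ticks_until_drained(100, 10, 10, max_ticks=10_000, halt_io_at=50)
```

With equal tagging and completion rates the tagged count stays flat, which
matches the roughly constant count I see in my prints, and halting the
writer lets the count drain, which matches the recovery we observe.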

My question is: Is there some mechanism that is supposed to prevent
this sort of situation?  Or is it expected that with slow enough
storage and a high enough I/O load, we could block inside of
filemap_fdatawait_range() indefinitely since we never run out of dirty
pages that are marked for writeback?

Thanks,
- Ross
