lists.openwall.net - Open Source and information security mailing list archives
Message-ID: <CAGRrVHxhBbMX9W9OmftUPDq5R4koJNjJDDy2xPDF073GTVo6jw@mail.gmail.com>
Date: Thu, 14 Mar 2019 14:37:55 -0600
From: Ross Zwisler <zwisler@...gle.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
	Jan Kara <jack@...e.com>, Jens Axboe <axboe@...nel.dk>,
	linux-block@...r.kernel.org, Ross Zwisler <zwisler@...nel.org>
Subject: Re: question about writeback

On Thu, Mar 14, 2019 at 2:18 PM Dave Chinner <david@...morbit.com> wrote:
> On Thu, Mar 14, 2019 at 02:03:08PM -0600, Ross Zwisler wrote:
> > Hi,
> >
> > I'm trying to understand a failure I'm seeing with both v4.14 and
> > v4.19 based kernels, and I was hoping you could point me in the right
> > direction.
> >
> > What seems to be happening is that under heavy I/O we get into a
> > situation where for a given inode/mapping we eventually reach a steady
> > state where one task is continuously dirtying pages and marking them
> > for writeback via ext4_writepages(), and another task is continuously
> > completing I/Os via ext4_end_bio() and clearing the
> > PAGECACHE_TAG_WRITEBACK flags. So, we are making forward progress as
> > far as I/O is concerned.
> >
> > The problem is that another task calls filemap_fdatawait_range(), and
> > that call never returns because it always finds pages that are tagged
> > for writeback. I've added some prints to __filemap_fdatawait_range(),
> > and the total number of pages tagged for writeback seems pretty
> > constant. It goes up and down a bit, but does not seem to move
> > towards 0. If we halt I/O the system eventually recovers, but if we
> > keep I/O going we can block the task waiting in
> > __filemap_fdatawait_range() long enough for the system to reboot due
> > to what it perceives as a hung task.
> >
> > My question is: Is there some mechanism that is supposed to prevent
> > this sort of situation? Or is it expected that with slow enough
> > storage and a high enough I/O load, we could block inside of
> > filemap_fdatawait_range() indefinitely since we never run out of dirty
> > pages that are marked for writeback?
>
> So your problem is that you are doing an extending write, and then
> doing __filemap_fdatawait_range(end = LLONG_MAX), and while it
> blocks on the pages under IO, the file is further extended and so
> the next radix tree lookup finds more pages past that page under
> writeback?
>
> i.e. because it is waiting for pages to complete, it never gets
> ahead of the extending write or writeback and always ends up with
> more pages to wait on, and so never reaches the end of the file as
> directed?
>
> So perhaps the caller should be waiting on a specific range to bound
> the wait (e.g. isize as the end of the wait) rather than using the
> default "keep going until the end of file is reached" semantics?

The call to __filemap_fdatawait_range() is happening via the jbd2 code:

jbd2_journal_commit_transaction()
  journal_finish_inode_data_buffers()
    filemap_fdatawait_keep_errors()
      __filemap_fdatawait_range(end = LLONG_MAX)

Would it have to be an extending write? Or could it work the same way
if you have one thread just moving forward through a very large file,
dirtying pages, so that the __filemap_fdatawait_range() call just keeps
finding new pages as it moves forward through the big file?

In either case, I think your description of the problem is correct. Is
this just a "well, don't do that" type situation, or is this supposed
to have a different result?

- Ross