lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 5 Oct 2015 11:04:35 -0700
From:	Dave Hansen <dave.hansen@...el.com>
To:	Theodore Ts'o <tytso@....edu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [REGRESSION] 998ef75ddb and aio-dio-invalidate-failure w/
 data=journal

I managed to catch the condition in an ftrace.  Full spew is below.

We can see that the iov_iter_copy_from_user_atomic() "failed" and ended
up with a copied=0 which we can see in the ext4_journalled_write_end()
tracepoint as "copied 0".

So we're in this code with copied=0 and len=4096:

> static int ext4_journalled_write_end(struct file *file, ... unsigned copied, ...
> {
...

>         else {
>                 if (copied < len) {
>                         if (!PageUptodate(page))
>                                 copied = 0;
>                         page_zero_new_buffers(page, from+copied, to);
>                 }
> 
>                 ret = ext4_walk_page_buffers(handle, page_buffers(page), from,
>                                              to, &partial, write_end_fn);
>                 if (!partial)
>                         SetPageUptodate(page);
>         }


The warning comes out of ext4_walk_page_buffers() and the dirty state
comes from page_zero_new_buffers(). That seems a _bit_ goofy that the
filesystem is marking the page dirty and then so shortly warning about it.

If we either come in here with copied=0 or force it (when
!PageUptodate()) we are essentially telling the caller that its write
must be repeated. We have no data in the buffer to preserve and we are
not leaking any data we are not marking it Uptodate.

The very lightly tested attached patch gets rid of the "Spotted dirty
metadata buffer" messages that I've been getting testing like this:

	mount -o data=journal /dev/sda1 /mnt/sda1
	TEST_DIR=/mnt/sda1 TEST_DEV=/dev/sda1 ./tests/generic/208

It _seems_ like a relatively sane thing to do.

>>  2)  aio-dio-3204  |               |              ext4_journalled_write_end() {
>>  2)  aio-dio-3204  |               |                /* ext4_journalled_write_end: dev 8,1 ino 12 pos 0 len 4096 copied 0 */
>>  2)  aio-dio-3204  |               |                page_zero_new_buffers() {
>>  2)  aio-dio-3204  |               |                  mark_buffer_dirty() {
>>  2)  aio-dio-3204  |               |                    /* block_dirty_buffer: 8,1 sector=34356 size=4096 */
>>  2)  aio-dio-3204  |   0.025 us    |                    page_mapping();
>>  2)  aio-dio-3204  |               |                    __set_page_dirty.constprop.46() {
>>  2)  aio-dio-3204  |   0.026 us    |                      _raw_spin_lock_irqsave();
>>  2)  aio-dio-3204  |               |                      account_page_dirtied() {
>>  2)  aio-dio-3204  |               |                        /* writeback_dirty_page: bdi 8:0: ino=12 index=0 */
>>  2)  aio-dio-3204  |               |                        __inc_zone_page_state() {
>>  2)  aio-dio-3204  |   0.030 us    |                          __inc_zone_state();
>>  2)  aio-dio-3204  |   0.211 us    |                        }
>>  2)  aio-dio-3204  |               |                        __inc_zone_page_state() {
>>  2)  aio-dio-3204  |   0.024 us    |                          __inc_zone_state();
>>  2)  aio-dio-3204  |   0.211 us    |                        }
>>  2)  aio-dio-3204  |   1.085 us    |                      }
>>  2)  aio-dio-3204  |   0.033 us    |                      _raw_spin_unlock_irqrestore();
>>  2)  aio-dio-3204  |   1.727 us    |                    }
>>  2)  aio-dio-3204  |               |                    __mark_inode_dirty() {
>>  2)  aio-dio-3204  |               |                      /* writeback_mark_inode_dirty: bdi 8:0: ino=12 state=I_DIRTY_SYNC|I_DIRTY_DATASYNC|I_DIRTY_PAGES flags=I_DIRTY_PAGES */
>>  2)  aio-dio-3204  |   0.207 us    |                    }
>>  2)  aio-dio-3204  |   2.600 us    |                  }
>>  2)  aio-dio-3204  |   2.878 us    |                }
>>  2)  aio-dio-3204  |               |                ext4_walk_page_buffers() {
>>  2)  aio-dio-3204  |               |                  write_end_fn() {
>>  2)  aio-dio-3204  |               |                    __ext4_handle_dirty_metadata() {
>>  2)  aio-dio-3204  |   0.024 us    |                      _cond_resched();
>>  2)  aio-dio-3204  |               |                      jbd2_journal_dirty_metadata() {
>>  2)  aio-dio-3204  |   0.025 us    |                        _raw_spin_lock();
>>  2)  aio-dio-3204  |               |                        __jbd2_journal_file_buffer() {
>>  2)  aio-dio-3204  |               |                          warn_dirty_buffer.isra.10() {
>>  2)  aio-dio-3204  |               |                            bdevname() {
>>  2)  aio-dio-3204  | + 15.280 us   |                              disk_name();
>>  2)  aio-dio-3204  | + 15.661 us   |                            }
>>  2)  aio-dio-3204  |               |                            /* ^A4JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 34356). There's a risk of filesystem corruption in case of system crash. */
>>  2)  aio-dio-3204  |               |                            bdevname() {
>>  2)  aio-dio-3204  |   0.080 us    |                              disk_name();
>>  2)  aio-dio-3204  |   0.298 us    |                            }


View attachment "ext4-regression-2015-10-05-0.patch" of type "text/x-patch" (982 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ