lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <495553EB.6030604@zappa.cx>
Date:	Fri, 26 Dec 2008 23:00:11 +0100
From:	Andreas Sundstrom <sunkan@...pa.cx>
To:	linux-kernel@...r.kernel.org
Subject: Re: 2.6.28 ext4, xen and lvm volume becomes ro after snapshot

Theodore Tso wrote:
> On Fri, Dec 26, 2008 at 07:48:22PM +0100, Andreas Sundstrom wrote:
>> Yes, I mounted it with ext3 and barrier=1 and could reproduce the problem.
>> ext3 did not remount the fs ro though, it seems to only disable barriers:
>>
>> [    7.681759] blkfront: xvda1: write barrier op failed
>> [    7.681776] blkfront: xvda1: barriers disabled
>> [    7.681785] end_request: I/O error, dev xvda1, sector 4584
>> [    7.681800] end_request: I/O error, dev xvda1, sector 4584
>> [    7.681886] JBD: barrier-based sync failed on xvda1 - disabling barriers
>>
>> And then I tested with ext4 and barrier=0 and that also works.
> 
> Ext4 has patches which will checks the error returns on writes to the
> journal, and will abort the journal in case of I/O failures.  Ext3
> should have the same patches, but it's apparently missing one of the
> patches, or it's otherwise not noticing the problem.  (You were
> testing ext3 on a 2.6.28 kernel, right?)

Yes it was the same kernel even.

> 
>> But I'm here if you want something tested or a patch verified or anything,
>> but I guess this might be a Xen issue rather than vanilla kernel stuff.
> 
> Yes, this looks very much like a Xen issue.  What is going on is that
> we submit the write with barriers enabled, and if it fails, we try
> again without barriers.  I'm guessing that Xen emulation code didn't
> notice that we were trying again without barriers, or the Xen
> emulation isn't clearing the error flag, but for whatever reason,
> we're getting a write failure somewhere else later on, and that's
> causing the failures.
> 
> What would be really useful to nail down exactly what is going on
> would be to patch fs/jbd/journal.c and fs/jbd2/journal.c so that the
> line:
> 
> u8 journal_enable_debug __read_mostly;
> 
> is changed to read:
> 
> u8 journal_enable_debug=3 __read_mostly;
> 
> 
> and similarly in fs/jbd2/journal.c, change:
> 
> u8 jbd2_journal_enable_debug __read_mostly;
> 
> to read
> 
> u8 jbd2_journal_enable_debug=3 __read_mostly;
> 
> That will generate a lot more debugging information, and hopefully we
> can see exactly what was going on right before the journal abort, and
> why ext4 apparently didn't get the corret error return after the
> barrier operation failed.
> 
> But yes, this ultimately seems very likely to be a Xen emulation bug.

Unfortunately it didn't compile with the above changes and I'm totally
crap at programming :(

But I enabled debugfs and did
"echo 3 > /sys/kernel/debug/jbd2/jbd2-debug" and reproduced the problem
by taking a snapshot while the system was live.
I hope this had the same effect as your proposed change.

I also put it on this URL to make it easier to read
http://pastebin.com/m7efd8eb3


Dec 26 22:38:57 192.168.20.33 kernel: imklog 3.18.6, log source =
/proc/kmsg started.
Dec 26 22:42:01 192.168.20.33 kernel: [  482.814386]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be000
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815139]
(fs/jbd2/transaction.c, 1267): jbd2_journal_stop: transaction too old,
requesting commit for handle c34be000

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815147] (fs/jbd2/journal.c,
449): __jbd2_log_start_commit: JBD: requesting commit 1549171/1549170

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815157] (fs/jbd2/journal.c,
546): jbd2_log_wait_commit: JBD: want 1549171, j_commit_sequence=1549170

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815168] (fs/jbd2/journal.c,
195): kjournald2: kjournald2 wakes
Dec 26 22:42:01 192.168.20.33 kernel: [  482.815174] (fs/jbd2/journal.c,
147): kjournald2: commit_sequence=1549170, commit_request=1549171

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815181] (fs/jbd2/journal.c,
150): kjournald2: OK, requests differ

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815188] (fs/jbd2/commit.c,
362): jbd2_journal_commit_transaction: superblock not updated

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815194] (fs/jbd2/commit.c,
374): jbd2_journal_commit_transaction: JBD: starting commit of
transaction 1549171

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815210]
(fs/jbd2/checkpoint.c, 762): __jbd2_journal_drop_transaction: Dropping
transaction 1549170, all done

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815218] (fs/jbd2/commit.c,
447): jbd2_journal_commit_transaction: JBD: commit phase 1

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815226] (fs/jbd2/commit.c,
465): jbd2_journal_commit_transaction: JBD: commit phase 2

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815235] (fs/jbd2/revoke.c,
520): jbd2_journal_write_revoke_records: Wrote 0 revoke records

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815242] (fs/jbd2/commit.c,
477): jbd2_journal_commit_transaction: JBD: commit phase 2

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815249] (fs/jbd2/commit.c,
703): jbd2_journal_commit_transaction: JBD: commit phase 3

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815255] (fs/jbd2/commit.c,
762): jbd2_journal_commit_transaction: JBD: commit phase 4

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815261] (fs/jbd2/commit.c,
792): jbd2_journal_commit_transaction: JBD: commit phase 5

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815369]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815389]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815415]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815425]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815490]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.815500]
(fs/jbd2/transaction.c, 111): start_this_handle: New handle c34be018
going live.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.816304] blkfront: xvda1:
write barrier op failed
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816320] blkfront: xvda1:
barriers disabled
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816330] end_request: I/O
error, dev xvda1, sector 5600
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816342] end_request: I/O
error, dev xvda1, sector 5600
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816380] Aborting journal on
device xvda1:8.
Dec 26 22:42:01 192.168.20.33 kernel: [  482.816389] (fs/jbd2/journal.c,
449): __jbd2_log_start_commit: JBD: requesting commit 1549172/1549170

Dec 26 22:42:01 192.168.20.33 kernel: [  482.816397] (fs/jbd2/journal.c,
1278): jbd2_journal_update_superblock: JBD: updating superblock (start
1, seq 1549157, errno -5)
Dec 26 22:42:01 192.168.20.33 kernel: [  482.821228] EXT4-fs error
(device xvda1) in ext4_reserve_inode_write: Journal has aborted

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836134] (fs/jbd2/commit.c,
812): jbd2_journal_commit_transaction: JBD: commit phase 6

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836142] (fs/jbd2/commit.c,
937): jbd2_journal_commit_transaction: JBD: commit phase 7

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836151]
(fs/jbd2/checkpoint.c, 762): __jbd2_journal_drop_transaction: Dropping
transaction 1549171, all done

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836158] (fs/jbd2/commit.c,
1005): jbd2_journal_commit_transaction: JBD: commit 1549171 complete,
head 1549157
Dec 26 22:42:01 192.168.20.33 kernel: [  482.836169] (fs/jbd2/journal.c,
147): kjournald2: commit_sequence=1549171, commit_request=1549172

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836175] (fs/jbd2/journal.c,
150): kjournald2: OK, requests differ

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836182] (fs/jbd2/commit.c,
362): jbd2_journal_commit_transaction: superblock not updated

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836189] (fs/jbd2/commit.c,
374): jbd2_journal_commit_transaction: JBD: starting commit of
transaction 1549172

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836205] journal commit I/O
error
Dec 26 22:42:01 192.168.20.33 kernel: [  482.836521] ext4_abort called.

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836531] EXT4-fs error
(device xvda1): ext4_journal_start_sb: Detected aborted journal

Dec 26 22:42:01 192.168.20.33 kernel: [  482.836548] Remounting
filesystem read-only
Dec 26 22:42:01 192.168.20.33 kernel: [  482.836685] ext4_da_writepages:
jbd2_start: 1642 pages, ino 81965; err -30

/Andreas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ