linux-ext4 - Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug

Message-ID: <20110505154348.GK5323@quack.suse.cz>
Date:	Thu, 5 May 2011 17:43:48 +0200
From:	Jan Kara <jack@...e.cz>
To:	Martin_Zielinski@...fee.com
Cc:	jack@...e.cz, tytso@....edu, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug

On Thu 05-05-11 09:55:22, Martin_Zielinski@...fee.com wrote:
> Hello once more.
> I have one concern about the patch:
> If the situation is triggered again and again, the patch would produce lots of output.
> Maybe it's better to use WARN_ONCE.
  Yes, probably it will. Changed to WARN_ONCE in the JBD patch.

								Honza
> -----Original Message-----
> From: Jan Kara [mailto:jack@...e.cz] 
> Sent: Wednesday, 4 May 2011 23:55
> To: Zielinski, Martin
> Cc: tytso@....edu; jack@...e.cz; linux-ext4@...r.kernel.org
> Subject: Re: [PATCH 2/2] jbd: fix fsync() tid wraparound bug
> 
> On Wed 04-05-11 09:21:04, Martin_Zielinski@...fee.com wrote:
> > Here's an update.
> > In my first post I was not aware of the implementation of tid_gt.
> > I agree that 2 and a half billion commits on an SD card are - hmph -
> > unlikely
>   <snip>
> 
> > (gdb) p *journal
> > $4 = {j_flags = 16, j_errno = 0, j_sb_buffer = 0xffff88031f156dc8, 
> >   j_superblock = 0xffff88031f876000, j_format_version = 2, j_state_lock = {raw_lock = {
> >       slock = 2874125135}}, j_barrier_count = 0, j_barrier = {count = {counter = 1}, wait_lock = {
> >       raw_lock = {slock = 0}}, wait_list = {next = 0xffff88031e6c4638, 
> >       prev = 0xffff88031e6c4638}, owner = 0x0}, j_running_transaction = 0x0, 
> >   j_committing_transaction = 0x0, j_checkpoint_transactions = 0xffff88031bd16b40,
> > ...
> > j_tail_sequence = 2288011385, j_transaction_sequence = 2288014068, 
> >   j_commit_sequence = 2288014067, j_commit_request = 140530417,
> > ...
> >   j_wbuf = 0xffff88031de98000, j_wbufsize = 512, j_last_sync_writer = 4568, 
> >   j_average_commit_time = 69247, j_private = 0xffff88031fd49400}
>   <snip>
> 
> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_sync_tid
> > $5 = {counter = -2006954411}
> > (gdb) p ((struct ext3_inode_info*)(0xffff88031f0c0758-0xd0))->i_datasync_tid
> > $3 = {counter = 140530417}
> > 
> > > j_commit_request = 140530417
> > 
> > So it *is* a datasync from sqlite. And your fix will catch it. 
> > I still don't understand, where this number comes from. 
>   Ok, so i_datasync_tid got corrupted. But look at the numbers in hex:
> i_datasync_tid==140530417==0x086052F1
> and
> j_commit_sequence==2288014067==0x886052F3
> 
> So it's a single bit error - we lost the highest bit of the number. Are you
> getting the cores from different machines? Otherwise I'd suspect the HW.
> If it's not HW I'm at a loss as to what can cause it... You can try moving
> i_datasync_tid to a different place in struct ext3_inode_info so that we
> can rule out / confirm whether some code external to i_datasync_tid
> handling is just causing random memory corruption...
> 
> 								Honza
> -- 
> Jan Kara <jack@...e.cz>
> SUSE Labs, CR
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
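
The numbers in the quoted analysis can be checked with a small userspace
sketch. It assumes tids are 32-bit unsigned values and that tid_gt compares
them through a signed difference, which is how the thread describes the
wraparound handling; the function below is a standalone imitation, not the
kernel source. The helper tid_clear_top_bit is hypothetical and exists only
to illustrate the "lost the highest bit" observation.

```c
#include <stdint.h>

/* Assumption: transaction IDs are 32-bit unsigned counters that wrap. */
typedef uint32_t tid_t;

/*
 * Wraparound-aware "x is newer than y". The unsigned difference is
 * reinterpreted as a signed 32-bit value (two's complement assumed), so x
 * counts as newer only if it is fewer than 2^31 steps ahead of y.
 */
static int tid_gt(tid_t x, tid_t y)
{
    int32_t difference = (int32_t)(x - y);
    return difference > 0;
}

/* Hypothetical helper: the tid with bit 31 cleared, to model the bit loss. */
static tid_t tid_clear_top_bit(tid_t t)
{
    return t & UINT32_C(0x7FFFFFFF);
}
```

With the values from the dump, tid_clear_top_bit(2288014067) is 140530419,
only two more than the corrupted i_datasync_tid of 140530417. The unsigned
difference 140530417 - 2288014067 wraps to 0x7FFFFFFE, which is positive as a
signed value, so tid_gt(140530417, 2288014067) returns 1: the corrupted
j_commit_request looks like a transaction roughly two billion commits in the
future rather than one in the past.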