linux-kernel - [patch] fix data corruption bug in __block_write_full

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20061229121946.GA17837@elte.hu>
Date:	Fri, 29 Dec 2006 13:19:46 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...l.org>
Cc:	Segher Boessenkool <segher@...nel.crashing.org>,
	David Miller <davem@...emloft.net>, nickpiggin@...oo.com.au,
	kenneth.w.chen@...el.com, guichaz@...oo.fr, hugh@...itas.com,
	linux-kernel@...r.kernel.org, ranma@...edrich.de,
	gordonfarquharson@...il.com, akpm@...l.org, a.p.zijlstra@...llo.nl,
	tbm@...ius.com, arjan@...radead.org, andrei.popa@...eo.ro
Subject: [patch] fix data corruption bug in __block_write_full_page()

* Linus Torvalds <torvalds@...l.org> wrote:

> I do have a few interesting details from the trace I haven't really 
> analyzed yet. Here's the trace for events on one of the pages that was 
> corrupted. Note how the events are numbered (there were 171640 events 
> total), so the thing you see is just a small set of events from the 
> whole big trace, but it's the ones that talk about _that_ particular 
> page.

i've extended the tracer in -rt to trace all relevant pagetable, 
pagecache, buffer-cache and IO events and coupled the tracer to your 
test.c code. The corruption happens here:

    test-2126  0.... 3756170us+: trace_page (cf20ebd8 b6a2c000 0)
 pdflush-2006  0.... 6432909us+: trace_page (cf20ebd8 b6a2c000 4200420)
    test-2126  0.... 8135596us+: trace_page (cf20ebd8 b6a2c000 4200420)
    test-2126  0D... 9012933us+: do_page_fault (8048900 4 b6a2c000)
    test-2126  0.... 9023278us+: trace_page (cf262f24 b6a2c000 0)
    test-2126  0.... 9023305us > sys_prctl (000000d8 b6a2c000 000000ac)

address 0xb6a2c000 is the one that shows the corruption. Now, this 
address is mapped to page cf262f24 when the bug happened, but it had 
page 0xcf20ebd8 mapped to it 3 seconds ago, which has this history:

    test-2126  0.... 3756413us+: trace_page (cf20ebd8 0 0)
    test-2126  0.... 3756469us+: trace_page (cf20ebd8 0 0)
    test-2126  0.... 3757341us+: trace_page (cf20ebd8 10 0)
  IRQ-14-402   0.... 3759332us+: trace_page (cf20ebd8 ffffffff 0)
  IRQ-14-402   0.... 3759376us+: trace_page (cf20ebd8 ffffffff 0)
    test-2126  0.... 5104662us+: trace_page (cf20ebd8 b6a2c400 0)
    test-2126  0.... 5104687us+: trace_page (cf20ebd8 1 0)
 pdflush-2006  0.... 6432909us+: trace_page (cf20ebd8 b6a2c000 4200420)
 pdflush-2006  0.... 6432952us+: trace_page (cf20ebd8 ffffffff 4200420)
 pdflush-2006  0.... 6432986us+: trace_page (cf20ebd8 1 4200420)
 pdflush-2006  0.... 6433022us+: trace_page (cf20ebd8 4096 4200420)
 pdflush-2006  0.... 6433061us+: trace_page (cf20ebd8 0 4200420)
 pdflush-2006  0.... 6433112us+: trace_page (cf20ebd8 0 4200420)
 pdflush-2006  0.... 6433154us+: trace_page (cf20ebd8 0 4200420)
 pdflush-2006  0.... 6433303us+: trace_page (cf20ebd8 11 4200420)
 pdflush-2006  0.... 6433343us+: trace_page (cf20ebd8 13 4200420)
 pdflush-2006  0.... 6433382us+: trace_page (cf20ebd8 14 4200420)
 pdflush-2006  0.... 6433421us+: trace_page (cf20ebd8 15 4200420)
 pdflush-2006  0.... 6433460us+: trace_page (cf20ebd8 ffffffff 4200420)
 pdflush-2006  0.... 6433504us+: trace_page (cf20ebd8 ffffffff 4200420)
    test-2126  0.... 8135596us+: trace_page (cf20ebd8 b6a2c000 4200420)

in particular timestamp 6433421us is interesting:

 pdflush-2006  0.... 6433504us+: trace_page (cf20ebd8 ffffffff 4200420)
 pdflush-2006  0.... 6433526us : trace_page()<-test_clear_page_writeback()<-end_page_writeback()<-__block_write_full_page()
 pdflush-2006  0.... 6433526us+: block_write_full_page()<-ext3_ordered_writepage()<-generic_writepages()<-(-1)()

i.e. the page got its pending writeback cancelled in 
block_write_full_page(), without any IRQ#14 activity whatsoever! That 
looks quite suspect. It is this piece of code in 
__block_write_full_page():

                /*
                 * The page was marked dirty, but the buffers were
                 * clean.  Someone wrote them back by hand with
                 * ll_rw_block/submit_bh.  A rare case.
                 */
                ....
                if (uptodate)
                        SetPageUptodate(page);
                end_page_writeback(page);

A 'rare case' ... hm. So i tried a quick workaround below, just to keep 
us from marking the page clean, to see whether the corruption goes away 
- and i was unable to trigger the corruption after half an hour of 
testing, while before it triggered within 10 seconds!

now this patch is only an ugly hack, but the bug definitely seems to be 
related to buffer management, as you suspected.

	Ingo

---
 fs/buffer.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux/fs/buffer.c
===================================================================
--- linux.orig/fs/buffer.c
+++ linux/fs/buffer.c
@@ -1702,6 +1702,7 @@ done:
 		} while (bh != head);
 		if (uptodate)
 			SetPageUptodate(page);
+		set_page_dirty(page);
 		end_page_writeback(page);
 		/*
 		 * The page and buffer_heads can be released at any time from
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/