lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0612181648490.3479@woody.osdl.org>
Date:	Mon, 18 Dec 2006 16:57:30 -0800 (PST)
From:	Linus Torvalds <torvalds@...l.org>
To:	Andrei Popa <andrei.popa@...eo.ro>
cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...l.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Hugh Dickins <hugh@...itas.com>,
	Florian Weimer <fw@...eb.enyo.de>,
	Marc Haber <mh+linux-kernel@...schlus.de>,
	Martin Michlmayr <tbm@...ius.com>
Subject: Re: 2.6.19 file content corruption on ext3



On Tue, 19 Dec 2006, Andrei Popa wrote:
> > > 
> > > nope, no file corruption at all.
> > 
> > Ok. That's interesting, but I think you actually #ifdef'ed out too 
> > much:
> > 
> > It was really just the _inner_ "if (mapping_cap_account_dirty(.." 
> > statement that I meant you should remove.
> > 
> > Can you try that too?
> 
> I have file corruption: "Hash check on download completion found bad
> chunks, consider using "safe_sync"."

Ok, that's interesting.

So it doesn't seem to be the call to page_mkclean() itself that causes 
corruption. It looks like Peter's hunch that maybe there's some bug in 
PG_dirty handling _itself_ might be an idea..

And the reason it only started happening now is that it may just have been 
_hidden_ by the fact that while we kept the dirty bits in the page tables, 
we'd end up writing the dirty page _despite_ having lost the PG_dirty bit. 
So if it's some bad interaction between writable mappings and some other 
part of the system, we just didn't see it earlier, exactly because we had 
_lots_ of dirty bits, and it was enough that _one_ of them was right.

If you didn't see corruption when you #ifdef'ed out too much of the 
"test_clean_page_dirty() function (the _whole_ TestClearPageDirty() 
if-statement), but you get it when you just comment out the stuff that 
does the page_mkclean(), that's interesting.

I'm left lookin gat the "radix_tree_tag_clear()" in 
test_clear_page_dirty().

What happens if you only ifdef out that single thing? 

The actual page-cleaning functions make sure to only clear the TAG_DIRTY 
bit _after_ the page has been marked for writeback. Is there some ordering 
constraint there, perhaps?

I'm really reaching here. I'm trying to see the pattern, and I'm not 
seeing it. I'm asking you to test things just to get more of a feel for 
what triggers the failure, than because I actually have any kind of idea 
of what the heck is going on.

Andrew, Nick, Hugh - any ideas?

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ