lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0612190929240.3483@woody.osdl.org>
Date:	Tue, 19 Dec 2006 09:43:00 -0800 (PST)
From:	Linus Torvalds <torvalds@...l.org>
To:	Nick Piggin <nickpiggin@...oo.com.au>
cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...l.org>, andrei.popa@...eo.ro,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Hugh Dickins <hugh@...itas.com>,
	Florian Weimer <fw@...eb.enyo.de>,
	Marc Haber <mh+linux-kernel@...schlus.de>,
	Martin Michlmayr <tbm@...ius.com>
Subject: Re: 2.6.19 file content corruption on ext3



Btw,
 here's a totally new tangent on this: it's possible that user code is 
simply BUGGY. 

There is one case where the kernel actually forcibly writes zeroes into a 
file: when we're writing a page that straddles the "inode->i_size" 
boundary. See the various writepages in fs/buffer.c, they all contain 
variations on that theme (although most of them aren't as well commented 
as this snippet):

        /*
         * The page straddles i_size.  It must be zeroed out on each and every
         * writepage invocation because it may be mmapped.  "A file is mapped
         * in multiples of the page size.  For a file that is not a multiple of
         * the  page size, the remaining memory is zeroed when mapped, and
         * writes to that region are not written out to the file."
         */
        kaddr = kmap_atomic(page, KM_USER0);
        memset(kaddr + offset, 0, PAGE_CACHE_SIZE - offset);
        flush_dcache_page(page);
        kunmap_atomic(kaddr, KM_USER0);

Now, this should _matter_ only for user processes that are buggy, and that 
have written to the page _before_ extending it with ftruncate(). That's 
definitely a serious bug, but it's one that can do totally undetected 
depending on when the actual write-out happens.

So what I'm saying is that if we end up writing things earlier thanks to 
the more aggressive dirty-page-management thing in 2.6.19, we might 
actually just expose a long-time userspace bug that was just a LOT harder 
to trigger before..

I'm not saying this is the cause of all this, but we've been tearing our 
hair out, and it migth be worthwhile trying this really really really 
stupid patch that will notice when that happens at truncate() time, and 
tell the user that he's a total idiot. Or something to that effect.

Maybe the reason this is so easy to trigger with rtorrent is not because 
rtorrent does some magic pattern that triggers a kernel bug, but simply 
because rtorrent itself might have a bug.

Ok, so it's a long shot, but it's still worth testing, I suspect. The 
patch is very simple: whenever we do an _expanding_ truncate, we check the 
last page of the _old_ size, and if there were non-zero contents past the 
old size, we complain.

As an attachement is a test-program that _should_ trigger a 
kernel message like

	a.out: BADNESS: truncate check 17000

for good measure, just so that you can verify that the patch works and 
actually catches this case.

(The 17000 number is just the one-hundred _invalid_ 0xaa bytes - out of 
the 200 we wrote - that were summed up: 100*0xaa == 17000. Anything 
non-zero is always a bug).

I doubt this is really it, but it's worth trying. If you fill out a page, 
and only do "ftruncate()" in response to SIGBUS messages (and don't 
truncate to whole pages), you could potentially see zeroes at the end of 
the page exactly because _writeout_ cleared the page for you! So it 
_could_ explain the symptoms, but only if user-space was horribly horribly 
broken.

		Linus

----
diff --git a/mm/memory.c b/mm/memory.c
index c00bac6..79cecab 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1842,6 +1842,33 @@ void unmap_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
+static void check_last_page(struct address_space *mapping, loff_t size)
+{
+	pgoff_t index;
+	unsigned int offset;
+	struct page *page;
+
+	if (!mapping)
+		return;
+	offset = size & ~PAGE_MASK;
+	if (!offset)
+		return;
+	index = size >> PAGE_SHIFT;
+	page = find_lock_page(mapping, index);
+	if (page) {
+		unsigned int check = 0;
+		unsigned char *kaddr = kmap_atomic(page, KM_USER0);
+		do {
+			check += kaddr[offset++];
+		} while (offset < PAGE_SIZE);
+		kunmap_atomic(kaddr,KM_USER0);
+		unlock_page(page);
+		page_cache_release(page);
+		if (check)
+			printk("%s: BADNESS: truncate check %u\n", current->comm, check);
+	}
+}
+
 /**
  * vmtruncate - unmap mappings "freed" by truncate() syscall
  * @inode: inode of the file used
@@ -1875,6 +1902,7 @@ do_expand:
 		goto out_sig;
 	if (offset > inode->i_sb->s_maxbytes)
 		goto out_big;
+	check_last_page(mapping, inode->i_size);
 	i_size_write(inode, offset);
 
 out_truncate:
View attachment "test.c" of type "TEXT/PLAIN" (566 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ