Date:	Tue, 17 Jun 2008 16:00:10 +1000
From:	"Bron Gondwana" <brong@...tmail.fm>
To:	"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
	"Nick Piggin" <npiggin@...e.de>,
	"Andrew Morton" <akpm@...ux-foundation.org>,
	"Linus Torvalds" <torvalds@...ux-foundation.org>
Cc:	" Rob Mueller" <robm@...tmail.fm>
Subject: BUG: mmapfile/writev spurious zero bytes (x86_64/not i386, bisected,
 reproducible)

Background: we recently upgraded one of our 64bit kernel
machines to 2.6.25 and discovered that Cyrus skiplist
files were becoming randomly corrupted.  This is a machine
with a 32bit Debian Etch userland but a 64bit kernel with
32bit support.

We run this way so we can actually use the 12GB of memory as
cache without running out of inode space, but don't have
to support two different sets of userland across our
servers.


The symptom: runs of 16 to 24 zero bytes appear "randomly"
within the file after a "checkpoint" (a rewrite of the file
that skips stale copies of records).

Further investigation by retaining the pre-checkpoint files
showed that those bytes were the last ones of a page in the
original file.
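
To illustrate the kind of check described above, here is a minimal
sketch (not taken from maptest.c) that compares one record as it was in
the retained pre-checkpoint file with the same record in the
checkpointed file.  The function name, its parameters and the 4096-byte
page size used in the usage note are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the attached maptest.c.
 *
 * src      record bytes as they were in the pre-checkpoint file
 * dst      the same record as it landed in the checkpointed file
 * src_off  offset of the record within the pre-checkpoint file
 *
 * Returns 1 if dst differs from src in one all-zero span whose last
 * byte is the last byte of a page of the source file, 0 otherwise.
 * *bad_off (a source file offset) and *bad_len are set whenever an
 * all-zero damaged span is found, aligned or not.
 *
 * Limitation: if the source itself ends the page with zero bytes, the
 * damaged span looks shorter than it is and the check returns 0. */
int zero_span_ends_source_page(const unsigned char *src,
                               const unsigned char *dst, size_t len,
                               size_t src_off, size_t page_size,
                               size_t *bad_off, size_t *bad_len)
{
    size_t first = 0, last;

    assert(page_size > 0);

    while (first < len && src[first] == dst[first])
        first++;
    if (first == len)
        return 0;               /* the copy is intact */

    last = len - 1;
    while (src[last] == dst[last])
        last--;                 /* cannot pass 'first', which differs */

    for (size_t i = first; i <= last; i++)
        if (dst[i] != 0)
            return 0;           /* damaged, but not by zero bytes */

    *bad_off = src_off + first;
    *bad_len = last - first + 1;
    return (src_off + last + 1) % page_size == 0;
}
```

With a 4096-byte page, a record read from source offset 4080 whose
first 16 bytes came out as zeros would be reported as a 16-byte span
ending exactly at the page boundary.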


Attached is a small C program which recreates the actions
that Cyrus takes, and uses record lengths identical to a
known-broken skiplist file on our systems.
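
The attached maptest.c is the real reproducer.  Purely to illustrate
the access pattern named in the subject line (writev() with buffers
that point straight into an mmap()ed file), here is a much smaller
sketch.  checkpoint_file(), struct range and MAX_RANGES are names
invented for this example, and a program this short may well not
trigger the bug.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_RANGES 16   /* arbitrary cap for this sketch */

struct range { size_t off; size_t len; };   /* one "live" record */

/* Gather n ranges of src_path into a fresh dst_path with one writev().
 * The iovecs point directly into the read-only mapping of the source,
 * so the kernel copies from mapped pages that may not be resident yet.
 * Returns the number of bytes written, or -1 on error. */
ssize_t checkpoint_file(const char *src_path, const char *dst_path,
                        const struct range *r, int n)
{
    struct iovec iov[MAX_RANGES];
    struct stat st;
    ssize_t written = -1;
    int ok = 1;

    assert(r != NULL);
    if (n < 1 || n > MAX_RANGES)
        return -1;

    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;
    if (fstat(src, &st) < 0 || st.st_size <= 0) {
        close(src);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, size, PROT_READ, MAP_SHARED, src, 0);
    close(src);                 /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return -1;

    for (int i = 0; i < n; i++) {
        if (r[i].off > size || r[i].len > size - r[i].off) {
            ok = 0;             /* range falls outside the file */
            break;
        }
        iov[i].iov_base = base + r[i].off;
        iov[i].iov_len = r[i].len;
    }
    if (ok) {
        int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (dst >= 0) {
            written = writev(dst, iov, n);
            close(dst);
        }
    }
    munmap(base, size);
    return written;
}
```

A range such as { 4072, 24 } ends exactly on a 4096-byte page boundary
of the source, which is the position where the zeroed bytes were seen.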

Using this I was able to bisect the kernel to find the
commit which caused the problem:


08291429cfa6258c4cd95d8833beb40f828b194e is first bad commit
commit 08291429cfa6258c4cd95d8833beb40f828b194e
Author: Nick Piggin <npiggin@...e.de>
Date:   Tue Oct 16 01:24:59 2007 -0700

    mm: fix pagecache write deadlocks

    Modify the core write() code so that it won't take a pagefault while holding a
    lock on the pagecache page. There are a number of different deadlocks possible
    if we try to do such a thing:

    [...]

    Signed-off-by: Nick Piggin <npiggin@...e.de>
    Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@...ux-foundation.org>


For Cyrus users, this is a really serious bug.  Occasionally
the zeros will hit a "navigational component" of the file,
which causes crashes and so gets noticed.  Most of the time
(including this example) it will just cause silent corruption
and data loss.

I suspect this will be visible to users as large swathes of
messages becoming unread, and if it hits the mailboxes.db,
large swathes of mailboxes just disappearing.  Not good.


I apologise for the length of the attached C program.  I tried
to make it shorter, but the shorter versions no longer tickled the
bug.  There's also another advantage of keeping it like this: it
closely mirrors the Cyrus behaviour, to the point where the output
is a valid skiplist file.

It also has a "magic" mode, just pass a second parameter.  It will
read through the mapped memory in order before the checkpoint.
This makes the bug disappear.
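
As a rough idea of what such a pre-read pass can look like (this is a
sketch, not the code from maptest.c), the function below walks a
mapping in ascending order and reads from every page.  The name
touch_pages() is made up for this example, and reading one byte per
page rather than every byte is an assumption on my part about what is
enough to fault the pages in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read one byte from each page of a mapping, in
 * ascending address order, so every page is faulted in before anything
 * else uses the mapping.  The volatile qualifier and the returned sum
 * stop the compiler from discarding the reads. */
unsigned long touch_pages(const volatile unsigned char *base, size_t len,
                          size_t page_size)
{
    unsigned long sum = 0;

    assert(page_size > 0);
    for (size_t off = 0; off < len; off += page_size)
        sum += base[off];
    return sum;
}
```

It would be called on the mapped source file, with the page size from
sysconf(_SC_PAGESIZE), just before the checkpoint starts writing.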

Let me know if there's anything else I can do to make this report
clearer.  I've had a quick glance around the code, but couldn't spot
the cause, especially since it's a 64-bit-kernel-only bug.  I tested
that by rebooting the test machine with a 2.6.25.3 32-bit kernel I had
lying around, and the bug was not visible there.  I've also repeated
the test on my Ubuntu desktop machine with the shipped kernel vs. my
hand-compiled known-bad kernel, and tested with a 64-bit userland in a
chroot as well.

Regards,

Bron.

-- 
  Bron Gondwana
  brong@...tmail.fm


View attachment "maptest.c" of type "text/x-csrc" (9823 bytes)
