linux-kernel - Re: Linux 3.1-rc7

Message-ID: <CA+55aFwf3iOwyDDxjKbJ0fs=fy+-8mkoFHdgsEq32nAkerXS4g@mail.gmail.com>
Date:	Wed, 28 Sep 2011 08:47:15 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Arnaud Lacombe <lacombar@...il.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 3.1-rc7

On Tue, Sep 27, 2011 at 10:34 PM, Arnaud Lacombe <lacombar@...il.com> wrote:
>
> <off-topic>
> Speaking of corruption, I'm encountering another case on an external
> hard drive, connected through USB.

I don't think it's unrelated or off-topic.

>     The same corruption pops up (at least in those text files): a
> sequence of 4 bytes is replaced by 0x000000E0 at offset 0x1E4 from the
> start of the file for some of them, and at 0x3E4 for two others (same
> corruption, though). Locating the corruption will be trickier in
> binary files.

So it's possible that it's some rogue kernel pointer. We've certainly
had those before. Constant offsets like that happen when some
structure allocation just happens to be, say, 1kB in size, and the
rogue kernel pointer writes at a fixed offset into something that has
already been freed.

You might want to try compiling the kernel with CONFIG_SLUB_DEBUG_ON set,
and possibly also CONFIG_DEBUG_PAGEALLOC.

HOWEVER. It's quite possible that it's hardware too.

> I may not trust the drive, but the fact that only known offsets are
> corrupted (in text files), in exactly the same way, seems too much of a
> coincidence. Anyway, I started a long SMART self-test to see if it
> catches anything, as there was no DMA transfer error[0].

It *could* be the disk, but it's much more likely to be something like
memory or a bad cable, neither of which would show up with SMART, since
SMART only tests for problems internal to the disk.

Do you also get occasional random SIGSEGVs?

                        Linus