linux-kernel - Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <200710221259.27989.bs@q-leap.de>
Date:	Mon, 22 Oct 2007 12:59:27 +0200
From:	Bernd Schubert <bs@...eap.de>
To:	Soeren Sonnenburg <kernel@....de>
Cc:	Tejun Heo <htejun@...il.com>, linux-ide@...r.kernel.org,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Jeff Garzik <jeff@...zik.org>
Subject: Re: sata sil3114 vs. certain seagate drives results in filesystem corruptions

On Monday 22 October 2007 12:36:32 Soeren Sonnenburg wrote:
> On Mon, 2007-10-22 at 11:48 +0200, Bernd Schubert wrote:
> > Hello,
> >
> > On Monday 22 October 2007 04:12:44 Tejun Heo wrote:
> > > Helo,
> > > [...]
> > >
> > > > Now when I write large files of zeros to root(sda&sdb) and read the
> > > > file back in it contains a few nonzero entries:
> > > >
> > > > # dd if=/dev/zero of=/foo bs=1M count=2000
> > > > # hexdump /foo
> > > > 0000000 0000 0000 0000 0000 0000 0000 0000 0000
> > > > *
> > > > <after >1GB random parts, within large blocks of zeroes>
> > > >
> > > > I can reliably trigger this on the md0 / devmapper-root setup when I
> > > > write about 2GB of data (note that this machine has 1.5G of memory -
> > > > and still 1GB is often enough to see this problem). Here it does not
> > > > matter where in the filesystem I do these writes.
> >
> > Thats almost the same test as I'm always doing. Only I do not write only
> > 2GB,
>
> Well when I read your mail I thought that I could be seeing exactly the
> same bug... it still may be. However ``my'' problem does not go away
> with the mod15fix ...

Yeah, pity it did not fix it :( I will try to port Tejuns patch 
(http://home-tj.org/wiki/index.php/Sil_m15w#Patches) to 2.6.23 today or 
tomorrow. If you are testing anyway, could you then also try this?

>
> > but as much as it fits onto the disk. On reading back this file, the
> > filesystem will report errors somewhere between 50GB and 230GB (disk size
> > is 250GB).
>
> Wow, I really see lots of corruptions (well every 1-2 GB a couple of
> bytes are corrupted). Are you getting similiarly many in the 50G - 230G
> region?
>
> > > Thanks.  I'll try to reproduce the problem here.  What's your
> > > motherboard?
> >
> > All tested S2882 boards here.
>
> I assume all equipped with lots of memory and mostly empty pci slots?

Yes, all pci-slots are free and the systems to have between 4 and 16GB memory 
(ecc, monitored with edac). Well, those are cluster systems (actually tyan 
names those B2882).
Do you think the configuration is related? Here it also happens with odirect, 
we tested this to minimize memory effects.


Cheers,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/