Message-ID: <4C65331A.9050203@redhat.com>
Date: Fri, 13 Aug 2010 07:57:14 -0400
From: Eric Sandeen <sandeen@...hat.com>
To: Evan Jones <evanj@....EDU>
CC: linux-ext4@...r.kernel.org
Subject: Re: Intel SSD data loss: Any possible way this is user / software error?
Evan Jones wrote:
> I'm testing a few systems that attempt to log data to disk reliably. I
> bought a brand new Intel SSD (X25-M G2) for this purpose. It appears to
> me that this disk does *not* store data reliably when there are power
> failures, even with write barriers, even with the cache disabled. I'm
> surprised that this disk might be this broken (possible), but it may
> also mean I've made a mistake. Is there any possible way that I have a
> bug in the test described below? The test works as expected with a
> couple SATA magnetic disks.
>
>
> Configuration:
>
> * Linux 2.6.32 (as distributed with Ubuntu 10.04)
> * SATA SSD directly attached to the system's built-in controller (Intel
> N10/ICH7)
> * ext4 with default options (meaning barrier=1)
> * Disable the write cache (hdparm -W 0 /dev/sdb)

Just out of curiosity, what do you see when the write cache is on?
Seems counter-intuitive that it'd work better, but talking w/
Ric Wheeler, he was curious... maybe Intel didn't test with the
write cache off?

Also, would you be willing to publish the test you're using?

Thanks,
-Eric

>
> The test:
>
> 1. Write a 64 MB file of zeros (first use fallocate, then zero fill)
> 2. fsync()
> 3. write() blocks of this file with a sequence number.
> 4. fdatasync()
> 5. Send UDP packet reporting the sequence number written.
> 6. Go to 3.
>
> While this test is running, I pull the power out of the drive to
> simulate a hard failure. On the magnetic disks I have, this works as
> expected: On reboot, the log file contains the complete record that was
> reported as last written (it may also contain part of the next record).
>
> On the X25-M, when I use large writes (128 kB), it loses data fairly
> frequently (every couple attempts): I either see the last log record as
> being before the reported one, or occasionally I get a media error when
> reading back the file.
>
> I'm surprised that this disk could be this broken, but I suppose it is
> possible. Any help is welcomed. Thanks,
>
> Evan Jones
>
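
For reference, the test loop quoted above could be sketched roughly as
follows in Python. This is not Evan's actual test program, which has not
been posted. The 8-byte little-endian record layout, the function names,
the wrap-around when the file fills up, and the first-word/last-word
completeness check are all assumptions made for illustration.

```python
import os
import socket
import struct

# Assumed record layout: one 8-byte little-endian sequence number,
# repeated to fill the whole block. block_size must be a multiple of 8.
RECORD = struct.Struct("<Q")


def prepare_file(path, file_size):
    """Steps 1-2: preallocate, zero fill, and fsync the log file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.posix_fallocate(fd, 0, file_size)
    zeros = bytes(1 << 20)
    offset = 0
    while offset < file_size:
        chunk = min(len(zeros), file_size - offset)
        offset += os.pwrite(fd, zeros[:chunk], offset)
    os.fsync(fd)
    return fd


def write_records(fd, file_size, block_size, sock, addr, count):
    """Steps 3-6: write a block, fdatasync, then report the sequence number."""
    slots = file_size // block_size
    for seq in range(1, count + 1):
        block = RECORD.pack(seq) * (block_size // RECORD.size)
        offset = ((seq - 1) % slots) * block_size  # wrap around (assumption)
        written = 0
        while written < block_size:
            written += os.pwrite(fd, block[written:], offset + written)
        os.fdatasync(fd)
        # Report only after fdatasync() has returned.
        sock.sendto(RECORD.pack(seq), addr)


def last_sequence(path, block_size):
    """After reboot: highest sequence number among blocks whose first and
    last words agree (a cheap check that the block is not torn)."""
    best = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if len(block) < block_size:
                break
            first = RECORD.unpack_from(block, 0)[0]
            last = RECORD.unpack_from(block, block_size - RECORD.size)[0]
            if first == last and first > best:
                best = first
    return best
```

In a real run the UDP receiver would presumably sit on a second machine,
so that the last reported sequence number survives the power cut. After
the reboot, last_sequence() on the log file should be greater than or
equal to that number; anything lower means a write acknowledged by
fdatasync() was lost.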