[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-id: <459C3F29.2@shaw.ca>
Date: Wed, 03 Jan 2007 17:41:29 -0600
From: Robert Hancock <hancockr@...w.ca>
To: Christoph Anton Mitterer <calestyo@...entia.net>
Cc: linux-kernel@...r.kernel.org
Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory
hole mapping related bug?!
Christoph Anton Mitterer wrote:
> Hi.
>
> Perhaps some of you have read my older two threads:
> http://marc.theaimsgroup.com/?t=116312440000001&r=1&w=2 and the even
> older http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2
>
> The issue was basically the following:
> I found a severe bug mainly by fortune because it occurs very rarely.
> My test looks like the following: I have about 30GB of testing data on
> my harddisk,... I repeat verifying sha512 sums on these files and check
> if errors occur.
> One test pass verifies the 30GB 50 times,... about one to four
> differences are found in each pass.
>
> The corrupted data is not one single completely wrong block of data or
> so,.. but if you look at the area of the file where differences are
> found,.. than some bytes are ok,.. some are wrong,.. and so on (seems to
> be randomly).
>
> Also, there seems to be no event that triggers the corruption,.. it
> seems to be randomly, too.
>
> It is really definitely not a harware issue (see my old threads my
> emails to Tyan/Hitachi and my "workaround" below. My system isn't
> overclocked.
>
>
>
> My System:
> Mainboard: Tyan S2895
> Chipsets: Nvidia nforce professional 2200 and 2050 and AMD 8131
> CPU: 2x DualCore Opterons model 275
> RAM: 4GB Kingston Registered/ECC
> Diskdrives: IBM/Hitachi: 1 PATA, 2 SATA
>
>
> The data corruption error occurs on all drives.
>
>
> You might have a look at the emails between me and Tyan and Hitachi,..
> they contain probalby lots of valuable information (especially my
> different tests).
>
>
>
> Some days ago,.. an engineer of Tyan suggested me to boot the kernel
> with mem=3072M.
> When doing this,.. the issue did not occur (I don't want to say it was
> solved. Why? See my last emails to Tyan!)
> Then he suggested me to disable the memory hole mapping in the BIOS,...
> When doing so,.. the error doesn't occur, too.
> But I loose about 2GB RAM,.. and,.. more important,.. I cant believe
> that this is responsible for the whole issue. I don't consider it a
> solution but more a poor workaround which perhaps only by fortune solves
> the issue (Why? See my last eMails to Tyan ;) )
>
>
>
> So I'd like to ask you if you perhaps could read the current information
> in this and previous mails,.. and tell me your opinions.
> It is very likely that a large number of users suffer from this error
> (namely all Nvidia chipset users) but only few (there are some,.. I
> found most of them in the Nvidia forums,.. and they have exactly the
> same issue) identify this as an error because it's so rare.
>
> Perhaps someone have an idea why disabling the memhole mapping solves
> it. I've always thought that memhole mapping just moves some address
> space to higher addreses to avoid the conflict between address space for
> PCI devices and address space for pyhsical memory.
> But this should be just a simple addition and not solve this obviously
> complex error.
If this is related to some problem with using the GART IOMMU with memory
hole remapping enabled, then 2.6.20-rc kernels may avoid this problem on
nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA
controller are concerned as the sata_nv driver now supports 64-bit DMA
on these chipsets and so no longer requires the IOMMU.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@...pamshaw.ca
Home Page: http://www.roberthancock.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists