linux-kernel - Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?!

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-id: <459C3F29.2@shaw.ca>
Date:	Wed, 03 Jan 2007 17:41:29 -0600
From:	Robert Hancock <hancockr@...w.ca>
To:	Christoph Anton Mitterer <calestyo@...entia.net>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: data corruption with nvidia chipsets and IDE/SATA drives // memory
 hole mapping related bug?!

Christoph Anton Mitterer wrote:
> Hi.
> 
> Perhaps some of you have read my older two threads:
> http://marc.theaimsgroup.com/?t=116312440000001&r=1&w=2 and the even
> older http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2
> 
> The issue was basically the following:
> I found a severe bug mainly by fortune because it occurs very rarely.
> My test looks like the following: I have about 30GB of testing data on
> my harddisk,... I repeat verifying sha512 sums on these files and check
> if errors occur.
> One test pass verifies the 30GB 50 times,... about one to four
> differences are found in each pass.
> 
> The corrupted data is not one single completely wrong block of data or
> so,.. but if you look at the area of the file where differences are
> found,.. than some bytes are ok,.. some are wrong,.. and so on (seems to
> be randomly).
> 
> Also, there seems to be no event that triggers the corruption,.. it
> seems to be randomly, too.
> 
> It is really definitely not a harware issue (see my old threads my
> emails to Tyan/Hitachi and my "workaround" below. My system isn't
> overclocked.
> 
> 
> 
> My System:
> Mainboard: Tyan S2895
> Chipsets: Nvidia nforce professional 2200 and 2050 and AMD 8131
> CPU: 2x DualCore Opterons model 275
> RAM: 4GB Kingston Registered/ECC
> Diskdrives: IBM/Hitachi: 1 PATA, 2 SATA
> 
> 
> The data corruption error occurs on all drives.
> 
> 
> You might have a look at the emails between me and Tyan and Hitachi,..
> they contain probalby lots of valuable information (especially my
> different tests).
> 
> 
> 
> Some days ago,.. an engineer of Tyan suggested me to boot the kernel
> with mem=3072M.
> When doing this,.. the issue did not occur (I don't want to say it was
> solved. Why? See my last emails to Tyan!)
> Then he suggested me to disable the memory hole mapping in the BIOS,...
> When doing so,.. the error doesn't occur, too.
> But I loose about 2GB RAM,.. and,.. more important,.. I cant believe
> that this is responsible for the whole issue. I don't consider it a
> solution but more a poor workaround which perhaps only by fortune solves
> the issue (Why? See my last eMails to Tyan ;) )
> 
> 
> 
> So I'd like to ask you if you perhaps could read the current information
> in this and previous mails,.. and tell me your opinions.
> It is very likely that a large number of users suffer from this error
> (namely all Nvidia chipset users) but only few (there are some,.. I
> found most of them in the Nvidia forums,.. and they have exactly the
> same issue) identify this as an error because it's so rare.
> 
> Perhaps someone have an idea why disabling the memhole mapping solves
> it. I've always thought that memhole mapping just moves some address
> space to higher addreses to avoid the conflict between address space for
> PCI devices and address space for pyhsical memory.
> But this should be just a simple addition and not solve this obviously
> complex error.

If this is related to some problem with using the GART IOMMU with memory 
hole remapping enabled, then 2.6.20-rc kernels may avoid this problem on 
nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA 
controller are concerned as the sata_nv driver now supports 64-bit DMA 
on these chipsets and so no longer requires the IOMMU.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@...pamshaw.ca
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/