lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sat, 11 Nov 2006 17:35:49 +0100
From:	Christoph Anton Mitterer <calestyo@...entia.net>
To:	Sergey Vlasov <vsu@...linux.ru>
CC:	linux-kernel@...r.kernel.org
Subject: Re: unexplainable read errors, copy/diff-issue

Sergey Vlasov wrote:
>> http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2
>>
>>     
>
> So you have 4GB RAM, and most likely some memory is remapped above the
> 4GB address boundary.
Uhm don't know,.. I'm running an amd64 kernel and I've always thought 
there is no such boundaray.
But yes I have 4 GB.

> Could you show the full dmesg output after boot?
>   
Yes I will but you'll have to wait until monday or tuesday. I'm 
currently visiting my parents and have no access to my main PC :(


> Other things you can try:
>
>  - Boot with mem=3072M (or some larger value which is still less than
>    the amount of RAM below the 4GB boundary - the exact value could be
>    found from the dmesg output) and check whether you can reproduce the
>    corruption in this configuration.
>   
I'll do that as soon as I'm at home.


>  - Look in the BIOS setup for memory remapping options (Google indicates
>    that it may be called "Hammer Configuration/Memory Hole Mapping" on
>    this board).  Maybe you need to try different values (AFAIR there
>    were some complaints about unstabilities with software remapping;
>    cannot find the exact page now).
>   

I think I have correctly set these settings up.
As far as I can remember:
The Memhole Mapping was set to Hardware.
The IOMMU is enabled and the IOMMU memory was set to 64MB  (I "found 
this out" because for all values less than 64MB (i.e. 32) the Linux 
kernel complained.



Some other things that I remember now from my exhaustive testing:
-The error also occurred directly after a reboot (thus the file cache 
was empty) when running a script that went through all my test files and 
verified them with their sha512 sums.

- I once did the following,.. suddenly after diff found a difference I 
Ctrl-C'ed and copied the files to another location.
In this case the files were probably used from the cache, thus the error 
was really stored on disk.
I used vbindiff (hex differ) and seen that, in the differing range, not 
just all bytes were different,.. but some were ok, than some were 
different again,.. and so on.
So, at least in that case, it was not one whole range that was totally 
wrong, but only part of the bytes.

- Another thing... perhaps this was only by chance but:
When I did sha512 sums or diffs,.. the errors were always found in the 
files I copied.... not in the original files. Of course diff could not 
say me that (because it doesn't tell which files are original) but 
sha512sum could.
This is very strange because:
My first big tests were:
1) The original by Exact Audio Copy under Windows created files on my 
PATA disc in a FAT32 partition
- compared with -
a) copies from that files to another place on the FAT32 partition
b) copies from that files to an ext3 partition on one of the SATA discs.
=> There one could imagine that the failure would be done in the copying 
(which is impossible or unlikely,.. because then the differences should 
be always in the same file(s).

2) I copied the files from FAT32 to ext3 again,.. and then copied the 
whole stuff from ext3 to another location on the ext3 partition.
The error happens here, too.

And I think it's very strange the even for test 2 the differences seemed 
to be always in the most recent copy.
Perhaps this was only fortune.



- I tried the whole thing under Windows (installed GNU diff tools there).
Copied the files and started the diff.
Until I had to abort (because my parents came to get me...) there have 
been found no differences.
Anyway I'm not sure if this says so much:
First of all,.. the diff is very very very slow in Windows (many times 
slower than in Linux),.. any I have all DMA/Busmaster/etc drivers 
installed in Windows.
Because of this I was not able to complete at least one whole diff over 
all the files, thus it would still be possible that errors have occurred.
The Windows task manager (if I interpret his data right) told me that 
diff has read about 20GB of data.... which mean it would have diffed 
about 10GB of the files (so only one third).

Another thing that I wonder about:
The Task Manager shows me somewhere something like System Cache: 2,1 GB 
(about).
As the EAC project was the first time for me to use Windows since 
Windows 95 or so.... I'm not sure what that means and if it is the same 
as the Linux file cache.
If so:
Linux seems to use all "free" memory for caching files but Windows would 
use only about the half of my memory.
Perhaps that could be a reason if the error would not occur under 
Windows (btw: I'm going to make several complete tests in Windows Monday 
or Tuesday when I'm back in Munich). Just imagin if there would be an 
hardware error in that unused 2GB.
I'm not sure enough about the internal Linux memory management to tell 
if that may be a reason for the error. I could imagine that Linux reads 
data first into the unused (cache) sections of mainmemory and copies it 
from there to the virtual memory (which is actually physical memory too) 
of the diff process.
Thus if there would be an error in my memory (although memtest86+ did 
not, until now, tell me of an error) it could be possible that Windows 
never use that memory regions, and thus the diff under windows won't get 
any corrupted data, too.
You understand what I mean?

btw: Can someone tell me if it's possible to instruct windows to use the 
whole memory as file cache? (And if so how ;-) )



My further tests (as I'm currently intend to do) are:
-Severall copy/diffs under windows
-An even longer memtest86+
-Using some Knoppix or so, to see if the error is related to my 
Distribution, my custom kernel or something like that.
-The kernel options Sergey suggested me to try
-Everything else some of you would suggest me :)

Thanks and best wishes,
Chris.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ