linux-kernel - Re: Lots of con-current I/O = resets SATA link? (2.6.25.10)

Message-ID: <alpine.DEB.1.10.0807080433550.17980@p34.internal.lan>
Date:	Tue, 8 Jul 2008 04:34:33 -0400 (EDT)
From:	Justin Piszcz <jpiszcz@...idpixels.com>
To:	Gerhard Wiesinger <lists@...singer.com>
cc:	linux-kernel@...r.kernel.org, linux-raid@...r.kernel.org,
	linux-ide@...r.kernel.org
Subject: Re: Lots of con-current I/O = resets SATA link? (2.6.25.10)



On Tue, 8 Jul 2008, Gerhard Wiesinger wrote:

> On Mon, 7 Jul 2008, Justin Piszcz wrote:
>
>> Hi Gerhard,
>> 
>> It /could/ be the port itself if you have changed the cable and disk..
>> 
>
> Yes, but it is very unlikely. I have written TB of data there without any 
> problems. Anyway this is my 3rd exchanged SAMSUNG disk ...
>
>
>> Have you tried loading the disk with dd and seeing if you can reproduce the 
>> problem? You are getting the same error I get generally, I can recommend 
>> turning OFF NCQ first and see if the problem goes away.
>> 
>> # Define DISKS.
>> cd /sys/block
>> DISKS=$(/bin/ls -1d sd[a-z])
>> 
>> # Disable NCQ on all disks.
>> echo "Disabling NCQ on all disks..."
>> for i in $DISKS
>> do
>>  echo "Disabling NCQ on $i"
>>  echo 1 > /sys/block/"$i"/device/queue_depth
>> done
>> 
>
> I tried to disable NCQ on all disks and tried to rebuild the raid, but it 
> still failed to rebuild with the same error message.
>
> I also tried the nolapic kernel parameter without success.
>
> /dev/sda:  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
> /dev/sdb:  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
> /dev/sdc:  5 Reallocated_Sector_Ct   0x0033   091   091   010    Pre-fail  Always       -       413
> /dev/sdd:  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
> /dev/sde:  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
> /dev/sdf:  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
>
> The only thing is that the Reallocated_Sector_Ct is still >0 on /dev/sdc 
> (keep in mind this is my 3rd new Samsung disk on /dev/sdc and I had up to 
> 3000 Reallocated_Sector_Ct on previous disks in < 1 day !!!).
>
> Should I replace the disk a fourth time?
>
> When you search in google you find a lot of threads with the timeout problem. 
> Might this be a software issue?
>
> Any ideas?

Please run:

smartctl -t short /dev/sdc
sleep 300
smartctl -t long /dev/sdc

Wait 2-3 hours or more for the long test to finish, then run:

smartctl -a /dev/sdc
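
To compare the reallocated-sector count across disks, as in the quoted table, something like the following could be used. It is only a sketch: the function names realloc_raw and report_realloc are made up for illustration, and it assumes the usual ten-column attribute table printed by "smartctl -A", where the raw value is the last field.

```shell
# Print the raw Reallocated_Sector_Ct value from `smartctl -A` output on stdin.
# Assumes the standard attribute table layout (raw value in column 10).
realloc_raw() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; exit }'
}

# Hypothetical helper: print "device: raw count" for each device given.
# Needs root and smartmontools installed.
report_realloc() {
    for dev in "$@"; do
        printf '%s: %s\n' "$dev" "$(smartctl -A "$dev" | realloc_raw)"
    done
}
```

For example, "report_realloc /dev/sd[a-f]" run as root before and after the long self-test would show whether the count on /dev/sdc is still growing.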

Justin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/