Date:	Fri, 05 Jan 2007 00:15:58 -0300
From:	Pablo Sebastian Greco <lkml@...agreco.com.ar>
To:	Tejun Heo <htejun@...il.com>
CC:	linux-kernel@...r.kernel.org
Subject: Re: SATA problems

Pablo Sebastian Greco wrote:
> Tejun Heo wrote:
>> Pablo Sebastian Greco wrote:
>>  
>>> By crash I mean the whole system going down, having to reset the entire
>>> machine.
>>> I'm sending you 4 files:
>>> dmesg: current boot dmesg, just a boot, because no errors appeared
>>> after the last crash, since the server is out of production right now
>>> (errors usually appear under heavy load, and this is primarily a
>>> transparent proxy for about 1000 simultaneous users)
>>> lspci: the way you asked for it
>>> messages and messages.1: files where you can see old boots and crashes
>>> (even a soft lockup).
>>> If there is anything else I can do, let me know. If you need direct
>>> access to the server, I can arrange that too.
>>>     
>>
>> Can you try 2.6.20-rc3 and see if 'CLO not available' message goes away
>> (please post boot dmesg)?
>>
>> The crash/lock is because filesystem code does not cope with IO errors
>> very well.  I can't tell why timeouts are occurring in the first place.
>>  It seems that only samsung drives are affected (sda2, 3, 4).  Hmmm...
>> Please apply the attached patch to 2.6.20-rc3 and test it.
>>
>> Thanks.
>>
>>   
> Here's the boot dmesg with 2.6.20-rc3 + blacklist. And you are right that
> only the Samsung drives are affected, but since only those drives get all
> the heavy load, I couldn't tell exactly.
> I'm putting the server in production right now, so I think in a few 
> hours I'll have more info.
>
> Thanks.
> Pablo.
After 13:34 of uptime under heavy load with no errors, I'm pretty sure
your patch is correct. Is there a way to backport it to 2.6.18.x?
An off-topic question: does anyone know why I get such uneven IRQ
handling on 2.6.19-20, yet almost perfect handling on 2.6.20-rc2-mm1?

Thanks for everything.
Pablo.
