Date:	Thu, 11 Oct 2007 15:26:27 +0900
From:	Tejun Heo <htejun@...il.com>
To:	Torsten Kaiser <just.for.lkml@...glemail.com>
CC:	Jens Axboe <jens.axboe@...cle.com>, Jeff Garzik <jeff@...zik.org>,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org
Subject: Re: sata_sil24 broken since 2.6.23-rc4-mm1

Torsten Kaiser wrote:
>>> That missing +1 would explain why the SGE_TRM never gets set.
>> Thanks a lot for tracking this down.  Does changing the above code fix
>> your problem?
> 
> I did not try it.
> I'm not a libata expert, and while this change looks suspicious, I
> can't be 100% sure that the change was intended.
> And I did not want to experiment this deep in the code and risk
> corrupting the whole drive.

I don't think you would risk too much by changing that bit of code.
Please try it.
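To make the "missing +1" concrete, here is a hypothetical, simplified C sketch of filling a scatter/gather table whose last entry must carry a terminate flag. It is not the sata_sil24 or libata code: the `struct sge` layout, the `fill_sg()` helper, its `buggy` switch and the bit value chosen for `SGE_TRM` are all assumptions for illustration. It only shows how dropping a `+ 1` from an "is this the last entry?" test means the flag can never be set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical terminate flag; the bit position is an assumption. */
#define SGE_TRM (1u << 31)

/* Simplified scatter/gather entry (hypothetical layout). */
struct sge {
    uint64_t addr;   /* DMA address of the segment */
    uint32_t cnt;    /* segment length in bytes */
    uint32_t flags;  /* SGE_TRM on the final entry */
};

/*
 * Fill n entries from parallel address/length arrays and mark the last
 * entry with SGE_TRM. Returns how many entries got the terminate flag.
 * If 'buggy' is non-zero, the last-entry test omits the "+ 1".
 */
static unsigned fill_sg(struct sge *sge, const uint64_t *addrs,
                        const uint32_t *lens, unsigned n, int buggy)
{
    unsigned idx, marked = 0;

    assert(n > 0); /* an empty table has no last entry to mark */

    for (idx = 0; idx < n; idx++) {
        /* Correct test: idx + 1 == n holds for the final entry.
         * Buggy test: idx == n can never hold while idx < n. */
        int is_last = buggy ? (idx == n) : (idx + 1 == n);

        sge[idx].addr = addrs[idx];
        sge[idx].cnt = lens[idx];
        sge[idx].flags = is_last ? SGE_TRM : 0;
        if (is_last)
            marked++;
    }
    return marked;
}
```

With three entries, the correct test marks only the third entry and `fill_sg()` returns 1; with `buggy` set it returns 0 and no entry carries `SGE_TRM`, so the table is left unterminated.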

>>> But I still don't understand how the kernel could fail only
>>> sometimes at bootup, yet work afterwards without any visible
>>> errors. Is the SiI chip rather intelligent about detecting corrupted
>>> sglists and silently ignoring them?
>> I have no idea why it fails only sometimes.
> 
> And that is why I'm so unsure.
> The error looks too serious to cause only random failures on one of two
> drives on bootup.
> I never had trouble with the remaining drive on the SiI-chip or both
> drives if one got killed during booting.
> 
> I'm guessing that leaving the computer powered down long enough fills
> the RAM with a special pattern that really hangs the drive, while
> normally it would just reject the invalid data. (I have ECC RAM; does
> this matter?)
> 
> Another guess might be that most of the time the SiI chip correctly
> terminates after the transfer length is reached, even if SGE_TRM is
> missing...

I have no idea either.  We'll probably need a PCI bus tracer to tell
exactly what's going on.

Thanks.

-- 
tejun