Message-ID: <64bb37e0710102254n60e22338t4baf75a47b93ac14@mail.gmail.com>
Date:	Thu, 11 Oct 2007 07:54:53 +0200
From:	"Torsten Kaiser" <just.for.lkml@...glemail.com>
To:	"Tejun Heo" <htejun@...il.com>
Cc:	"Jens Axboe" <jens.axboe@...cle.com>,
	"Jeff Garzik" <jeff@...zik.org>, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org
Subject: Re: sata_sil24 broken since 2.6.23-rc4-mm1

On 10/11/07, Tejun Heo <htejun@...il.com> wrote:
> Torsten Kaiser wrote:
> > Looking closer at
> > http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commitdiff;h=ec6fdded4d76aa54aa57341e5dfdd61c507b1dcd
> > the change to libata.h seems bogus :
> >
> > in ata_qc_first_sg:
> > old                                new
> > return qc->__sg                    return qc->__sg
> > qc->__sg - qc->__sg == 0           qc->n_iter=0
> > -> sg - qc->__sg corresponds to qc->n_iter
> >
> > in ata_qc_next_sg:
> > sg++;                              sg_next(sg); qc->n_iter++;
> > sg - qc->__sg < qc->n_elem         qc->n_iter < qc->n_elem
> > -> sg - qc->__sg corresponds to qc->n_iter
> >
> > but in ata_sg_is_last:
> > (sg - qc->__sg) +1 == qc->n_elem   qc->n_iter == qc->n_elem
> > if sg - qc->__sg corresponds to qc->n_iter then shouldn't it be
> > qc->n_iter+1 == qc->n_elem?
> >
> > That missing +1 would explain, why the SGE_TRM never gets set.
>
> Thanks a lot for tracking this down.  Does changing the above code fix
> your problem?

I did not try it.
I'm not a libata expert, and while this change looks suspicious, I
can't be 100% sure whether it was intended.
And I did not want to experiment this deep in the code and risk
corrupting the whole drive.
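
For illustration only, here is a standalone userspace sketch of the
suspected off-by-one. It is not libata code: struct toy_qc and the
function names are made up, and only the field names n_iter and n_elem
come from the commit. It walks the indices the way ata_qc_next_sg does
and counts how often each variant of the "is last" test fires.

```c
#include <assert.h>

/* Hypothetical stand-in for the iterator state kept in the qc. */
struct toy_qc {
	unsigned int n_elem;	/* number of sg entries */
	unsigned int n_iter;	/* index of the entry being visited */
};

/* The test as it reads after the change: no +1. */
static int is_last_as_committed(const struct toy_qc *qc)
{
	return qc->n_iter == qc->n_elem;
}

/* The test with the +1 that mirrors (sg - qc->__sg) + 1 == qc->n_elem. */
static int is_last_with_plus_one(const struct toy_qc *qc)
{
	return qc->n_iter + 1 == qc->n_elem;
}

/* Walk indices 0 .. n_elem-1 and count how many are reported as last. */
static unsigned int count_last(unsigned int n_elem,
			       int (*is_last)(const struct toy_qc *))
{
	struct toy_qc qc = { .n_elem = n_elem, .n_iter = 0 };
	unsigned int hits = 0;

	assert(is_last != 0);
	for (qc.n_iter = 0; qc.n_iter < qc.n_elem; qc.n_iter++)
		if (is_last(&qc))
			hits++;
	return hits;
}
```

With n_elem = 4 the first variant never fires during the walk, while
the +1 variant fires exactly once, on index 3. In this model that is
the same as the terminating flag never being set.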

> Jens, Torsten's analysis looks correct && depending on qc state (n_iter)
> during iteration doesn't look like a good idea.  Those iterators are not
> supposed to have side effects.  Would it be difficult to implement
> sg_last() test?
>
> > And it would fit the symptoms, that the boot would fail at random. If
> > the "correct" garbage was in place to where the sglist runs off it
> > hangs the drive.
> > And that would even fit the two different errors that I only got one time each:
> > * a completely illegal access (PCI master abort while fetching SGT)
> > * wrong alignment of the SGT (SGT not on qword boundary)
> > At those times the garbage seemed to point to invalid addresses.
> >
> > But I'm still not understanding, how the kernel could only fail
> > sometimes at bootup, but after that working without any visible
> > errors? Is the sil-chip rather intelligent about detecting corrupted
> > sglists and silently ignoring them?
>
> I have no idea why it fails only sometimes.

And that is why I'm so unsure.
The error looks too serious to only cause random failures on one of two
drives on bootup.
I never had trouble with the remaining drive on the SiI-chip, or with
both drives if neither got killed during booting.

I'm guessing that leaving the computer powered down long enough fills
the RAM with a special pattern that really hangs the drive, while
normally it would just reject the invalid data. (I have ECC-RAM, does
this matter?)

Another guess might be that most of the time the SiI-chip correctly
terminates after the transfer length is reached, even if SGE_TRM is
missing...
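
To make that second guess concrete, here is a toy model in plain C. It
is purely hypothetical: I don't know how the chip walks the table
internally, and toy_sge, TOY_SGE_TRM and toy_walk are invented names,
not taken from sata_sil24.c. The model only shows the difference
between a walker that stops on the terminating flag alone and one that
also stops once the transfer length is covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SGE_TRM 0x80000000u	/* hypothetical "last entry" flag */

/* Hypothetical scatter/gather entry: just a length and flags. */
struct toy_sge {
	uint32_t len;
	uint32_t flags;
};

/*
 * Walk the table and return how many entries were consumed.
 * 'slots' bounds the model's memory, so the walk always ends.
 * If honor_length is nonzero, the walk also stops once xfer_len
 * bytes have been covered, even without the terminating flag.
 */
static size_t toy_walk(const struct toy_sge *t, size_t slots,
		       uint32_t xfer_len, int honor_length)
{
	uint32_t done = 0;
	size_t i;

	assert(t != NULL);
	for (i = 0; i < slots; i++) {
		done += t[i].len;
		if (t[i].flags & TOY_SGE_TRM)
			return i + 1;
		if (honor_length && done >= xfer_len)
			return i + 1;
	}
	return slots;	/* ran through everything the model has */
}
```

With two valid 512-byte entries followed by two garbage entries and no
flag set, the flag-only walker consumes all four slots, while the
length-aware walker stops after two. If the chip behaved like the
latter, a missing SGE_TRM would go unnoticed most of the time.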

Torsten