linux-kernel - [RFC] sdhci: timeouts


Date:	Sat, 28 May 2011 19:09:32 -0400
From:	Charles Hannum <root@...ck.net>
To:	linux-kernel@...r.kernel.org
Subject: [RFC] sdhci: timeouts

Between all the devices with special “quirks” to use
SDHCI_QUIRK_BROKEN_TIMEOUT_VAL and
SDHCI_QUIRK_DATA_TIMEOUT_USES_SDCLK, and the 19200 Google hits for
“linux sdhci timeout”, it sure seems like there's a problem there
somewhere.  Having been bitten by it on my own Dell laptop, I went
poking, and found:

1) The second term of the timeout calculation (based on tacc_clks) is
totally bogus.  It's dividing a whole number of SDCLK cycles by the
host clock frequency, but is expecting to get microseconds.  See
attached patch sdhci-timeout-clks.diff.
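
To make the unit problem in 1) concrete, here is a minimal standalone sketch. It is not the attached patch. The function and parameter names are hypothetical, and it assumes the clock is given in Hz and the result is wanted in microseconds. Dividing a cycle count by a frequency in Hz yields seconds, which truncates to 0 for any realistic value. Scaling the cycle count by 1000000 first gives microseconds.

```c
#include <stdint.h>

/* Naive form: cycles / Hz gives whole seconds, so this is almost always 0. */
static uint64_t naive_clks_term(uint64_t timeout_clks, uint64_t sdclk_hz)
{
    return sdclk_hz ? timeout_clks / sdclk_hz : 0;
}

/*
 * Hypothetical corrected form: total data timeout in microseconds.
 * timeout_ns   - card access time in nanoseconds
 * timeout_clks - additional access time in SDCLK cycles
 * sdclk_hz     - SDCLK frequency in Hz (assumed unit)
 */
static uint64_t data_timeout_us(uint64_t timeout_ns, uint64_t timeout_clks,
                                uint64_t sdclk_hz)
{
    uint64_t us = timeout_ns / 1000;

    /* Convert cycles to microseconds, rounding up so we never undershoot. */
    if (sdclk_hz != 0)
        us += (timeout_clks * 1000000ULL + sdclk_hz - 1) / sdclk_hz;

    return us;
}
```

With a 25 MHz SDCLK and 25000 cycles, the naive term is 0 while the scaled term is 1000 microseconds.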

2) The SDHCI spec is very specific that when both a Transfer Complete
and a Data Timeout Error are present, the Transfer Complete takes
precedence.  This is documented under the definition of the
Transfer Complete bit (page 53 of the SDHCI 2.0 spec).  See attached
patch sdhci-timeout-int.diff.
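
The precedence rule in 2) can be sketched as follows. This is an illustration only, not the attached patch. The enum and function names are hypothetical, and the bit values are assumed to match the combined 32-bit interrupt status layout (Transfer Complete at bit 1, Data Timeout Error at bit 20).

```c
#include <stdint.h>

/* Assumed bit positions in the combined 32-bit interrupt status. */
#define INT_DATA_END     0x00000002u  /* Transfer Complete */
#define INT_DATA_TIMEOUT 0x00100000u  /* Data Timeout Error */

enum data_result { DATA_PENDING, DATA_OK, DATA_TIMED_OUT };

/* Classify a data interrupt, letting Transfer Complete win over a timeout. */
static enum data_result classify_data_irq(uint32_t intmask)
{
    if (intmask & INT_DATA_END)
        return DATA_OK;         /* checked first, so it takes precedence */
    if (intmask & INT_DATA_TIMEOUT)
        return DATA_TIMED_OUT;  /* only a real timeout if not completed */
    return DATA_PENDING;
}
```

The only point is the order of the two checks: a handler that tests the timeout bit first would report an error for a transfer that actually finished.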

3) There's a lot of folklore about buggy clocks on various chips, but
no hard data.  I always hate this kind of folklore.  I found it
helpful to actually measure the timeout and see if it was what we
expected.  Ultimately this proved that the controllers in my machines
were in fact delivering timeouts pretty much exactly as expected.  See
attached patch sdhci-timeout-log.diff (depends on the previous two
diffs); it outputs messages of the form:

May 28 18:53:01 lop-nor kernel: [24056.231401] sdhci: timeout, requested 508400484ns actual 510006113ns, TMCLK configured 33000 estimated 32896
May 28 18:53:33 lop-nor kernel: [24088.002670] sdhci: timeout, requested 508400484ns actual 510005815ns, TMCLK configured 33000 estimated 32896
May 28 18:55:29 lop-nor kernel: [24204.139937] sdhci: timeout, requested 508400484ns actual 500006027ns, TMCLK configured 33000 estimated 33554
May 28 18:55:29 lop-nor kernel: [24204.887654] sdhci: timeout, requested 508400484ns actual 500006025ns, TMCLK configured 33000 estimated 33554
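
For readers checking the numbers: the "estimated" values above are consistent with scaling the configured TMCLK by requested/actual. That is an inference from the sample output, not necessarily how the patch computes it. The requested value also matches 2^24 TMCLK cycles at 33000 kHz. A small sketch, with a hypothetical function name and TMCLK assumed to be in kHz:

```c
#include <stdint.h>

/*
 * Estimate the real timeout clock from a measured timeout.
 * If the controller took longer than requested, its clock must be
 * slower than configured by the same ratio (and vice versa).
 */
static uint32_t estimate_tmclk_khz(uint32_t configured_khz,
                                   uint64_t requested_ns,
                                   uint64_t actual_ns)
{
    if (actual_ns == 0)
        return 0;  /* no measurement, nothing to estimate */

    /* 64-bit intermediate avoids overflow for timeouts in the ns range. */
    return (uint32_t)((uint64_t)configured_khz * requested_ns / actual_ns);
}
```

Fed the first and third log lines, this reproduces 32896 and 33554, both within about 2% of the configured 33000.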

4) Ultimately I found that some SDHC cards just seem to take a good
long time to respond.  I ended up increasing the write timeout to 1s
and I've encountered 0 problems since then.  Since the timeout only
matters in an error condition, and is therefore hardly a performance
concern, I suggest it may be a good idea to do this in general.  See
attached patch sdhci-timeout-limit.diff.

View attachment "sdhci-timeout-clks.diff" of type "text/x-patch" (440 bytes)

View attachment "sdhci-timeout-int.diff" of type "text/x-patch" (430 bytes)

View attachment "sdhci-timeout-log.diff" of type "text/x-patch" (2541 bytes)

View attachment "sdhci-timeout-limit.diff" of type "text/x-patch" (352 bytes)