linux-kernel - Re: [PATCH] mmc: core: Remove timeout when enabling cache

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aa26f5dc9e595feb3772495a6d0542c007948f48.camel@collabora.co.uk>
Date:   Tue, 20 Nov 2018 12:38:05 +0100
From:   Sjoerd Simons <sjoerd.simons@...labora.co.uk>
To:     Wolfram Sang <wsa@...-dreams.de>,
        Ulf Hansson <ulf.hansson@...aro.org>
Cc:     Faiz Abbas <faiz_abbas@...com>,
        "linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
        kernel@...labora.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Hongjie Fang <hongjiefang@...micro.com>,
        Bastian Stender <bst@...gutronix.de>,
        Kyle Roeschley <kyle.roeschley@...com>,
        Wolfram Sang <wsa+renesas@...g-engineering.com>,
        Shawn Lin <shawn.lin@...k-chips.com>,
        Harish Jenny K N <harish_kandiga@...tor.com>,
        Simon Horman <horms+renesas@...ge.net.au>
Subject: Re: [PATCH] mmc: core: Remove timeout when enabling cache

On Tue, 2018-11-20 at 11:23 +0100, Wolfram Sang wrote:
> > > > That also happens to be one of the cards we deploy; However i
> > > > did
> > > > wonder about adding a quirk but decided against it as it was
> > > > not clear
> > > > to me from the specification that CACHE ON really is meant to
> > > > complete
> > > > within GENERIC_CMD6_TIMEOUT. That and i fret about ending up in
> > > > hit-a-
> > > > mole games as the failure is really quite tedious (boot
> > > > failure).
> > > 
> > > I agree that we should use the more defensive variant as a
> > > default. I
> > > mean there should be no performance regression since most cards
> > > will
> > > respond just faster, or? The only downside I could see is that we
> > > might
> > > miss a real timeout with no bounds set and might get stuck?
> > 
> > Well, you have a point, but still it's kind of nice to know which
> > cards are behaving well and which ones that doesn't. Hence I think
> > I
> > prefer to stick using a quirk, unless you have a strong opinion.

Not an incredibly strong opinion either; I just wonder if it's the
right trade-off. 

If the quirk/work-around is not there while it should be, the impact is
that you get an unusable card (which for eMMC is likely to mean a
failure to boot the system). Which is somewhat unfortunate.

If the work-around is there while it's not needed then there doesn't
seem to be much of an impact at all; Apart from it not being reported
to the user/developer/kernel community? 

In which case it might make more to put in a warning iff the card takes
too long with a list of cards for which this is known?

> No strong opinion. Especially not if you say it is in the spec
> (although
> "must be sufficient" would be better than "should be" ;)). Also, I
> assume this failure is reproducible and should turn up during
> development? Compared to "happens once in a while randomly"?

For the card in question it happens only on hard power off; The time it
takes seems correlated to the state of the cache at hard power off (It
takes substantially longer if there was a lot of I/O activity at 
 the time of hard power off). With light I/O activity the current
timeout is sometimes enough.

So if you know the pattern, or just happen to hit it often in e.g.
automated testing, it does show up during development. Otherwise it can
appear to "happen once in a while randomly".

Unfortunately for me, it was really a case of getting reports of some
boards started failing at some point which took a while to track back.
Especially since it's a battery powered device (thus hard poweroffs are
rather rare) and we allow the board manufactorer to select from various
different eMMCs depending on price/available at build time...

> Yet, if we add a quirk for that, then we should probably mention it
> in
> an error message when we hit -ETIMEDOUT for cache on ("does your card
> need this quirk?")? It can be pretty time consuming to track this
> down
> otherwise, I'd think.

Yes please. It would be nice if someone happens to have the right
contacts with Micron to see if it's a known issue for their cards in
general or just this one.

Also would be good to have a timeout higher then 1 seconds (or for
these cards not have one?); On our testing thusfar we've seen timeouts
up to 850ms, but it's impossible to ensure that that's the true upper
bound.

-- 
Sjoerd Simons
Collabora Ltd.