Message-ID: <1552401766.3083.3.camel@HansenPartnership.com>
Date:   Tue, 12 Mar 2019 07:42:46 -0700
From:   James Bottomley <James.Bottomley@...senPartnership.com>
To:     Jarkko Sakkinen <jarkko.sakkinen@...ux.intel.com>
Cc:     Calvin Owens <calvinowens@...com>, Peter Huewe <peterhuewe@....de>,
        Jason Gunthorpe <jgg@...pe.ca>, Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-integrity@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...com
Subject: Re: [PATCH] tpm: Make timeout logic simpler and more robust

On Tue, 2019-03-12 at 14:50 +0200, Jarkko Sakkinen wrote:
> On Mon, Mar 11, 2019 at 05:27:43PM -0700, James Bottomley wrote:
> > On Mon, 2019-03-11 at 16:54 -0700, Calvin Owens wrote:
> > > We're having lots of problems with TPM commands timing out, and
> > > we're seeing these problems across lots of different hardware
> > > (both v1/v2).
> > > 
> > > I instrumented the driver to collect latency data, but I wasn't
> > > able to find any specific timeout to fix: it seems like many of
> > > them are too aggressive. So I tried replacing all the timeout
> > > logic with a single universal long timeout, and found that makes
> > > our TPMs 100% reliable.
> > > 
> > > Given that this timeout logic is very complex, problematic, and
> > > appears to serve no real purpose, I propose simply deleting all
> > > of it.
> > 
> > "no real purpose" is a bit strong given that all these timeouts are
> > standards mandated.  The purpose stated by the standards is that
> > there needs to be a way of differentiating the TPM crashed from the
> > TPM is taking a very long time to respond.  For a normally
> > functioning TPM it looks complex and unnecessary, but for a
> > malfunctioning one it's a lifesaver.
> 
> Standards should be only followed when they make practical sense and
> ignored when not. The range is only up to 2s anyway.

I don't disagree ... and I'm certainly not going to defend the TCG
because I do think the complexity of some of its standards contributed
to the lack of use of TPM 1.2.

However, I am saying we should root cause this problem rather than take
a blind shot at the apparent timeout complexity.  My timeout
instability is definitely related to the polling adjustments, so it's
not unreasonable to think Facebook's might be as well.

James
