linux-kernel - Re: 2.6.37.1 s2disk regression (TPM)

Date:	Tue, 22 Feb 2011 06:57:44 -0500
From:	Stefan Berger <stefanb@...ux.vnet.ibm.com>
To:	Jiri Slaby <jirislaby@...il.com>
CC:	Rajiv Andrade <srajiv@...ux.vnet.ibm.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	linux-pm <linux-pm@...ts.linux-foundation.org>,
	stable@...nel.org,
	Linux kernel mailing list <linux-kernel@...r.kernel.org>,
	debora@...ux.vnet.ibm.com,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	preining@...ic.at
Subject: Re: 2.6.37.1 s2disk regression (TPM)

On 02/22/2011 03:41 AM, Jiri Slaby wrote:
> On 02/22/2011 01:42 AM, Stefan Berger wrote:
>> On 02/21/2011 05:10 PM, Jiri Slaby wrote:
>>> On 02/21/2011 11:07 PM, Rajiv Andrade wrote:
>>>> On 02/21/2011 06:44 PM, Jiri Slaby wrote:
>>>>> On 02/21/2011 10:29 PM, Stefan Berger wrote:
>>>>>> On 02/21/2011 03:39 PM, Jiri Slaby wrote:
>>>>>>> On 02/21/2011 06:12 PM, Rajiv Andrade wrote:
>>>>>>>> On 02/21/2011 01:34 PM, Jiri Slaby wrote:
>>>>>>>>> There has to be another problem which caused my regression. And
>>>>>>>>> since it
>>>>>>>>> reports "Operation Timed out", the former default timeout values
>>>>>>>>> worked
>>>>>>>>> for me, the ones read from TPM do not.
>>>>>>>> Yes, it's most likely due to inconsistent timeout values reported
>>>>>>>> by the TPM, as I mentioned. My working timeouts are:
>>>>>>>> 3020000 4510000 181000000
>>>>>>> 1000000 2000 150000
>>>>>>>
>>>>>>> Actually the first one from HW is 1. This one is HZ after
>>>>>>> correction
>>>>>>> in get_timeouts. So perhaps it is in ms, yes.
>>>>>> Following the specs, the timeouts are supposed to be in
>>>>>> microseconds and
>>>>>> ascending order for short, medium and long duration. Of course, if the
>>>>>> device returns wrong timeouts, the command isn't going to succeed,
>>>>>> failing the suspend in this case. Nevertheless, I think we need the
>>>>>> patch I put in but at the same time we'll need a work-around for
>>>>>> devices
>>>>>> like this.
>>>>> Yes, the patch is correct per se. But as it breaks a bunch of machines,
>>>>> it cannot go in now. The rule is no regressions.
>>>>>
>>>>> After you have the workaround it should go into the next rc1 after
>>>>> that.
>>>>> Do you plan to add a dmi-based quirk? Or, IOW do you want me to attach
>>>>> dmidecode output? Or are you going to base it solely on TPM
>>>>> manufacturer/version?
>>>> It's more reliable to base the workaround on the values themselves,
>>>> instead of the TPM's ID, since
>>>> we don't know whether other models will behave similarly.
>>> As I wrote, you may base it on dmi data.
>>>
>>>> It should be fine then to extend the existing workaround for short
>>>> timeouts to the medium and long ones.
>>> OK, but how will you guess the values?
>> One way of doing it would be to at least make sure that the timeouts are
>>
>> short < medium < long
>>
>> and if that's not true, as in the case of your TPM, set the timeouts to
>> 0 and have Rajiv's work-around kick in, OR explicitly assign the same
>> high values to the timeouts that Rajiv's work-around is using right
>> now. Of course there could be another type of bad TPM firmware out there
>> where all values are in ascending order but given in ms and cause
>> time-outs -- but I would wait for someone to point that out since I am
>> not aware of such a device.
> Note that it is in ascending order (1 2000 150000). As I wrote, the first
> timeout (1) is replaced by one HZ in get_timeouts.
The forthcoming patch will simply adjust the other two values as well, by
multiplying them by 1000. The suspend fails because of the second (medium)
timeout: the TPM_SaveState command is of medium duration.
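
As a rough illustration only (this is not the patch itself), the adjustment
described above could be sketched in plain C as follows. The struct, the
function name and the 10 ms plausibility threshold are hypothetical; the
one-second replacement for the first value and the factor of 1000 for the
other two are taken from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three values reported by the TPM,
 * expected to be in microseconds. */
struct tpm_durations {
	uint32_t short_us;
	uint32_t medium_us;
	uint32_t long_us;
};

/* Assumption: a short value below 10 ms is implausible in microseconds
 * and hints that the device reported milliseconds instead. */
#define SKETCH_MIN_SHORT_US      10000u
/* One second, standing in for the "one HZ" replacement value. */
#define SKETCH_FALLBACK_SHORT_US 1000000u

/* Returns 1 if the values were adjusted, 0 if they were left alone. */
static int fixup_durations(struct tpm_durations *d)
{
	assert(d != NULL);

	if (d->short_us >= SKETCH_MIN_SHORT_US)
		return 0;

	d->short_us = SKETCH_FALLBACK_SHORT_US;
	d->medium_us *= 1000u;	/* ms -> us */
	d->long_us *= 1000u;	/* ms -> us */
	return 1;
}
```

With the raw values reported in this thread (1 2000 150000), the sketch
yields 1000000 2000000 150000000, while a consistent set such as
3020000 4510000 181000000 is left untouched.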

There will be a second patch for re-enabling the TPM's interrupts. Upon
resume, the BIOS may (this may be BIOS-dependent) have disabled them while
sending a command (TPM_Startup) to the TPM in polling mode, and then left
the TPM with its interrupts disabled.

I'd appreciate it if you tested both of them.

    Stefan
