linux-kernel - Re: [PATCH v2] tpm: Fix the timeout & use ktime

Message-ID: <048723bf-4a8d-451a-911b-f9f94a4270d7@gmail.com>
Date: Fri, 4 Jul 2025 16:39:20 +0100
From: "Orlov, Ivan" <ivan.orlov0322@...il.com>
To: Jarkko Sakkinen <jarkko@...nel.org>, Jonathan McDowell <noodles@...th.li>
Cc: "Orlov, Ivan" <iorlov@...zon.co.uk>, "peterhuewe@....de"
 <peterhuewe@....de>, "jgg@...pe.ca" <jgg@...pe.ca>,
 "linux-integrity@...r.kernel.org" <linux-integrity@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "Woodhouse, David" <dwmw@...zon.co.uk>
Subject: Re: [PATCH v2] tpm: Fix the timeout & use ktime

On 04/07/2025 16:16, Jarkko Sakkinen wrote:
> On Fri, Jul 04, 2025 at 10:02:33AM +0100, Jonathan McDowell wrote:
>> On Wed, Jun 25, 2025 at 07:43:07PM +0300, Jarkko Sakkinen wrote:
>>> On Sun, Jun 22, 2025 at 09:52:58PM +0100, Jonathan McDowell wrote:
>>>> On Fri, Jun 20, 2025 at 06:08:31PM +0000, Orlov, Ivan wrote:
>>>>> The current implementation of timeout detection works in the following
>>>>> way:
>>>>>
>>>>> 1. Read completion status. If completed, return the data
>>>>> 2. Sleep for some time (usleep_range)
>>>>> 3. Check for timeout using current jiffies value. Return an error if
>>>>>    timed out
>>>>> 4. Goto 1
>>>>>
>>>>> usleep_range doesn't guarantee it's always going to wake up strictly in
>>>>> (min, max) range, so such a situation is possible:
>>>>>
>>>>> 1. Driver reads completion status. No completion yet
>>>>> 2. Process sleeps indefinitely. In the meantime, TPM responds
>>>>> 3. We check for timeout without checking for the completion again.
>>>>>    Result is lost.
>>>>>
>>>>> Such a situation also happens for the guest VMs: if vCPU goes to sleep
>>>>> and doesn't get scheduled for some time, the guest TPM driver will
>>>>> timeout instantly after waking up without checking for the completion
>>>>> (which may already be in place).
>>>>>
>>>>> Perform the completion check once again after exiting the busy loop in
>>>>> order to give the device the last chance to send us some data.
>>>>>
>>>>> Since now we check for completion in two places, extract this check into
>>>>> a separate function.
>>>>>
>>>>> Signed-off-by: Ivan Orlov <iorlov@...zon.com>
>>>>> ---
>>>>> V1 -> V2:
>>>>> - Exclude the jiffies -> ktime change from the patch
>>>>> - Instead of recording the time before checking for completion, check
>>>>>   for completion once again after leaving the loop
>>>>>
>>>>> drivers/char/tpm/tpm-interface.c | 17 +++++++++++++++--
>>>>> 1 file changed, 15 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
>>>>> index 8d7e4da6ed53..6960ee2798e1 100644
>>>>> --- a/drivers/char/tpm/tpm-interface.c
>>>>> +++ b/drivers/char/tpm/tpm-interface.c
>>>>> @@ -82,6 +82,13 @@ static bool tpm_chip_req_canceled(struct tpm_chip *chip, u8 status)
>>>>> 	return chip->ops->req_canceled(chip, status);
>>>>> }
>>>>>
>>>>> +static bool tpm_transmit_completed(struct tpm_chip *chip)
>>>>> +{
>>>>> +	u8 status_masked = tpm_chip_status(chip) & chip->ops->req_complete_mask;
>>>>> +
>>>>> +	return status_masked == chip->ops->req_complete_val;
>>>>> +}
>>>>> +
>>>>> static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
>>>>> {
>>>>> 	struct tpm_header *header = buf;
>>>>> @@ -129,8 +136,7 @@ static ssize_t tpm_try_transmit(struct tpm_chip *chip, void *buf, size_t bufsiz)
>>>>> 	stop = jiffies + tpm_calc_ordinal_duration(chip, ordinal);
>>>>> 	do {
>>>>> 		u8 status = tpm_chip_status(chip);
>>>>> -		if ((status & chip->ops->req_complete_mask) ==
>>>>> -		    chip->ops->req_complete_val)
>>>>> +		if (tpm_transmit_completed(chip))
>>>>> 			goto out_recv;
>>>>
>>>> The only thing I'd point out here is we end up doing a double status read
>>>> one after the other (once here, once in tpm_transmit_completed), and I'm
>>>> pretty sure I've seen instances where that caused a problem.
>>>
>>> It would be easy to prevent at least double reads after completion
>>> e.g., in tpm_chip_status():
>>
>> Or just take the simple approach and make the check after the while loop:
>>
>> 	if ((tpm_chip_status(chip) & chip->ops->req_complete_mask) ==
>> 	    chip->ops->req_complete_val)
>> 		goto out_recv;
>>
>> There might be potential for a longer term cleanup using chip->status to
>> cache things, but I'm a little concerned that's going to open paths where we
>> might not correctly update it, so I think it should be a separate piece.
>>
>> (I'm motivated by the fact we've started to see the "Operation Canceled"
>> error and I'd like us to close on the best way to fix it. :) )
> 
> This would work for me too!
> 

Hi, and sorry for the late reply :(

I believe this option would work for us as well. Please let me know
whether you'd like me to send a V3, or feel free to send it yourself if
you prefer.

--
Kind regards,
Ivan Orlov
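
Editorial sketch (not part of the original message): the approach agreed on
in this thread, keeping the polling loop unchanged and reading the status one
more time after the loop exits, can be illustrated in userspace C. This is a
minimal simulation under stated assumptions, not the kernel patch:
`struct fake_chip`, `fake_status()`, `fake_sleep()`, and
`poll_for_completion()` are hypothetical names, the clock is simulated ticks
rather than jiffies, and the mask and value used by callers are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device; NOT the kernel's struct tpm_chip. */
struct fake_chip {
	uint64_t now;              /* simulated clock, in arbitrary ticks */
	uint64_t complete_at;      /* tick at which the device has finished */
	uint8_t req_complete_mask; /* bits of the status byte to inspect */
	uint8_t req_complete_val;  /* masked value that means "completed" */
	unsigned int status_reads; /* how many times the status was read */
};

/* Simulated status register read. */
static uint8_t fake_status(struct fake_chip *chip)
{
	chip->status_reads++;
	return chip->now >= chip->complete_at ? chip->req_complete_val : 0;
}

/* Interpret an already-read status byte; performs no extra device read. */
static bool completed(const struct fake_chip *chip, uint8_t status)
{
	return (status & chip->req_complete_mask) == chip->req_complete_val;
}

/* Simulated sleep; a large value models a vCPU that was descheduled. */
static void fake_sleep(struct fake_chip *chip, uint64_t ticks)
{
	chip->now += ticks;
}

/*
 * Poll until completion or timeout. Returns 0 on completion, -1 on timeout.
 * With final_check set, the status is read once more after the loop, so a
 * completion that arrived during an overlong sleep is not lost.
 */
static int poll_for_completion(struct fake_chip *chip, uint64_t timeout,
			       uint64_t sleep_ticks, bool final_check)
{
	uint64_t stop = chip->now + timeout;

	assert(sleep_ticks > 0); /* otherwise the loop would never advance */

	do {
		uint8_t status = fake_status(chip); /* one read per iteration */

		if (completed(chip, status))
			return 0;
		fake_sleep(chip, sleep_ticks);
	} while (chip->now < stop);

	/* Last chance: the device may have completed while we slept. */
	if (final_check && completed(chip, fake_status(chip)))
		return 0;

	return -1;
}
```

With `complete_at = 50`, a timeout of 100 ticks, and a single sleep that
overshoots to 1000 ticks, calling `poll_for_completion()` with
`final_check = false` reports a timeout even though the device finished long
before, while `final_check = true` returns success. Each loop iteration reads
the status once and passes the value to `completed()`, which avoids the
back-to-back double read raised in the review.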