Message-ID: <fd34e752-b6ce-4880-9ef5-4feda985bf42@infineon.com>
Date: Mon, 26 Feb 2024 13:43:55 +0100
From: Alexander Steffen <Alexander.Steffen@...ineon.com>
To: "Daniel P. Smith" <dpsmith@...rtussolutions.com>, Lino Sanfilippo
<l.sanfilippo@...bus.com>, Jarkko Sakkinen <jarkko@...nel.org>, "Jason
Gunthorpe" <jgg@...pe.ca>, Sasha Levin <sashal@...nel.org>,
<linux-integrity@...r.kernel.org>, <linux-kernel@...r.kernel.org>
CC: Ross Philipson <ross.philipson@...cle.com>, Kanth Ghatraju
<kanth.ghatraju@...cle.com>, Peter Huewe <peterhuewe@....de>
Subject: Re: [PATCH 1/3] tpm: protect against locality counter underflow

On 23.02.2024 02:55, Daniel P. Smith wrote:
> On 2/20/24 13:42, Alexander Steffen wrote:
>> On 02.02.2024 04:08, Lino Sanfilippo wrote:
>>> On 01.02.24 23:21, Jarkko Sakkinen wrote:
>>>
>>>>
>>>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>>>> Commit 933bfc5ad213 introduced the use of a locality counter to
>>>>> control when a locality request is allowed to be sent to the TPM.
>>>>> In the commit, the counter is indiscriminately decremented, creating
>>>>> the potential for an integer underflow of the counter.
>>>>
>>>> What is the sequence of events that leads to this triggering the
>>>> underflow? This information should be presented in the commit message.
>>>>
>>>
>>> AFAIU this is:
>>>
>>> 1. We start with a locality_count of 0 and then we call
>>> tpm_tis_request_locality() for the first time, but since a locality is
>>> (unexpectedly) already active, check_locality() and consequently
>>> __tpm_tis_request_locality() return "true".
>>
>> check_locality() returns true, but __tpm_tis_request_locality() returns
>> the requested locality. Currently, this is always 0, so the check for
>> !ret will always correctly indicate success and increment the
>> locality_count.
>>
>> But since theoretically a locality != 0 could be requested, the correct
>> fix would be to check for something like ret >= 0 or ret == l instead of
>> !ret. Then the counter will also be incremented correctly for localities
>> != 0, and no underflow will happen later on. Therefore, explicitly
>> checking for an underflow is unnecessary and hides the real problem.
>>
>
> My apologies, but I will have to humbly disagree on a fundamental
> level here. If a state variable has bounds, then those bounds should
> be enforced when the variable is being manipulated.

That's fine, but that is not what your proposed fix does.
tpm_tis_request_locality and tpm_tis_relinquish_locality are meant to be
called in pairs: for every (successful) call to tpm_tis_request_locality
there *must* be a corresponding call to tpm_tis_relinquish_locality
afterwards. Unfortunately, in C there is no language construct to
enforce that (nothing like a Python context manager), so instead
locality_count is used to count the number of successful calls to
tpm_tis_request_locality, so that tpm_tis_relinquish_locality can wait
to actually relinquish the locality until the last expected call has
happened (you can think of that as a Python RLock, to stay with the
Python analogies).
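
To illustrate the intended pairing, here is a minimal caller-side
sketch (the two driver functions are the real ones from tpm_tis_core.c;
the surrounding caller and error handling are made up):

	/* Requests may nest, as long as each successful request is
	 * eventually paired with a relinquish; locality_count tracks
	 * the nesting depth, like the recursion level of an RLock. */
	rc = tpm_tis_request_locality(chip, 0);	/* count: 0 -> 1 */
	if (rc < 0)
		return rc;
	rc = tpm_tis_request_locality(chip, 0);	/* count: 1 -> 2 */
	if (rc < 0)
		goto out;
	/* ... access the TPM ... */
	tpm_tis_relinquish_locality(chip, 0);	/* count: 2 -> 1, no-op in HW */
out:
	tpm_tis_relinquish_locality(chip, 0);	/* count: 1 -> 0, relinquishes */
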
So if locality_count ever gets negative, that is certainly a bug
somewhere. But your proposed fix hides this bug, by allowing
tpm_tis_relinquish_locality to be called more often than
tpm_tis_request_locality. You could have added something like
BUG_ON(priv->locality_count == 0) before decrementing the counter. That
would really enforce the bounds, without hiding the bug, and I would be
fine with that.
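
As a sketch against the structure introduced by commit 933bfc5ad213
(untested, just to make the idea concrete), that check would sit right
before the decrement in tpm_tis_relinquish_locality:

	mutex_lock(&priv->locality_count_mutex);
	/* If every relinquish is paired with an earlier successful
	 * request, the count can never already be 0 here; if it is,
	 * some caller broke the pairing and we want to know. */
	BUG_ON(priv->locality_count == 0);
	priv->locality_count--;
	if (priv->locality_count == 0)
		__tpm_tis_relinquish_locality(priv, l);
	mutex_unlock(&priv->locality_count_mutex);
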
Of course, that still leaves the actual bug to be fixed. In this case,
there is no mismatch between the calls to tpm_tis_request_locality and
tpm_tis_relinquish_locality. It is just (as I said before) that the
counting of successful calls in tpm_tis_request_locality is broken for
localities != 0, so that is what you need to fix.
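
Concretely, with the "ret >= 0" variant, the counting in
tpm_tis_request_locality could look like this (again an untested sketch
of the change I am suggesting):

	int ret = 0;

	mutex_lock(&priv->locality_count_mutex);
	if (priv->locality_count == 0)
		ret = __tpm_tis_request_locality(chip, l);
	/* __tpm_tis_request_locality() returns the requested locality
	 * on success, so "!ret" misclassifies a successful request for
	 * a locality != 0 as a failure and skips the increment. */
	if (ret >= 0)
		priv->locality_count++;
	mutex_unlock(&priv->locality_count_mutex);
	return ret;
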
> Assuming that every path leading to the variable manipulation code has
> ensured proper manipulation is just that, an assumption. Failed
> assumptions are how bugs and vulnerabilities occur.
>
> To your point, does this fully address the situation experienced? I
> would say it does not. IMHO, the situation is really a combination of
> both patch 1 and patch 2, but the request was to split the changes for
> individual discussion. We selected this one as the fix for two
> reasons. First, it blocks the underflow such that when the Secure
> Launch series opens Locality 2, the counter will get incremented at
> that time and the internal locality tracking state variables will end
> up with the correct values, thus leading to the relinquish succeeding
> at kernel shutdown. Second, it provides a stronger defensive coding
> practice.
>
> Another reason that this works as a fix is that the TPM specification
> requires the registers to be mirrored across all localities,
> regardless of the active locality. So while all the
> request/relinquishes for Locality 0 sent by the early code do not
> succeed, the values obtained via the Locality 0 registers are still
> guaranteed to be correct.
>
> v/r,
> dps