lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 26 May 2023 01:45:23 +0200
From:   Lino Sanfilippo <l.sanfilippo@...bus.com>
To:     Jarkko Sakkinen <jarkko@...nel.org>,
        Lino Sanfilippo <LinoSanfilippo@....de>, peterhuewe@....de,
        jgg@...pe.ca
Cc:     jsnitsel@...hat.com, hdegoede@...hat.com, oe-lkp@...ts.linux.dev,
        lkp@...el.com, peter.ujfalusi@...ux.intel.com,
        peterz@...radead.org, linux@...ewoehner.de,
        linux-integrity@...r.kernel.org, linux-kernel@...r.kernel.org,
        lukas@...ner.de, p.rosenberger@...bus.com
Subject: Re: [PATCH 1/2] tpm, tpm_tis: Handle interrupt storm

Hi,

On 24.05.23 05:58, Jarkko Sakkinen wrote:

>> Commit e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") enabled
>> interrupts instead of polling on all capable TPMs. Unfortunately, on some
>> products the interrupt line is either never asserted or never deasserted.
> 
> Use Reported-by and Closes tag and remove this paragraph.
> 
> In Closes link instead from lore the email where the bug was reported.

Ok

> 
>> The former causes interrupt timeouts and is detected by
>> tpm_tis_core_init(). The latter results in interrupt storms.
> 
> Please describe instead the system where the bug was realized. Don't
> worry about speculative descriptions. We only deal with ones actually
> realized.
> 
>> Recent reports concern the Lenovo ThinkStation P360 Tiny, Lenovo ThinkPad
>> L490 and Inspur NF5180M6:
>>
>> https://lore.kernel.org/linux-integrity/20230511005403.24689-1-jsnitsel@redhat.com/
>> https://lore.kernel.org/linux-integrity/d80b180a569a9f068d3a2614f062cfa3a78af5a6.camel@kernel.org/
> 
> Please remove all of this, as the fixes have been handled. Let's keep
> the commit message focused.
> 

Ok.

>> The current approach to avoid those storms is to disable interrupts by
>> adding a DMI quirk for the concerned device.
>>
>> However this is a maintenance burden in the long run, so use a generic
>> approach:
>>
>> Detect an interrupt storm by counting the number of unhandled interrupts
>> within a 10 ms time interval. In case that more than 1000 were unhandled
>> deactivate interrupts, deregister the handler and fall back to polling.
>>
>> This equals the implementation that handles interrupt storms in
>> note_interrupt() by means of timestamps and counters in struct irq_desc.
>> However the function to access this structure is private so the logic has
>> to be reimplemented in the TPM TIS core.
> 
> I only now found out that this is based on kernel/irq/spurious.c code
> partly. Why this was unmentioned?
> 
> That would make this already more legitimate because it is based
> on field tested metrics.
>> Then we only have to discuss about counter.

I mentioned this in one of my responses:
https://lore.kernel.org/all/da435e0d-5f22-fac7-bc10-96a0fd4c6d54@kunbus.com/

> 
>> routine trigger a worker thread that executes the unregistration.
>>
>> Suggested-by: Lukas Wunner <lukas@...ner.de>
>> Signed-off-by: Lino Sanfilippo <l.sanfilippo@...bus.com>
>> ---
>>  drivers/char/tpm/tpm_tis_core.c | 71 +++++++++++++++++++++++++++++++--
>>  drivers/char/tpm/tpm_tis_core.h |  6 +++
>>  2 files changed, 74 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 558144fa707a..458ebf8c2f16 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -752,6 +752,55 @@ static bool tpm_tis_req_canceled(struct tpm_chip *chip, u8 status)
>>       return status == TPM_STS_COMMAND_READY;
>>  }
>>
>> +static void tpm_tis_handle_irq_storm(struct tpm_chip *chip)
>> +{
>> +     struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> +     int intmask = 0;
>> +
>> +     dev_err(&chip->dev, HW_ERR
>> +             "TPM interrupt storm detected, polling instead\n");
> 
> Degrading this to warn is fine because it is legit behaviour in a
> sense.
> 

Agreed.

>> +
>> +     tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>> +
>> +     intmask &= ~TPM_GLOBAL_INT_ENABLE;
>> +
>> +     tpm_tis_request_locality(chip, 0);
>> +     tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>> +     tpm_tis_relinquish_locality(chip, 0);
>> +
>> +     chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> +
>> +     /*
>> +      * We must not call devm_free_irq() from within the interrupt handler,
> 
> Never use "we" form. Always use either:
> 
> 1. Imperative
> 2. Passive
> 
> I.e. to address this, you would write instead "devm_free_irq() must not
> be called within the interrupt handler because ...".
> 

Ok.

>> +      * since this function waits for running interrupt handlers to finish
>> +      * and thus it would deadlock. Instead trigger a worker that does the
>> +      * unregistration.
>> +      */
>> +     schedule_work(&priv->free_irq_work);
>> +}
>> +
>> +static void tpm_tis_process_unhandled_interrupt(struct tpm_chip *chip)
>> +{
>> +     const unsigned int MAX_UNHANDLED_IRQS = 1000;
>> +     struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 
> Reverse order and add empty line.
> 

Ok.

>> +     /*
>> +      * The worker to free the TPM interrupt (free_irq_work) may already
>> +      * be scheduled, so make sure it is not scheduled again.
>> +      */
>> +     if (!(chip->flags & TPM_CHIP_FLAG_IRQ))
>> +             return;
>> +
>> +     if (time_after(jiffies, priv->last_unhandled_irq + HZ/10))
>> +             priv->unhandled_irqs = 1;
>> +     else
>> +             priv->unhandled_irqs++;
>> +
>> +     priv->last_unhandled_irq = jiffies;
>> +
>> +     if (priv->unhandled_irqs > MAX_UNHANDLED_IRQS)
>> +             tpm_tis_handle_irq_storm(chip);
> 
> Why wouldn't we switch to polling mode even when there is a single
> unhandled IRQ?
> 

As Lukas wrote, not handling an interrupt may be legit if it was raised
by a device that shares the interrupt line with the TPM.

Regards,
Lino

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ