lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <550f0928-73dd-0dbf-0763-c9cc529847df@kunbus.com>
Date:   Fri, 9 Jun 2023 18:03:21 +0200
From:   Lino Sanfilippo <l.sanfilippo@...bus.com>
To:     Jarkko Sakkinen <jarkko@...nel.org>,
        Lino Sanfilippo <LinoSanfilippo@....de>, peterhuewe@....de,
        jgg@...pe.ca
Cc:     jsnitsel@...hat.com, hdegoede@...hat.com, oe-lkp@...ts.linux.dev,
        lkp@...el.com, peter.ujfalusi@...ux.intel.com,
        peterz@...radead.org, linux@...ewoehner.de,
        linux-integrity@...r.kernel.org, linux-kernel@...r.kernel.org,
        lukas@...ner.de, p.rosenberger@...bus.com,
        kernel test robot <yujie.liu@...el.com>
Subject: Re: [PATCH v2] tpm,tpm_tis: Handle interrupt storm


Hi,

On 09.06.23 16:33, Jarkko Sakkinen wrote:

> 
> On Tue May 30, 2023 at 8:47 PM EEST, Lino Sanfilippo wrote:
>> From: Lino Sanfilippo <l.sanfilippo@...bus.com>
>>
>> After activation of interrupts for TPM TIS drivers 0-day reports an
>> interrupt storm on an Inspur NF5180M6/NF5180M6 server.
>>
>> Fix this by detecting the storm and falling back to polling:
>> Count the number of unhandled interrupts within a 10 ms time interval. In
>> case that more than 1000 were unhandled deactivate interrupts entirely,
>> deregister the handler and use polling instead.
>>
>> The storm detection logic equals the implementation in note_interrupt()
>> which uses timestamps and counters stored in struct irq_desc. Since this
>> structure is private to the generic interrupt core the TPM TIS core uses
>> its own timestamps and counters. Furthermore the TPM interrupt handler
>> always returns IRQ_HANDLED to prevent the generic interrupt core from
>> processing the interrupt storm.
>>
>> Since the interrupt deregistration function devm_free_irq() waits for all
>> interrupt handlers to finish, only trigger a worker in the interrupt
>> handler and do the unregistration in the worker to avoid a deadlock.
>>
>> Reported-by: kernel test robot <yujie.liu@...el.com>
>> Closes: https://lore.kernel.org/oe-lkp/202305041325.ae8b0c43-yujie.liu@intel.com/
>> Suggested-by: Lukas Wunner <lukas@...ner.de>
>> Signed-off-by: Lino Sanfilippo <l.sanfilippo@...bus.com>
>> ---
> 
> Sorry for the latency. I've moved home office to a new location,
> which has caused ~2 week lag. Unfortunate timing.
> 


No prob :)


>>  drivers/char/tpm/tpm_tis_core.c | 93 ++++++++++++++++++++++++++++-----
>>  drivers/char/tpm/tpm_tis_core.h |  4 ++
>>  2 files changed, 85 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>> index 558144fa707a..7ae8228e803f 100644
>> --- a/drivers/char/tpm/tpm_tis_core.c
>> +++ b/drivers/char/tpm/tpm_tis_core.c
>> @@ -468,25 +468,32 @@ static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
>>       return rc;
>>  }
>>
>> +static void __tpm_tis_disable_interrupts(struct tpm_chip *chip)
>> +{
>> +     struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> +     u32 intmask = 0;
>> +
>> +     tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>> +     intmask &= ~TPM_GLOBAL_INT_ENABLE;
>> +
>> +     tpm_tis_request_locality(chip, 0);
>> +     tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>> +     tpm_tis_relinquish_locality(chip, 0);
>> +
>> +     chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>> +}
>> +
>>  static void disable_interrupts(struct tpm_chip *chip)
> 
> Add tpm_ prefix here too. It makes tracing/grepping/etc so much nicer.

Ok.

> 
>>  {
>>       struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> -     u32 intmask;
> 
> int_mask is more readable

Ok.

> 
>> -     int rc;
>>
>>       if (priv->irq == 0)
>>               return;
>>
>> -     rc = tpm_tis_read32(priv, TPM_INT_ENABLE(priv->locality), &intmask);
>> -     if (rc < 0)
>> -             intmask = 0;
>> -
>> -     intmask &= ~TPM_GLOBAL_INT_ENABLE;
>> -     rc = tpm_tis_write32(priv, TPM_INT_ENABLE(priv->locality), intmask);
>> +     __tpm_tis_disable_interrupts(chip);
>>
>>       devm_free_irq(chip->dev.parent, priv->irq, chip);
>>       priv->irq = 0;
>> -     chip->flags &= ~TPM_CHIP_FLAG_IRQ;
>>  }
>>
>>  /*
>> @@ -752,6 +759,53 @@ static bool tpm_tis_req_canceled(struct tpm_chip *chip, u8 status)
>>       return status == TPM_STS_COMMAND_READY;
>>  }
>>
>> +static void tpm_tis_reenable_polling(struct tpm_chip *chip)
>> +{
>> +     struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> +
>> +     dev_warn(&chip->dev, FW_BUG
>> +              "TPM interrupt storm detected, polling instead\n");
>> +
>> +     __tpm_tis_disable_interrupts(chip);
>> +
>> +     /*
>> +      * devm_free_irq() must not be called from within the interrupt handler,
>> +      * since this function waits for running handlers to finish and thus it
>> +      * would deadlock. Instead trigger a worker that takes care of the
>> +      * unregistration.
>> +      */
>> +     schedule_work(&priv->free_irq_work);
>> +}
>> +
>> +static irqreturn_t tpm_tis_check_for_interrupt_storm(struct tpm_chip *chip)
>> +{
>> +     struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
>> +     const unsigned int MAX_UNHANDLED_IRQS = 1000;
> 
> Please declare this in the beginning of file because it is non-empirical
> tuning parameter. I do not want it to be buried here. It is now as good
> as a magic number.
> 
> Or perhaps even tpm_tis_core.h?
> 

For now that constant is only used in tpm_tis_core.c. So I would favor to define it there.

> Why MAX_UNHANDLED_IRQS is exactly 1000 and not 1? I would rollback eagerly.


Because the IRQ line may be shared with another device which has raised the
interrupt instead of the TPM. So unhandled interrupts may be legit.

Regards,
Lino




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ