Date:   Tue, 23 May 2023 22:46:44 +0200
From:   Lino Sanfilippo <l.sanfilippo@...bus.com>
To:     Jarkko Sakkinen <jarkko@...nel.org>,
        Lino Sanfilippo <LinoSanfilippo@....de>, peterhuewe@....de,
        jgg@...pe.ca
Cc:     jsnitsel@...hat.com, hdegoede@...hat.com, oe-lkp@...ts.linux.dev,
        lkp@...el.com, peter.ujfalusi@...ux.intel.com,
        peterz@...radead.org, linux@...ewoehner.de,
        linux-integrity@...r.kernel.org, linux-kernel@...r.kernel.org,
        lukas@...ner.de, p.rosenberger@...bus.com
Subject: Re: [PATCH 1/2] tpm, tpm_tis: Handle interrupt storm

Hi,

On 23.05.23 20:53, Jarkko Sakkinen wrote:
> 
> On Mon May 22, 2023 at 5:31 PM EEST, Lino Sanfilippo wrote:
>> From: Lino Sanfilippo <l.sanfilippo@...bus.com>
>>
>> Commit e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") enabled
>> interrupts instead of polling on all capable TPMs. Unfortunately, on some
>> products the interrupt line is either never asserted or never deasserted.
>>
>> The former causes interrupt timeouts and is detected by
>> tpm_tis_core_init(). The latter results in interrupt storms.
>>
>> Recent reports concern the Lenovo ThinkStation P360 Tiny, Lenovo ThinkPad
>> L490 and Inspur NF5180M6:
>>
>> https://lore.kernel.org/linux-integrity/20230511005403.24689-1-jsnitsel@redhat.com/
>> https://lore.kernel.org/linux-integrity/d80b180a569a9f068d3a2614f062cfa3a78af5a6.camel@kernel.org/
>>
>> The current approach to avoid those storms is to disable interrupts by
>> adding a DMI quirk for the concerned device.
>>
>> However, this is a maintenance burden in the long run, so use a generic
>> approach:
> 
> I'm trying to understand how you evaluate how big a maintenance burden
> this would be. Adding even a few dozen table entries is not a
> maintenance burden.
> 
> On the other hand, any new functionality is objectively a maintenance
> burden of some measure (applies to any functionality). So how do we know
> that taking this change is less of a maintenance burden than just adding
> new table entries as they come up?
> 

Initially this set was created as a response to this 0-day bug report which you asked me
to have a look at:

https://lore.kernel.org/linux-integrity/d80b180a569a9f068d3a2614f062cfa3a78af5a6.camel@kernel.org/

My hope was that it could also make some of the (existing or future) DMI entries unnecessary. But even if it does not
(e.g. the problem Péter Ujfalusi reported with the UPX-i11 cannot be fixed by this patch set and thus
needs the DMI quirk), we may at least avoid more bug reports due to interrupt storms once
6.4 is released.


>> Detect an interrupt storm by counting the number of unhandled interrupts
>> within a 10 ms time interval. If more than 1000 were unhandled,
>> deactivate interrupts, deregister the handler and fall back to polling.
> 
> I know it can sometimes be hard to evaluate, but can you try to explain
> how you came up with the 10 ms sampling period and the 1000 interrupt
> threshold? I just don't like arbitrary numbers.

At least the 100 ms is not plucked out of thin air: it's the same time period
that the generic code in note_interrupt() uses, I assume for a good reason.
Not only this number but the whole IRQ storm detection logic is taken from
there:

> 
>> This equals the implementation that handles interrupt storms in
>> note_interrupt() by means of timestamps and counters in struct irq_desc.

The threshold of 1000 unhandled interrupts is still far below the 99900 used in
note_interrupt(), but IMHO it is enough to indicate that something is seriously
wrong with interrupt processing and that it is probably safer to fall back to polling.


Regards,
Lino
