Date:   Tue, 23 May 2023 22:12:49 +0300
From:   "Jarkko Sakkinen" <jarkko@...nel.org>
To:     "Jarkko Sakkinen" <jarkko@...nel.org>,
        "Lino Sanfilippo" <LinoSanfilippo@....de>, <peterhuewe@....de>,
        <jgg@...pe.ca>
Cc:     <jsnitsel@...hat.com>, <hdegoede@...hat.com>,
        <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <peter.ujfalusi@...ux.intel.com>, <peterz@...radead.org>,
        <linux@...ewoehner.de>, <linux-integrity@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <l.sanfilippo@...bus.com>,
        <lukas@...ner.de>, <p.rosenberger@...bus.com>
Subject: Re: [PATCH 1/2] tpm, tpm_tis: Handle interrupt storm

On Tue May 23, 2023 at 9:53 PM EEST, Jarkko Sakkinen wrote:
> On Mon May 22, 2023 at 5:31 PM EEST, Lino Sanfilippo wrote:
> > From: Lino Sanfilippo <l.sanfilippo@...bus.com>
> >
> > Commit e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") enabled
> > interrupts instead of polling on all capable TPMs. Unfortunately, on some
> > products the interrupt line is either never asserted or never deasserted.
> >
> > The former causes interrupt timeouts and is detected by
> > tpm_tis_core_init(). The latter results in interrupt storms.
> >
> > Recent reports concern the Lenovo ThinkStation P360 Tiny, Lenovo ThinkPad
> > L490 and Inspur NF5180M6:
> >
> > https://lore.kernel.org/linux-integrity/20230511005403.24689-1-jsnitsel@redhat.com/
> > https://lore.kernel.org/linux-integrity/d80b180a569a9f068d3a2614f062cfa3a78af5a6.camel@kernel.org/
> >
> > The current approach to avoid those storms is to disable interrupts by
> > adding a DMI quirk for the concerned device.
> >
> > However, this is a maintenance burden in the long run, so use a generic
> > approach:
>
> I'm trying to understand how you evaluate how big a maintenance burden
> this would be. Adding even a few dozen table entries is not a
> maintenance burden.
>
> On the other hand, any new functionality is objectively a maintenance
> burden of some measure (this applies to any functionality). So how do we
> know that taking this change is less of a maintenance burden than just
> adding new table entries as they come up?
>
> > Detect an interrupt storm by counting the number of unhandled interrupts
> > within a 10 ms time interval. If more than 1000 were unhandled,
> > deactivate interrupts, deregister the handler and fall back to polling.
>
> I know it can sometimes be hard to evaluate, but can you try to explain
> how you came up with the 10 ms sampling period and the 1000 interrupt
> threshold? I just don't like arbitrary numbers.

Also here I wonder how you came up with this computational model. This
is not the same as saying it is wrong. There is just a whole stack of
options.

Off the top of my head, you could e.g. take a windowed average of the
duration between IRQs. When the average crosses a threshold (i.e. the
IRQs arrive too close together), you shut down interrupts.

The advantage I see in this is that it is much easier to discuss
intuitively how much time there should be between interrupts for the
kernel to handle them than how many IRQs can be stacked into a time
interval, which blows my mind tbh.
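
For comparison, the patch's count threshold can be restated in those terms. If more than N interrupts arrive within a window of length W, the N or more gaps between them add up to less than W, so their average gap is below W/N. With the numbers from the patch:

```latex
\overline{\Delta t} \;<\; \frac{W}{N} \;=\; \frac{10\ \mathrm{ms}}{1000} \;=\; 10\ \mu\mathrm{s}
```

So the patch's rule corresponds roughly to flagging a storm when unhandled interrupts arrive on average less than 10 µs apart.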

BR, Jarkko
