[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM7w-FXHWzcN1Y7pwb6+1KA6A2oZRrfpOJdWFVWjRvjbp+DEOg@mail.gmail.com>
Date: Mon, 19 Sep 2022 11:03:11 -0400
From: Sven van Ashbrook <svenva@...omium.org>
To: Dominik Brodowski <linux@...inikbrodowski.net>,
Peter Huewe <peterhuewe@....de>,
Jarkko Sakkinen <jarkko@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Herbert Xu <herbert@...dor.apana.org.au>,
Olivia Mackall <olivia@...enic.com>,
Alex Levin <levinale@...gle.com>,
Andrey Pronin <apronin@...gle.com>,
"Jason A. Donenfeld" <jason@...c4.com>,
Stephen Boyd <swboyd@...gle.com>,
Rajat Jain <rajatja@...gle.com>,
Eric Biggers <ebiggers@...gle.com>,
Petr Mladek <pmladek@...e.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
"Theodore Ts'o" <tytso@....edu>, linux-crypto@...r.kernel.org
Subject: Re: [PATCH v2 1/2] random: move add_hwgenerator_randomness()'s wait
outside function
On Fri, Sep 16, 2022 at 10:51 AM Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> The other thing that occurred to me when reading this patch in context
> of the other one is that this sleep you're removing here is not the
> only sleep in the call chain. Each hwrng driver can also sleep, and
> many do, sometimes for a long time, blocking until there's data
> available, which might happen after minutes in some cases. So maybe
> that's something to think about in context of this patchset -- that
> just moving this to a delayed worker might not actually fix the issue
> you're having with sleeps.
>
This is an excellent point. A look at tpm2_calc_ordinal_duration()
reveals that tpm_transmit() may block for 300s at a time. So when
we are using a WQ_FREEZABLE delayed_work, the PM may have to wait
for up to 300s when draining the wq on suspend. That will introduce
a lot of breakage in suspend/resume.
Dominik: in light of this, please proceed with your patch, without
rebasing it onto mine.
+ tpm maintainers Peter Huewe and Jarkko Sakkinen, a quick recap of
the problem:
- on ChromeOS we are seeing intermittent suspend/resume errors+warnings
related to activity of the core's hwrng_fillfn. this kthread keeps
runningduring suspend/resume. if this happens to kick off an bus (i2c)
transaction while the bus driver is in suspend, this triggers
a "Transfer while suspended" warning from the i2c core, followed by
an error return:
i2c_designware i2c_designware.1: Transfer while suspended
tpm tpm0: i2c transfer failed (attempt 1/3): -108
[ snip 10s of transfer failed attempts]
- in 2019, Stephen Boyd made an attempt at fixing this by making the
hwrng_fillfn kthread freezable. But a freezable thread requires
different API calls for scheduling, waiting, and timeout. This
generated regressions, so the solution had to be reverted.
https://patchwork.kernel.org/project/linux-crypto/patch/20190805233241.220521-1-swboyd@chromium.org/
- the current patch attempts to halt hwrng_fillfn during suspend by
converting it to a self-rearming delayed_work. The PM drains all
work before going into suspend. But, the potential minute-long
blocking delays in tpm make this solution infeasible.
Peter and Jarkko, can you think of a possible way forward to eliminate
the warnings+errors?
-Sven
Powered by blists - more mailing lists