Message-ID: <CAJZ5v0gULZymuAuLzG74WxdEuLPqAg+HLWkJ_Wv6m3PLq6aJOg@mail.gmail.com>
Date:   Wed, 27 Oct 2021 20:24:25 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        Shuai Xue <xueshuai@...ux.alibaba.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>,
        James Morse <james.morse@....com>, Len Brown <lenb@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        luanshi <zhangliguang@...ux.alibaba.com>,
        zhuo.song@...ux.alibaba.com
Subject: Re: [PATCH v3] ACPI, APEI, EINJ: Relax platform response timeout to 1 second.

On Tue, Oct 26, 2021 at 7:05 PM Luck, Tony <tony.luck@...el.com> wrote:
>
> On Tue, Oct 26, 2021 at 03:28:29PM +0800, Shuai Xue wrote:
> > When injecting an error into the platform, the OSPM executes an
> > EXECUTE_OPERATION action to instruct the platform to begin the injection
> > operation. And then, the OSPM busy waits for a while by continually
> > executing CHECK_BUSY_STATUS action until the platform indicates that the
> > operation is complete. More specifically, the platform is limited to
> > respond within 1 millisecond right now. This is too strict for some
> > platforms.
> >
> > For example, in Arm platform, when injecting a Processor Correctable error,
> > the OSPM will warn:
> >     Firmware does not respond in time.
> >
> > And a message is printed on the console:
> >     echo: write error: Input/output error
> >
> > We observe that the waiting time for DDR error injection is about 10 ms and
> > that for PCIe error injection is about 500 ms in Arm platform.
> >
> > In this patch, we relax the response timeout to 1 second.
> >
> > Signed-off-by: Shuai Xue <xueshuai@...ux.alibaba.com>
>
> Reviewed-by: Tony Luck <tony.luck@...el.com>
>
> Rafael: Do you want to take this in the acpi tree? If not, I can
> apply it to the RAS tree (already at -rc7, so in next merge cycle
> after 5.16-rc1 comes out).

I'll queue it up for 5.16.

Thanks!

> > ---
> > Changelog v2 -> v3:
> > - Implemented the timeout in usleep_range instead of msleep.
> > - Dropped command line interface of timeout.
> > - Link to the v1 patch: https://lkml.org/lkml/2021/10/14/1402
> > ---
> >  drivers/acpi/apei/einj.c | 15 ++++++++-------
> >  1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> > index 133156759551..6e1ff4b62a8f 100644
> > --- a/drivers/acpi/apei/einj.c
> > +++ b/drivers/acpi/apei/einj.c
> > @@ -28,9 +28,10 @@
> >  #undef pr_fmt
> >  #define pr_fmt(fmt) "EINJ: " fmt
> >
> > -#define SPIN_UNIT            100                     /* 100ns */
> > -/* Firmware should respond within 1 milliseconds */
> > -#define FIRMWARE_TIMEOUT     (1 * NSEC_PER_MSEC)
> > +#define SLEEP_UNIT_MIN               1000                    /* 1ms */
> > +#define SLEEP_UNIT_MAX               5000                    /* 5ms */
> > +/* Firmware should respond within 1 seconds */
> > +#define FIRMWARE_TIMEOUT     (1 * USEC_PER_SEC)
> >  #define ACPI5_VENDOR_BIT     BIT(31)
> >  #define MEM_ERROR_MASK               (ACPI_EINJ_MEMORY_CORRECTABLE | \
> >                               ACPI_EINJ_MEMORY_UNCORRECTABLE | \
> > @@ -171,13 +172,13 @@ static int einj_get_available_error_type(u32 *type)
> >
> >  static int einj_timedout(u64 *t)
> >  {
> > -     if ((s64)*t < SPIN_UNIT) {
> > +     if ((s64)*t < SLEEP_UNIT_MIN) {
> >               pr_warn(FW_WARN "Firmware does not respond in time\n");
> >               return 1;
> >       }
> > -     *t -= SPIN_UNIT;
> > -     ndelay(SPIN_UNIT);
> > -     touch_nmi_watchdog();
> > +     *t -= SLEEP_UNIT_MIN;
> > +     usleep_range(SLEEP_UNIT_MIN, SLEEP_UNIT_MAX);
> > +
> >       return 0;
> >  }
> >
> > --
> > 2.20.1.12.g72788fdb
> >
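
A note on the timeout accounting in the hunk above, as a rough user-space sketch rather than kernel code: einj_timedout() subtracts SLEEP_UNIT_MIN from the budget on every poll, while usleep_range() is allowed to sleep for up to SLEEP_UNIT_MAX. The constants below are copied from the patch; sim_timedout() and max_polls() are hypothetical helpers that replace the sleep with a counter to show how many polls the budget allows.

```c
#include <stdint.h>

/* Values copied from the patch; all are in microseconds. */
#define SLEEP_UNIT_MIN   1000ULL      /* 1 ms */
#define SLEEP_UNIT_MAX   5000ULL      /* 5 ms */
#define FIRMWARE_TIMEOUT 1000000ULL   /* 1 s, i.e. 1 * USEC_PER_SEC */

/*
 * Hypothetical stand-in for einj_timedout(): same budget accounting,
 * but the sleep is replaced by incrementing a poll counter.
 * Returns 1 once the remaining budget is smaller than one unit.
 */
static int sim_timedout(uint64_t *t, uint64_t unit, uint64_t *polls)
{
    if ((int64_t)*t < (int64_t)unit)
        return 1;
    *t -= unit;
    (*polls)++;
    return 0;
}

/* Number of polls performed if the firmware never reports completion. */
static uint64_t max_polls(uint64_t budget, uint64_t unit)
{
    uint64_t polls = 0;

    while (!sim_timedout(&budget, unit, &polls))
        ;
    return polls;
}
```

Under these assumptions, max_polls(FIRMWARE_TIMEOUT, SLEEP_UNIT_MIN) comes out to 1000 polls. Since each usleep_range() call may take anywhere between SLEEP_UNIT_MIN and SLEEP_UNIT_MAX, the wall-clock time before the warning is printed would be at least about 1 second and could approach roughly 5 seconds, not counting the time spent executing CHECK_BUSY_STATUS itself.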
