Message-ID: <YW2Vbkn5d6r3Y4LA@agluck-desk2.amr.corp.intel.com>
Date:   Mon, 18 Oct 2021 08:40:30 -0700
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Shuai Xue <xueshuai@...ux.alibaba.com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "james.morse@....com" <james.morse@....com>,
        "lenb@...nel.org" <lenb@...nel.org>,
        "rjw@...ysocki.net" <rjw@...ysocki.net>,
        "zhangliguang@...ux.alibaba.com" <zhangliguang@...ux.alibaba.com>,
        "zhuo.song@...ux.alibaba.com" <zhuo.song@...ux.alibaba.com>
Subject: Re: [PATCH] ACPI, APEI, EINJ: Relax platform response timeout to 1
 second.

On Sun, Oct 17, 2021 at 12:06:52PM +0800, Shuai Xue wrote:
> Hi, Tony,
> 
> Thank you for your reply.
> 
> > Spinning for 1ms was maybe ok. Spinning for up to 1s seems like a bad idea.
> >
> > This code is executed inside a mutex ... so maybe it is safe to sleep instead of spin?
> 
> May the email Subject misled you. This code do NOT spin for 1 sec. The period of the
> spinning depends on the SPIN_UNIT.

Not just the subject line. See the comment you changed here:

> > -#define SPIN_UNIT		100			/* 100ns */
> > -/* Firmware should respond within 1 milliseconds */
> > -#define FIRMWARE_TIMEOUT	(1 * NSEC_PER_MSEC)
> > +#define SPIN_UNIT		100			/* 100us */
> > +/* Firmware should respond within 1 seconds */
> > +#define FIRMWARE_TIMEOUT	(1 * USEC_PER_SEC)

That definitely reads to me as the timeout being increased from
1 millisecond to 1 second, with the old code polling for completion
every 100ns and the new code polling every 100us.
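
To make the arithmetic explicit, here is a minimal sketch that uses only the
constants from the quoted diff. The _OLD/_NEW suffixes and the two helper
functions are illustrative names, not kernel identifiers. It shows that the
number of polls stays the same while the worst-case spin grows by a factor
of 1000.

```c
#include <stdint.h>

/* Unit conversions as used in the quoted diff. */
#define NSEC_PER_MSEC 1000000ULL
#define USEC_PER_SEC  1000000ULL

/* Old values, expressed in nanoseconds (suffixes are illustrative). */
#define SPIN_UNIT_OLD_NS        100ULL
#define FIRMWARE_TIMEOUT_OLD_NS (1 * NSEC_PER_MSEC)

/* New values, expressed in microseconds (suffixes are illustrative). */
#define SPIN_UNIT_NEW_US        100ULL
#define FIRMWARE_TIMEOUT_NEW_US (1 * USEC_PER_SEC)

/* Upper bound on the number of polls before the timeout budget runs out. */
uint64_t max_polls(uint64_t timeout, uint64_t spin_unit)
{
	return timeout / spin_unit;
}

/* Worst-case spin in nanoseconds, given how many ns one timeout unit is. */
uint64_t worst_case_ns(uint64_t timeout, uint64_t ns_per_unit)
{
	return timeout * ns_per_unit;
}
```

With these definitions, max_polls() gives 10000 for both the old and the new
pair, while worst_case_ns() gives 1000000 (1 ms) for the old timeout and
1000000000 (1 s) for the new one.
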
> 
> The period was 100 ns and changed to 100 us now. In my opinion, spinning for 100 ns or 100 us is OK :)

But what does the code do in between polls? The calling code is:

        for (;;) {
                rc = apei_exec_run(&ctx, ACPI_EINJ_CHECK_BUSY_STATUS);
                if (rc)
                        return rc;
                val = apei_exec_ctx_get_output(&ctx);
                if (!(val & EINJ_OP_BUSY))
                        break;
                if (einj_timedout(&timeout))
                        return -EIO;
        }

Now apei_exec_run() and apei_exec_ctx_get_output() are a maze of
functions & macros. But I don't think they can block, sleep, or
context switch.

So this code is "spinning" until either BIOS says the operation is
complete, or the FIRMWARE_TIMEOUT is reached.

It avoids triggering a watchdog by the call to touch_nmi_watchdog()
after each spin between polls. But the whole thing may be spinning
for a second.
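
The behaviour described above can be modelled in userspace. This is only a
sketch under stated assumptions: struct fake_fw, timedout() and
spin_until_done() are hypothetical names, the clock is simulated, and the
timeout helper is assumed to consume one SPIN_UNIT of budget per call while
busy-waiting. It is not the kernel's einj_timedout().

```c
#include <stdbool.h>
#include <stdint.h>

#define SPIN_UNIT 100 /* microseconds per poll, as in the patch */

/* Hypothetical stand-in for the firmware: busy until ready_at_us. */
struct fake_fw {
	uint64_t now_us;      /* simulated clock */
	uint64_t ready_at_us; /* time at which the operation completes */
};

/*
 * Assumed model of the timeout helper: give up once less than one
 * SPIN_UNIT of budget is left, otherwise burn one SPIN_UNIT.
 */
static bool timedout(struct fake_fw *fw, uint64_t *budget_us)
{
	if ((int64_t)*budget_us < SPIN_UNIT)
		return true;
	*budget_us -= SPIN_UNIT;
	/* Stands in for a busy-wait delay: the CPU does no other work here. */
	fw->now_us += SPIN_UNIT;
	return false;
}

/* Returns the simulated microseconds spent spinning, or -1 on timeout. */
int64_t spin_until_done(struct fake_fw *fw, uint64_t timeout_us)
{
	uint64_t budget = timeout_us;
	uint64_t start = fw->now_us;

	for (;;) {
		if (fw->now_us >= fw->ready_at_us) /* "not busy" */
			return (int64_t)(fw->now_us - start);
		if (timedout(fw, &budget))
			return -1;
	}
}
```

In this model, a firmware that becomes ready after 250us costs 300us of
spinning, and one that never responds holds the CPU for the full 1000000us
before the loop gives up.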

I'm not at all sure that I'm right that the spin could be replaced
with an msleep(). It will certainly slow things down for systems
and EINJ operations that actually complete quickly, because instead
of returning within 100ns (or 100us with your patch) it will sleep
for 1 ms (rounded up to the next jiffy ... so 4 ms on HZ=250 systems).

But I don't care if my error injections take 4ms.

I do care that one logical CPU spins for 1 second.
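
The 4 ms figure follows from rounding the sleep up to whole timer ticks. A
minimal sketch of that rounding, with hypothetical helper names; it models
only the round-up described above, and a real msleep() may sleep longer than
this lower bound:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest whole number of ticks covering ms milliseconds at hz ticks/s. */
uint64_t ms_to_ticks_round_up(uint64_t ms, uint64_t hz)
{
	assert(hz > 0);
	return (ms * hz + 999) / 1000;
}

/*
 * Duration of that many ticks in microseconds. The integer division is
 * exact for common HZ values such as 100, 250 and 1000.
 */
uint64_t min_sleep_us(uint64_t ms, uint64_t hz)
{
	return ms_to_ticks_round_up(ms, hz) * (1000000 / hz);
}
```

For a 1 ms request, min_sleep_us() gives 4000 at HZ=250, 1000 at HZ=1000 and
10000 at HZ=100, so each poll costs at least one full tick of latency but no
CPU time.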

-Tony
