Message-Id: <20210111214452.1826-1-tony.luck@intel.com>
Date:   Mon, 11 Jan 2021 13:44:49 -0800
From:   Tony Luck <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Tony Luck <tony.luck@...el.com>, x86@...nel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
        linux-mm@...ck.org
Subject: [PATCH v2 0/3] Fix infinite machine check loop in futex_wait_setup()

Linux can now recover from machine checks where kernel code is
doing get_user() to access application memory. But there isn't
a way to distinguish whether get_user() failed because of a page
fault or a machine check.

Thus there is a problem if any kernel code thinks it can retry
an access after doing something that would fix the page fault.

One such example (I'm sure there are more) is in futex_wait_setup(),
where an attempt is made to read the futex with page faults disabled,
followed by a retry (after dropping a lock so page faults are safe):

        ret = get_futex_value_locked(&uval, uaddr);

        if (ret) {
                queue_unlock(*hb);

                ret = get_user(uval, uaddr);

It would be good to avoid deliberately taking a second machine
check (especially as the recovery code does really bad things
and ends up in an infinite loop!).

V2 (thanks to feedback from PeterZ) fixes this by changing get_user() to
return -ENXIO ("No such device or address") for the case where a machine
check occurred. Peter left it open which error code to use (suggesting
"-EMEMERR or whatever name we come up with"). I think the existing ENXIO
error code is appropriate (the address being accessed has effectively
gone away), but I don't have a strong attachment to it if anyone thinks
we need a new code.

Callers can check for ENXIO in paths where the access would be
retried so they can avoid a second machine check.
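
To illustrate the caller-side check, here is a minimal user-space sketch
of the retry pattern. It is not the kernel code: every mock_* name and
the MEM_* states are hypothetical stand-ins invented for this example,
and only the -EFAULT / -ENXIO distinction comes from the description
above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of one word of user memory. */
enum mock_mem_state { MEM_OK, MEM_NOT_PRESENT, MEM_POISONED };

struct mock_user_word {
	enum mock_mem_state state;
	unsigned int value;
	int accesses;		/* how many times the word was touched */
};

/* Stand-in for the fast path: page faults disabled, so a page that is
 * not present fails with -EFAULT; poison is reported as -ENXIO. */
static int mock_get_value_locked(unsigned int *val, struct mock_user_word *w)
{
	assert(val && w);
	w->accesses++;
	if (w->state == MEM_POISONED)
		return -ENXIO;
	if (w->state == MEM_NOT_PRESENT)
		return -EFAULT;
	*val = w->value;
	return 0;
}

/* Stand-in for the slow path: faults are allowed, so a missing page is
 * "faulted in" and the read succeeds. Poison still yields -ENXIO. */
static int mock_get_user(unsigned int *val, struct mock_user_word *w)
{
	assert(val && w);
	w->accesses++;
	if (w->state == MEM_POISONED)
		return -ENXIO;
	w->state = MEM_OK;
	*val = w->value;
	return 0;
}

/* Retry only when the first failure could have been a page fault. */
static int mock_wait_setup(unsigned int *val, struct mock_user_word *w)
{
	int ret = mock_get_value_locked(val, w);

	if (ret == -ENXIO)
		return ret;	/* machine check: do not touch the word again */
	if (ret)
		ret = mock_get_user(val, w);
	return ret;
}
```

In this model a poisoned word is accessed exactly once, while a word
that is merely not present is accessed twice and the second read
succeeds.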

Patch roadmap:

Part 1 (unchanged since v1):
Add code to avoid the infinite loop in the machine check
code. Just panic if code runs into the same machine check a second
time. This should make it much easier to debug other places where
this happens.
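
As a rough illustration of the Part 1 idea only (this is not the patch
itself), the sketch below models a task that records when machine check
recovery has been queued and treats a second hit before that recovery
has run as fatal. The struct, its fields, and the mock_* functions are
hypothetical, and "the same machine check" is approximated here as
"recovery is already pending for this task", which is an assumption.

```c
#include <errno.h>

/* Hypothetical per-task bookkeeping for a pending recovery. */
struct mock_task {
	int mce_pending;		/* non-zero once recovery is queued */
	unsigned long long mce_addr;	/* address recorded for the first hit */
};

enum mock_action { MOCK_QUEUE_RECOVERY, MOCK_PANIC };

/* Decide what to do when a machine check is taken on behalf of a task. */
static enum mock_action mock_queue_mce(struct mock_task *t,
				       unsigned long long addr)
{
	if (t->mce_pending)
		return MOCK_PANIC;	/* second hit before recovery ran */

	t->mce_pending = 1;
	t->mce_addr = addr;
	return MOCK_QUEUE_RECOVERY;
}

/* Called once the queued recovery work has completed. */
static void mock_recovery_done(struct mock_task *t)
{
	t->mce_pending = 0;
}
```

Failing loudly on the second hit replaces a silent infinite loop with
an identifiable crash, which is what makes the remaining retry sites
easier to find.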

Part 2: Change recovery path for get_user() to return -ENXIO

Part 3: Fix the one case in futex code that my test case hits (I'm
sure there are more).

TBD: There are a few places in arch/x86 code that test "ret == -EFAULT"
or have "switch (ret) { case -EFAULT: }" that may benefit from an
additional check for -ENXIO. For now those will continue to crash
(just like every pre-v5.10 kernel crashed when get_user() touched
poison).

Tony Luck (3):
  x86/mce: Avoid infinite loop for copy from user recovery
  x86/mce: Add new return value to get_user() for machine check
  futex, x86/mce: Avoid double machine checks

 arch/x86/kernel/cpu/mce/core.c | 7 ++++++-
 arch/x86/lib/getuser.S         | 8 +++++++-
 arch/x86/mm/extable.c          | 1 +
 include/linux/sched.h          | 3 ++-
 kernel/futex.c                 | 5 ++++-
 5 files changed, 20 insertions(+), 4 deletions(-)

-- 
2.21.1