Date:   Fri,  8 Jan 2021 14:22:49 -0800
From:   Tony Luck <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     Tony Luck <tony.luck@...el.com>, x86@...nel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Darren Hart <dvhart@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
        linux-mm@...ck.org
Subject: [PATCH 0/2] Fix infinite machine check loop in futex_wait_setup()

Linux can now recover from machine checks where kernel code is
doing get_user() to access application memory. But there isn't
a way to distinguish whether get_user() failed because of a page
fault or a machine check.

Thus there is a problem if any kernel code thinks it can retry
an access after doing something that would fix the page fault.

One such example (I'm sure there are more) is in futex_wait_setup(),
which first attempts to read the futex with page faults disabled and
then retries (after dropping a lock so page faults are safe):


        ret = get_futex_value_locked(&uval, uaddr);

        if (ret) {
                queue_unlock(*hb);

                ret = get_user(uval, uaddr);

It would be good to avoid deliberately taking a second machine
check (especially as the recovery code does really bad things
and ends up in an infinite loop!).

My proposal is to add a new function arch_memory_failure()
that can be called after get_user() returns -EFAULT to allow
graceful recovery.
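
To make the intended control flow concrete, here is a minimal user-space
model of the retry path with the extra check added. It is only a sketch:
mock_get_user(), mock_arch_memory_failure(), the fault_kind enum and the
bool return type are hypothetical stand-ins, not the kernel functions or
the signature used in the patches.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these are the kernel's functions. */
enum fault_kind { FAULT_NONE, FAULT_PAGE, FAULT_MCE };

static enum fault_kind next_fault = FAULT_NONE;	/* what the next access hits */
static bool mce_pending;			/* an access consumed poison */
static int mce_taken;				/* machine checks taken so far */

/* Stand-in for get_user(): both failure kinds look identical to the caller. */
static int mock_get_user(unsigned int *val, const unsigned int *uaddr)
{
	if (next_fault == FAULT_MCE) {
		mce_pending = true;
		mce_taken++;
		return -EFAULT;
	}
	if (next_fault == FAULT_PAGE) {
		next_fault = FAULT_NONE;	/* model: a retry can fix this */
		return -EFAULT;
	}
	*val = *uaddr;
	return 0;
}

/* Stand-in for the proposed arch_memory_failure(); signature is assumed. */
static bool mock_arch_memory_failure(void)
{
	return mce_pending;
}

/* Shape of the retry in futex_wait_setup(), with the extra check added. */
static int read_futex_value(unsigned int *uval, const unsigned int *uaddr)
{
	int ret = mock_get_user(uval, uaddr);	/* first try */

	if (ret) {
		if (mock_arch_memory_failure())
			return ret;		/* poison: do not touch it again */
		ret = mock_get_user(uval, uaddr); /* page fault: retry */
	}
	return ret;
}
```

In this model a page fault is retried and succeeds, while a machine
check returns -EFAULT after exactly one access to the poisoned location.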

Futex reviewers: I just have one new call (that fixes my test
case). If you could point out other places this is needed,
that would be most helpful.

Patch roadmap:

Part 1: Add code to avoid the infinite loop in the machine check
code. Just panic if code runs into the same machine check a second
time. This should make it much easier to debug other places where
this happens.
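
The idea behind Part 1 can be modelled roughly as follows. This is a
sketch under assumptions, not the code in the patch: struct task_model,
handle_mce() and recovery_done() are hypothetical names, and the model
treats any second machine check that arrives before the queued recovery
has run as a repeat of the first one.

```c
#include <stdbool.h>

/* Hypothetical per-task recovery state; not the kernel's task_struct. */
struct task_model {
	bool mce_pending;	/* recovery is queued but has not run yet */
};

enum mce_action { MCE_QUEUE_RECOVERY, MCE_PANIC };

/* Decide what to do when a machine check arrives in a copy-from-user path. */
static enum mce_action handle_mce(struct task_model *t)
{
	if (t->mce_pending)
		return MCE_PANIC;	/* second hit before recovery ran */
	t->mce_pending = true;
	return MCE_QUEUE_RECOVERY;
}

/* Called once the queued recovery work has completed. */
static void recovery_done(struct task_model *t)
{
	t->mce_pending = false;
}
```

The point of failing loudly on the second hit is that the offending
caller shows up at the panic instead of the machine spinning in the
recovery path.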

Part 2: Add arch_memory_failure() and use it in futex_wait_setup().
[Suggestions gladly accepted for the current best way to handle the
#defines etc. to define an arch specific function to be used in
generic code]
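
For reference, one common shape for this kind of hook is a generic
fallback guarded by #ifndef, which an architecture overrides by defining
a macro with the same name as its function. The sketch below only
illustrates that idiom and is not necessarily what the patch does: the
MODEL_ARCH_HAS_MEMORY_FAILURE switch, the model_mce_pending flag and the
bool return type are assumptions made for the example.

```c
#include <stdbool.h>

/* --- would live in an arch header (hypothetical placement) --- */
#ifdef MODEL_ARCH_HAS_MEMORY_FAILURE
static bool model_mce_pending;	/* hypothetical per-task state */

static inline bool arch_memory_failure(void)
{
	return model_mce_pending;
}
/* Tell generic code that an arch version exists. */
#define arch_memory_failure arch_memory_failure
#endif

/* --- would live in a generic header --- */
#ifndef arch_memory_failure
/* Fallback for architectures without machine check recovery. */
static inline bool arch_memory_failure(void)
{
	return false;
}
#endif
```

Built without MODEL_ARCH_HAS_MEMORY_FAILURE, callers get the fallback
that always returns false, so generic code can call the hook
unconditionally.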

Tony Luck (2):
  x86/mce: Avoid infinite loop for copy from user recovery
  futex, x86/mce: Avoid double machine checks

 arch/x86/include/asm/mmu.h     |  7 +++++++
 arch/x86/kernel/cpu/mce/core.c | 17 ++++++++++++++++-
 include/linux/mm.h             |  4 ++++
 include/linux/sched.h          |  3 ++-
 kernel/futex.c                 |  3 +++
 5 files changed, 32 insertions(+), 2 deletions(-)

-- 
2.21.1
