linux-kernel - Re: [PATCH] kernel: introduce prctl(PR_LOG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <29f1822d-e4cd-eedb-bea8-619db1d56335@redhat.com>
Date:   Wed, 22 Sep 2021 19:46:47 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Peter Collingbourne <pcc@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        YiFei Zhu <yifeifz2@...inois.edu>,
        Colin Ian King <colin.king@...onical.com>,
        Mark Rutland <mark.rutland@....com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Gabriel Krisman Bertazi <krisman@...labora.com>,
        Balbir Singh <sblbir@...zon.com>,
        Chris Hyser <chris.hyser@...cle.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        Arnd Bergmann <arnd@...db.de>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Alexey Gladkov <legion@...nel.org>,
        Ran Xiaokai <ran.xiaokai@....com.cn>,
        Xiaofeng Cao <caoxiaofeng@...ong.com>,
        Cyrill Gorcunov <gorcunov@...il.com>,
        Thomas Cedeno <thomascedeno@...gle.com>,
        Marco Elver <elver@...gle.com>,
        Alexander Potapenko <glider@...gle.com>
Cc:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        Evgenii Stepanov <eugenis@...gle.com>
Subject: Re: [PATCH] kernel: introduce prctl(PR_LOG_UACCESS)

On 22.09.21 08:18, Peter Collingbourne wrote:
> This patch introduces a kernel feature known as uaccess logging.
> With uaccess logging, the userspace program passes the address and size
> of a so-called uaccess buffer to the kernel via a prctl(). The prctl()
> is a request for the kernel to log any uaccesses made during the next
> syscall to the uaccess buffer. When the next syscall returns, the address
> one past the end of the logged uaccess buffer entries is written to the
> location specified by the third argument to the prctl(). In this way,
> the userspace program may enumerate the uaccesses logged to the access
> buffer to determine which accesses occurred.
> 
> Uaccess logging has several use cases focused around bug detection
> tools:
> 
> 1) Userspace memory safety tools such as ASan, MSan, HWASan and tools
>     making use of the ARM Memory Tagging Extension (MTE) need to monitor
>     all memory accesses in a program so that they can detect memory
>     errors. For accesses made purely in userspace, this is achieved
>     via compiler instrumentation, or for MTE, via direct hardware
>     support. However, accesses made by the kernel on behalf of the
>     user program via syscalls (i.e. uaccesses) are invisible to these
>     tools. With MTE there is some level of error detection possible in
>     the kernel (in synchronous mode, bad accesses generally result in
>     returning -EFAULT from the syscall), but by the time we get back to
>     userspace we've lost the information about the address and size of the
>     failed access, which makes it harder to produce a useful error report.
> 
>     With the current versions of the sanitizers, we address this by
>     interposing the libc syscall stubs with a wrapper that checks the
>     memory based on what we believe the uaccesses will be. However, this
>     creates a maintenance burden: each syscall must be annotated with
>     its uaccesses in order to be recognized by the sanitizer, and these
>     annotations must be continuously updated as the kernel changes. This
>     is especially burdensome for syscalls such as ioctl(2) which have a
>     large surface area of possible uaccesses.
> 
> 2) Verifying the validity of kernel accesses. This can be achieved in
>     conjunction with the userspace memory safety tools mentioned in (1).
>     Even a sanitizer whose syscall wrappers have complete knowledge of
>     the kernel's intended API may vary from the kernel's actual uaccesses
>     due to kernel bugs. A sanitizer with knowledge of the kernel's actual
>     uaccesses may produce more accurate error reports that reveal such
>     bugs.
> 
>     An example of such a bug, which was found by an earlier version of this
>     patch together with a prototype client of the API in HWASan, was fixed
>     by commit d0efb16294d1 ("net: don't unconditionally copy_from_user
>     a struct ifreq for socket ioctls"). Although this bug turned out to
>     relatively harmless, it was a bug nonetheless and it's always possible
>     that more serious bugs of this sort may be introduced in the future.
> 
> 3) Kernel fuzzing. We may use the list of reported kernel accesses to
>     guide a kernel fuzzing tool such as syzkaller (so that it knows which
>     parts of user memory to fuzz), as an alternative to providing the tool
>     with a list of syscalls and their uaccesses (which again thanks to
>     (2) may not be accurate).
> 
> All signals except SIGKILL and SIGSTOP are masked for the interval
> between the prctl() and the next syscall in order to prevent handlers
> for intervening asynchronous signals from issuing syscalls that may
> cause uaccesses from the wrong syscall to be logged.

Stupid question: can this be exploited from user space to effectively 
disable SIGKILL for a long time ... and do we care?

Like, the application allocates a bunch of memory, issues the prctl() 
and spins in user space. What would happen if the OOM killer selects 
this task as a target and does a do_send_sig_info(SIGKILL, 
SEND_SIG_PRIV, ...) ?

-- 
Thanks,

David / dhildenb