[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20211123051658.3195589-6-pcc@google.com>
Date: Mon, 22 Nov 2021 21:16:58 -0800
From: Peter Collingbourne <pcc@...gle.com>
To: Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...nel.org>,
Kees Cook <keescook@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Masahiro Yamada <masahiroy@...nel.org>,
Sami Tolvanen <samitolvanen@...gle.com>,
YiFei Zhu <yifeifz2@...inois.edu>,
Colin Ian King <colin.king@...onical.com>,
Mark Rutland <mark.rutland@....com>,
Frederic Weisbecker <frederic@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Andrey Konovalov <andreyknvl@...il.com>,
Peter Collingbourne <pcc@...gle.com>,
Gabriel Krisman Bertazi <krisman@...labora.com>,
Chris Hyser <chris.hyser@...cle.com>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Chris Wilson <chris@...is-wilson.co.uk>,
Arnd Bergmann <arnd@...db.de>,
Dmitry Vyukov <dvyukov@...gle.com>,
Christian Brauner <christian.brauner@...ntu.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Alexey Gladkov <legion@...nel.org>,
Ran Xiaokai <ran.xiaokai@....com.cn>,
David Hildenbrand <david@...hat.com>,
Xiaofeng Cao <caoxiaofeng@...ong.com>,
Cyrill Gorcunov <gorcunov@...il.com>,
Thomas Cedeno <thomascedeno@...gle.com>,
Marco Elver <elver@...gle.com>,
Alexander Potapenko <glider@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
Evgenii Stepanov <eugenis@...gle.com>
Subject: [PATCH v2 5/5] Documentation: document uaccess logging
Add documentation for the uaccess logging feature.
Link: https://linux-review.googlesource.com/id/Ia626c0ca91bc0a3d8067d7f28406aa40693b65a2
Signed-off-by: Peter Collingbourne <pcc@...gle.com>
---
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/uaccess-logging.rst | 149 ++++++++++++++++++
2 files changed, 150 insertions(+)
create mode 100644 Documentation/admin-guide/uaccess-logging.rst
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 1bedab498104..4f6ee447ab2f 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -54,6 +54,7 @@ ABI will be found here.
:maxdepth: 1
sysfs-rules
+ uaccess-logging
The rest of this manual consists of various unordered guides on how to
configure specific aspects of kernel behavior to your liking.
diff --git a/Documentation/admin-guide/uaccess-logging.rst b/Documentation/admin-guide/uaccess-logging.rst
new file mode 100644
index 000000000000..4b2b297afc00
--- /dev/null
+++ b/Documentation/admin-guide/uaccess-logging.rst
@@ -0,0 +1,149 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Uaccess Logging
+===============
+
+Background
+----------
+
+Userspace tools such as sanitizers (ASan, MSan, HWASan) and tools
+making use of the ARM Memory Tagging Extension (MTE) need to
+monitor all memory accesses in a program so that they can detect
+memory errors. Furthermore, fuzzing tools such as syzkaller need to
+monitor all memory accesses so that they know which parts of memory
+to fuzz. For accesses made purely in userspace, this is achieved
+via compiler instrumentation, or for MTE, via direct hardware
+support. However, accesses made by the kernel on behalf of the user
+program via syscalls (i.e. uaccesses) are normally invisible to
+these tools.
+
+Traditionally, the sanitizers have handled this by interposing the libc
+syscall stubs with a wrapper that checks the memory based on what we
+believe the uaccesses will be. However, this creates a maintenance
+burden: each syscall must be annotated with its uaccesses in order
+to be recognized by the sanitizer, and these annotations must be
+continuously updated as the kernel changes.
+
+The kernel's uaccess logging feature provides userspace tools with
+the address and size of each userspace access, thereby allowing these
+tools to report memory errors involving these accesses without needing
+annotations for every syscall.
+
+By relying on the kernel's actual uaccesses, rather than a
+reimplementation of them, the userspace memory safety tools may
+play a dual role of verifying the validity of kernel accesses. Even
+a sanitizer whose syscall wrappers have complete knowledge of the
+kernel's intended API may vary from the kernel's actual uaccesses due
+to kernel bugs. A sanitizer with knowledge of the kernel's actual
+uaccesses may produce more accurate error reports that reveal such
+bugs. For example, a kernel that accesses more memory than expected
+by the userspace program could indicate that either userspace or the
+kernel has the wrong idea about which kernel functionality is being
+requested -- either way, there is a bug.
+
+Interface
+---------
+
+The feature may be used via the following prctl:
+
+.. code-block:: c
+
+ uint64_t addr = 0; /* Generally will be a TLS slot or equivalent */
+ prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &addr, 0, 0, 0);
+
+Supplying a non-zero address as the second argument to ``prctl``
+will cause the kernel to read an address (referred to as the *uaccess
+descriptor address*) from that address on each kernel entry.
+
+When entering the kernel with a non-zero uaccess descriptor address
+to handle a syscall, the kernel will read a data structure of type
+``struct uaccess_descriptor`` from the uaccess descriptor address,
+which is defined as follows:
+
+.. code-block:: c
+
+ struct uaccess_descriptor {
+ uint64_t addr, size;
+ };
+
+This data structure contains the address and size (in array elements)
+of a *uaccess buffer*, which is an array of data structures of type
+``struct uaccess_buffer_entry``. Before returning to userspace, the
+kernel will log information about uaccesses to sequential entries
+in the uaccess buffer. It will also store ``NULL`` to the uaccess
+descriptor address, and store the address and size of the unused
+portion of the uaccess buffer to the uaccess descriptor.
+
+The format of a uaccess buffer entry is defined as follows:
+
+.. code-block:: c
+
+ struct uaccess_buffer_entry {
+ uint64_t addr, size, flags;
+ };
+
+The meaning of ``addr`` and ``size`` should be obvious. On arm64,
+tag bits are preserved in the ``addr`` field. There is currently
+one flag bit assignment for the ``flags`` field:
+
+.. code-block:: c
+
+ #define UACCESS_BUFFER_FLAG_WRITE 1
+
+This flag is set if the access was a write, or clear if it was a
+read. The meaning of all other flag bits is reserved.
+
+When entering the kernel with a non-zero uaccess descriptor
+address for a reason other than a syscall (for example, when
+IPI'd due to an incoming asynchronous signal), any signals other
+than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling
+``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been
+initialized with ``sigfillset(set)``. This is to prevent incoming
+signals from interfering with uaccess logging.
+
+Example
+-------
+
+Here is an example of a code snippet that will enumerate the accesses
+performed by a ``uname(2)`` syscall:
+
+.. code-block:: c
+
+ struct uaccess_buffer_entry entries[64];
+ struct uaccess_descriptor desc;
+ uint64_t desc_addr = 0;
+ prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR, &desc_addr, 0, 0, 0);
+
+ desc.addr = (uint64_t)&entries;
+ desc.size = 64;
+ desc_addr = (uint64_t)&desc;
+
+ struct utsname un;
+ uname(&un);
+
+ struct uaccess_buffer_entry* entries_end = (struct uaccess_buffer_entry*)desc.addr;
+ for (struct uaccess_buffer_entry* entry = entries; entry != entries_end; ++entry) {
+ printf("%s at 0x%lx size 0x%lx\n", entry->flags & UACCESS_BUFFER_FLAG_WRITE ? "WRITE" : "READ",
+ (unsigned long)entry->addr, (unsigned long)entry->size);
+ }
+
+Limitations
+-----------
+
+This feature is currently only supported on the arm64, s390 and x86
+architectures.
+
+Uaccess buffers are a "best-effort" mechanism for logging uaccesses. Of
+course, not all of the accesses may fit in the buffer, but aside from
+that, not all internal kernel APIs that access userspace memory are
+covered. Therefore, userspace programs should tolerate unreported
+accesses.
+
+On the other hand, the kernel guarantees that it will not
+(intentionally) report accessing more data than it is specified
+to read. For example, if the kernel implements a syscall that is
+specified to read a data structure of size ``N`` bytes by first
+reading a page's worth of data and then only using the first ``N``
+bytes from it, the kernel will either report reading ``N`` bytes or
+not report the access at all.
--
2.34.0.rc2.393.gf8c9666880-goog
Powered by blists - more mailing lists