[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190226213850.GA219025@google.com>
Date: Tue, 26 Feb 2019 16:38:50 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: Masami Hiramatsu <mhiramat@...nel.org>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Andy Lutomirski <luto@...capital.net>,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Changbin Du <changbin.du@...il.com>,
Jann Horn <jannh@...gle.com>,
Kees Cook <keescook@...omium.org>,
Andy Lutomirski <luto@...nel.org>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Nadav Amit <namit@...are.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH 0/4] tracing/probes: uaccess: Add support user-space
access
On Mon, Feb 25, 2019 at 11:04:42PM +0900, Masami Hiramatsu wrote:
> Hi,
>
> Here is an RFC series of probe-event to support user-space access
> methods, which we discussed in previous thread.
>
> https://lkml.kernel.org/r/20190220171019.5e81a4946b56982f324f7c45@kernel.org
>
> So in this thread, it is clear that current probe_kernel_read()
> and strncpy_from_unsafe() are not enough to access user-space
> variables from kprobe events on some arch. On such arch, user
> address space and kernel address space can overlap so we have
> to change the memory segment to user-mode before copying.
> But probe_kernel_read() is designed to access primarily kernel
> memory, it may fail to get, or get unexpected value on such
> arch. So we need to expand kprobe fetcharg to support new options
> for such case.
>
> For user-space access extension, this series adds 2 features,
> "ustring" type and user-space dereference syntax. "ustring" is
> used for recording a null-terminated string in user-space from
> kprobe events.
>
> "ustring" type is easy, it is able to use instead of "string"
> type, so if you want to record a user-space string via
> "__user char *", you can use ustring type instead of string.
> For example,
>
> echo 'p do_sys_open path=+0($arg2):ustring' >> kprobe_events
>
> will record the path string from user-space.
>
> The user-space dereference syntax is also simple. Thi just
> adds 'u' prefix before an offset value.
>
> +|-u<OFFSET>(<FETCHARG>)
>
> e.g. +u8(%ax), +u0(+0(%si))
>
> This is more generic. If you want to refer the variable in user-
> space from its address or access a field in data structure in
> user-space, you need to use this.
>
> For example, if you probe do_sched_setscheduler(pid, policy,
> param) and record param->sched_priority, you can add new
> probe as below;
>
> p do_sched_setscheduler priority=+u0($arg3)
>
> Actually, with this feature, "ustring" type is not absolutely
> necessary, because these are same meanings.
>
> +0($arg2):ustring == +u0($arg2):string
>
> Perhups, we may be better removing "ustring" and just introducing
> this user-space dereference syntax...
>
> Note that kprobe event provides these methods, but it doesn't
> change it from kernel to user automatically because we do not
> know whether the given address is in userspace or kernel on
> some arch.
> Moreover, from perf-probe, at this moment it is not able to
> switch. Since __user is not for compiler but checker, we have
> no clue which data structure is in user-space, in debuginfo.
>
> BTW, according to Linus's comment, I implemented probe_user_read()
> and strncpy_from_unsafe_user() APIs. And since those use
> "access_ok()" inside it, if CONFIG_DEBUG_ATOMIC_SLEEP=y on x86,
> it will get a warn message at once. It should be solved before
> merging this series.
I was wondering why access_ok() can sleep. In the arm64 and x86
implementation, I don't see access_ok() itself causing a user pointer
dereference access that can cause a page fault. It seems to just be checking
the validity of the ranges.
Any idea why the access_ok() code has these comments?
"Context: User context only. This function may sleep if pagefaults are
enabled."
My _guess_ is this is because whatever calls access_ok() may also call
something else that *does* fault next, if that's the case then that
WARN_ON_IN_IRQ() in access_ok() is fine, but at least I guess the comments
should be more clear that it is not access_ok() itself that sleeps.
thanks for any help on understanding this,
- Joel
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (4):
> uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault
> uaccess: Add non-pagefault user-space read functions
> tracing/probe: Add ustring type for user-space string
> tracing/probe: Support user-space dereference
>
>
> Documentation/trace/kprobetrace.rst | 13 ++-
> Documentation/trace/uprobetracer.rst | 9 +-
> fs/namespace.c | 2
> include/linux/uaccess.h | 13 +++
> kernel/trace/trace.c | 7 +-
> kernel/trace/trace_kprobe.c | 65 ++++++++++++++++
> kernel/trace/trace_probe.c | 39 ++++++++--
> kernel/trace/trace_probe.h | 3 +
> kernel/trace/trace_probe_tmpl.h | 36 +++++++--
> kernel/trace/trace_uprobe.c | 19 +++++
> mm/maccess.c | 138 ++++++++++++++++++++++++++++++----
> 11 files changed, 302 insertions(+), 42 deletions(-)
>
> --
> Masami Hiramatsu (Linaro) <mhiramat@...nel.org>
Powered by blists - more mailing lists