[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMn1gO4Qcp+t+UhUg4X6KCGbx7_Sm+w8t1_zugPMweuKSZrv7w@mail.gmail.com>
Date: Wed, 15 Dec 2021 17:25:03 -0800
From: Peter Collingbourne <pcc@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Andy Lutomirski <luto@...nel.org>,
Kees Cook <keescook@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Masahiro Yamada <masahiroy@...nel.org>,
Sami Tolvanen <samitolvanen@...gle.com>,
YiFei Zhu <yifeifz2@...inois.edu>,
Mark Rutland <mark.rutland@....com>,
Frederic Weisbecker <frederic@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Andrey Konovalov <andreyknvl@...il.com>,
Gabriel Krisman Bertazi <krisman@...labora.com>,
Chris Hyser <chris.hyser@...cle.com>,
Daniel Vetter <daniel.vetter@...ll.ch>,
Chris Wilson <chris@...is-wilson.co.uk>,
Arnd Bergmann <arnd@...db.de>,
Dmitry Vyukov <dvyukov@...gle.com>,
Christian Brauner <christian.brauner@...ntu.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Alexey Gladkov <legion@...nel.org>,
Ran Xiaokai <ran.xiaokai@....com.cn>,
David Hildenbrand <david@...hat.com>,
Xiaofeng Cao <caoxiaofeng@...ong.com>,
Cyrill Gorcunov <gorcunov@...il.com>,
Thomas Cedeno <thomascedeno@...gle.com>,
Marco Elver <elver@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
Evgenii Stepanov <eugenis@...gle.com>
Subject: Re: [PATCH v4 4/7] uaccess-buffer: add CONFIG_GENERIC_ENTRY support
On Sat, Dec 11, 2021 at 3:50 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> Peter,
>
> On Thu, Dec 09 2021 at 14:15, Peter Collingbourne wrote:
> > @@ -197,14 +201,19 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> > static void exit_to_user_mode_prepare(struct pt_regs *regs)
> > {
> > unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> > + bool uaccess_buffer_pending;
> >
> > lockdep_assert_irqs_disabled();
> >
> > /* Flush pending rcuog wakeup before the last need_resched() check */
> > tick_nohz_user_enter_prepare();
> >
> > - if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> > + if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK)) {
> > + bool uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
> > +
> > ti_work = exit_to_user_mode_loop(regs, ti_work);
> > + uaccess_buffer_post_exit_loop(uaccess_buffer_pending);
>
> What? Let me look at the these two functions, which are so full of useful
> comments:
>
> > +bool __uaccess_buffer_pre_exit_loop(void)
> > +{
> > + struct uaccess_buffer_info *buf = ¤t->uaccess_buffer;
> > + struct uaccess_descriptor __user *desc_ptr;
> > + sigset_t tmp_mask;
> > +
> > + if (get_user(desc_ptr, buf->desc_ptr_ptr) || !desc_ptr)
> > + return false;
> > +
> > + current->real_blocked = current->blocked;
> > + sigfillset(&tmp_mask);
> > + set_current_blocked(&tmp_mask);
>
> This prevents signal delivery in exit_to_user_mode_loop(), right?
It prevents asynchronous signal delivery, same as with
sigprocmask(SIG_SETMASK, set, NULL) with a full set.
> > + return true;
> > +}
> > +
> > +void __uaccess_buffer_post_exit_loop(void)
> > +{
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(¤t->sighand->siglock, flags);
> > + current->blocked = current->real_blocked;
> > + recalc_sigpending();
>
> This restores the signal blocked mask _after_ exit_to_user_mode_loop()
> has completed, recalculates pending signals and goes out to user space
> with eventually pending signals.
>
> How is this supposed to be even remotely correct?
Please see this paragraph from the documentation:
When entering the kernel with a non-zero uaccess descriptor
address for a reason other than a syscall (for example, when
IPI'd due to an incoming asynchronous signal), any signals other
than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling
``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been
initialized with ``sigfillset(set)``. This is to prevent incoming
signals from interfering with uaccess logging.
I believe that we will also go out to userspace with pending signals
when one of the signals that came in was a masked (via sigprocmask)
asynchronous signal, so this is an expected state.
> But that aside, let me look at the whole picture as I understand it from
> reverse engineering it. Yes, reverse engineering, because there are
> neither comments in the code nor any useful information in the
> changelogs of 2/7 and 4/7. Also the cover letter and the "documentation"
> are not explaining any of this and just blurb about sanitizers and how
> wonderful this all is.
The whole business with pre/post_exit_loop() is implementing the
paragraph mentioned above. I imagine that the kerneldoc comments could
be improved by referencing that paragraph.
> > @@ -70,6 +71,9 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
> > return ret;
> > }
> >
> > + if (work & SYSCALL_WORK_UACCESS_BUFFER_ENTRY)
> > + uaccess_buffer_syscall_entry();
>
> This conditionally sets SYSCALL_WORK_UACCESS_BUFFER_EXIT.
Right.
> > @@ -247,6 +256,9 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
> >
> > audit_syscall_exit(regs);
> >
> > + if (work & SYSCALL_WORK_UACCESS_BUFFER_EXIT)
> > + uaccess_buffer_syscall_exit();
>
> When returning from the syscall and SYSCALL_WORK_UACCESS_BUFFER_EXIT is
> set, then uaccess_buffer_syscall_exit() clears
> SYSCALL_WORK_UACCESS_BUFFER_EXIT, right?
Right.
> This is called _before_ exit_to_user_mode_prepare(). So why is this
> __uaccess_buffer_pre/post_exit_loop() required at all?
>
> It's not required at all. Why?
>
> Simply because there are only two ways how exit_to_user_mode_prepare()
> can be reached:
>
> 1) When returning from a syscall
>
> 2) When returning from an interrupt which hit user mode execution
>
> #1 SYSCALL_WORK_UACCESS_BUFFER_EXIT is cleared _before_
> exit_to_user_mode_prepare() is reached as documented above.
>
> #2 SYSCALL_WORK_UACCESS_BUFFER_EXIT cannot be set because the entry
> to the kernel does not go through syscall_trace_enter().
>
> So what is this pre/post exit loop code about? Handle something which
> cannot happen in the first place?
The pre/post_exit_loop() functions are checking UACCESS_BUFFER_ENTRY,
which is set when prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR) has been
used to set the uaccess descriptor address address to a non-zero
value. It is a different flag from UACCESS_BUFFER_EXIT. It is
certainly possible for the ENTRY flag to be set in your 2) above,
since that flag is not normally modified while inside the kernel.
> If at all this would warrant a:
>
> if (WARN_ON_ONCE(test_syscall_work(UACCESS_BUFFER_ENTRY)))
> do_something_sensible();
>
> instead of adding undocumented voodoo w/o providing any rationale. Well,
> I can see why that was not provided because there is no rationale to
> begin with.
>
> Seriously, I'm all for better instrumentation and analysis, but if the
> code provided for that is incomprehensible, uncommented and
> undocumented, then the result is worse than what we have now.
Okay, as well as improving the kerneldoc I'll add some code comments
to make it clearer what's going on.
> If you think that this qualifies as documentation:
>
> > +/*
> > + * uaccess_buffer_syscall_entry - hook to be run before syscall entry
> > + */
>
> > +/*
> > + * uaccess_buffer_syscall_exit - hook to be run after syscall exit
> > + */
>
> > +/*
> > + * uaccess_buffer_pre_exit_loop - hook to be run immediately before the
> > + * pre-kernel-exit loop that handles signals, tracing etc. Returns a bool to
> > + * be passed to uaccess_buffer_post_exit_loop.
> > + */
>
> > +/*
> > + * uaccess_buffer_post_exit_loop - hook to be run immediately after the
> > + * pre-kernel-exit loop that handles signals, tracing etc.
> > + * @pending: the bool returned from uaccess_buffer_pre_exit_loop.
> > + */
>
> then we have a very differrent understanding of what documentation
> should provide.
This was intended as interface documentation, so it doesn't go into
too many details. It could certainly be improved though by referencing
the user documentation, as I mentioned above.
Peter
Powered by blists - more mailing lists