linux-kernel - Re: [PATCH v4 4/7] uaccess-buffer: add CONFIG_GENERIC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMn1gO4Qcp+t+UhUg4X6KCGbx7_Sm+w8t1_zugPMweuKSZrv7w@mail.gmail.com>
Date:   Wed, 15 Dec 2021 17:25:03 -0800
From:   Peter Collingbourne <pcc@...gle.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Andy Lutomirski <luto@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        YiFei Zhu <yifeifz2@...inois.edu>,
        Mark Rutland <mark.rutland@....com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Gabriel Krisman Bertazi <krisman@...labora.com>,
        Chris Hyser <chris.hyser@...cle.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        Arnd Bergmann <arnd@...db.de>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Alexey Gladkov <legion@...nel.org>,
        Ran Xiaokai <ran.xiaokai@....com.cn>,
        David Hildenbrand <david@...hat.com>,
        Xiaofeng Cao <caoxiaofeng@...ong.com>,
        Cyrill Gorcunov <gorcunov@...il.com>,
        Thomas Cedeno <thomascedeno@...gle.com>,
        Marco Elver <elver@...gle.com>,
        Alexander Potapenko <glider@...gle.com>,
        linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        Evgenii Stepanov <eugenis@...gle.com>
Subject: Re: [PATCH v4 4/7] uaccess-buffer: add CONFIG_GENERIC_ENTRY support

On Sat, Dec 11, 2021 at 3:50 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> Peter,
>
> On Thu, Dec 09 2021 at 14:15, Peter Collingbourne wrote:
> > @@ -197,14 +201,19 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> >  static void exit_to_user_mode_prepare(struct pt_regs *regs)
> >  {
> >       unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> > +     bool uaccess_buffer_pending;
> >
> >       lockdep_assert_irqs_disabled();
> >
> >       /* Flush pending rcuog wakeup before the last need_resched() check */
> >       tick_nohz_user_enter_prepare();
> >
> > -     if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> > +     if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK)) {
> > +             bool uaccess_buffer_pending = uaccess_buffer_pre_exit_loop();
> > +
> >               ti_work = exit_to_user_mode_loop(regs, ti_work);
> > +             uaccess_buffer_post_exit_loop(uaccess_buffer_pending);
>
> What? Let me look at the these two functions, which are so full of useful
> comments:
>
> > +bool __uaccess_buffer_pre_exit_loop(void)
> > +{
> > +     struct uaccess_buffer_info *buf = &current->uaccess_buffer;
> > +     struct uaccess_descriptor __user *desc_ptr;
> > +     sigset_t tmp_mask;
> > +
> > +     if (get_user(desc_ptr, buf->desc_ptr_ptr) || !desc_ptr)
> > +             return false;
> > +
> > +     current->real_blocked = current->blocked;
> > +     sigfillset(&tmp_mask);
> > +     set_current_blocked(&tmp_mask);
>
> This prevents signal delivery in exit_to_user_mode_loop(), right?

It prevents asynchronous signal delivery, same as with
sigprocmask(SIG_SETMASK, set, NULL) with a full set.

> > +     return true;
> > +}
> > +
> > +void __uaccess_buffer_post_exit_loop(void)
> > +{
> > +     unsigned long flags;
> > +
> > +     spin_lock_irqsave(&current->sighand->siglock, flags);
> > +     current->blocked = current->real_blocked;
> > +     recalc_sigpending();
>
> This restores the signal blocked mask _after_ exit_to_user_mode_loop()
> has completed, recalculates pending signals and goes out to user space
> with eventually pending signals.
>
> How is this supposed to be even remotely correct?

Please see this paragraph from the documentation:

When entering the kernel with a non-zero uaccess descriptor
address for a reason other than a syscall (for example, when
IPI'd due to an incoming asynchronous signal), any signals other
than ``SIGKILL`` and ``SIGSTOP`` are masked as if by calling
``sigprocmask(SIG_SETMASK, set, NULL)`` where ``set`` has been
initialized with ``sigfillset(set)``. This is to prevent incoming
signals from interfering with uaccess logging.

I believe that we will also go out to userspace with pending signals
when one of the signals that came in was a masked (via sigprocmask)
asynchronous signal, so this is an expected state.

> But that aside, let me look at the whole picture as I understand it from
> reverse engineering it. Yes, reverse engineering, because there are
> neither comments in the code nor any useful information in the
> changelogs of 2/7 and 4/7. Also the cover letter and the "documentation"
> are not explaining any of this and just blurb about sanitizers and how
> wonderful this all is.

The whole business with pre/post_exit_loop() is implementing the
paragraph mentioned above. I imagine that the kerneldoc comments could
be improved by referencing that paragraph.

> > @@ -70,6 +71,9 @@ static long syscall_trace_enter(struct pt_regs *regs, long syscall,
> >                       return ret;
> >       }
> >
> > +     if (work & SYSCALL_WORK_UACCESS_BUFFER_ENTRY)
> > +             uaccess_buffer_syscall_entry();
>
> This conditionally sets SYSCALL_WORK_UACCESS_BUFFER_EXIT.

Right.

> > @@ -247,6 +256,9 @@ static void syscall_exit_work(struct pt_regs *regs, unsigned long work)
> >
> >       audit_syscall_exit(regs);
> >
> > +     if (work & SYSCALL_WORK_UACCESS_BUFFER_EXIT)
> > +             uaccess_buffer_syscall_exit();
>
> When returning from the syscall and SYSCALL_WORK_UACCESS_BUFFER_EXIT is
> set, then uaccess_buffer_syscall_exit() clears
> SYSCALL_WORK_UACCESS_BUFFER_EXIT, right?

Right.

> This is called _before_ exit_to_user_mode_prepare(). So why is this
> __uaccess_buffer_pre/post_exit_loop() required at all?
>
> It's not required at all. Why?
>
> Simply because there are only two ways how exit_to_user_mode_prepare()
> can be reached:
>
>   1) When returning from a syscall
>
>   2) When returning from an interrupt which hit user mode execution
>
> #1 SYSCALL_WORK_UACCESS_BUFFER_EXIT is cleared _before_
>    exit_to_user_mode_prepare() is reached as documented above.
>
> #2 SYSCALL_WORK_UACCESS_BUFFER_EXIT cannot be set because the entry
>    to the kernel does not go through syscall_trace_enter().
>
> So what is this pre/post exit loop code about? Handle something which
> cannot happen in the first place?

The pre/post_exit_loop() functions are checking UACCESS_BUFFER_ENTRY,
which is set when prctl(PR_SET_UACCESS_DESCRIPTOR_ADDR_ADDR) has been
used to set the uaccess descriptor address address to a non-zero
value. It is a different flag from UACCESS_BUFFER_EXIT. It is
certainly possible for the ENTRY flag to be set in your 2) above,
since that flag is not normally modified while inside the kernel.

> If at all this would warrant a:
>
>         if (WARN_ON_ONCE(test_syscall_work(UACCESS_BUFFER_ENTRY)))
>                 do_something_sensible();
>
> instead of adding undocumented voodoo w/o providing any rationale. Well,
> I can see why that was not provided because there is no rationale to
> begin with.
>
> Seriously, I'm all for better instrumentation and analysis, but if the
> code provided for that is incomprehensible, uncommented and
> undocumented, then the result is worse than what we have now.

Okay, as well as improving the kerneldoc I'll add some code comments
to make it clearer what's going on.

> If you think that this qualifies as documentation:
>
> > +/*
> > + * uaccess_buffer_syscall_entry - hook to be run before syscall entry
> > + */
>
> > +/*
> > + * uaccess_buffer_syscall_exit - hook to be run after syscall exit
> > + */
>
> > +/*
> > + * uaccess_buffer_pre_exit_loop - hook to be run immediately before the
> > + * pre-kernel-exit loop that handles signals, tracing etc. Returns a bool to
> > + * be passed to uaccess_buffer_post_exit_loop.
> > + */
>
> > +/*
> > + * uaccess_buffer_post_exit_loop - hook to be run immediately after the
> > + * pre-kernel-exit loop that handles signals, tracing etc.
> > + * @pending: the bool returned from uaccess_buffer_pre_exit_loop.
> > + */
>
> then we have a very differrent understanding of what documentation
> should provide.

This was intended as interface documentation, so it doesn't go into
too many details. It could certainly be improved though by referencing
the user documentation, as I mentioned above.

Peter