lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20141204011316.GH31369@lerouge>
Date:	Thu, 4 Dec 2014 02:13:18 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Richard Guy Briggs <rgb@...hat.com>,
	Eric Paris <eparis@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: [PATCH] context_tracking: Restore previous state in schedule_user

On Wed, Dec 03, 2014 at 04:38:46PM -0800, Andy Lutomirski wrote:
> On Wed, Dec 3, 2014 at 4:30 PM, Dave Jones <davej@...hat.com> wrote:
> > On Wed, Dec 03, 2014 at 04:04:31PM -0800, Andy Lutomirski wrote:
> >  > On Wed, Dec 3, 2014 at 3:58 PM, Frederic Weisbecker <fweisbec@...il.com> wrote:
> >  > > On Wed, Dec 03, 2014 at 03:18:41PM -0800, Andy Lutomirski wrote:
> >  > >> It appears that some SCHEDULE_USER (asm for schedule_user) callers
> >  > >> in arch/x86/kernel/entry_64.S are called from RCU kernel context,
> >  > >> and schedule_user will return in RCU user context.  This causes RCU
> >  > >> warnings and possible failures.
> >  > >>
> >  > >> This is intended to be a minimal fix suitable for 3.18.
> >  > >>
> >  > >> Reported-by: Dave Jones <davej@...hat.com>
> >  > >> Cc: Oleg Nesterov <oleg@...hat.com>
> >  > >> Cc: Frédéric Weisbecker <fweisbec@...il.com>
> >  > >> Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>
> >  > >> Signed-off-by: Andy Lutomirski <luto@...capital.net>
> >  > >
> >  > > Ah, we sent it about at the same time :-)
> >  > >
> >  > > Might be too late for 3.18 though because it's not a regression.
> >
> > Wait, so how come that trace didn't start showing up until recently ?
> 
> Looking at the code, it's because int_careful has the same bug, but
> syscall_trace_leave does:
> 
>     /*
>      * We may come here right after calling schedule_user()
>      * or do_notify_resume(), in which case we can be in RCU
>      * user mode.
>      */
>     user_exit();
> 
> which means that this issue was anticipated when that comment was written.

Indeed, in fact it was expected to work as long as the code that follows the
syscall is limited to schedule_user(), syscall_trace_leave() and do_notify_resume().
But if anything else is called and uses RCU, this doesn't work anymore.

So user_enter() and user_exit() have been designed to be re-entrant on purpose.

> 
> Prior to the 3.18 seccomp changes and the _TIF_WORK typo fix, it would
> have been difficult to hit sysret_audit when context tracking was on
> (you could do it once on the way out from a syscall that enabled
> context tracking).  So this is 3.18 regression.

I see now.

So the real problem is not on schedule_user(). It's rather that __audit_syscall_exit()
should we wrapped inside user_exit()/user_enter() or exception_foo(). The latter
is safer in a sensitive patch. That would be the real and simple regression fix.
Tweaking schedule_user() is more risky.

Then, if you like, we can rethink the whole later, define syscall_trace_leave()
as the only place that calls user_enter() and all the other syscall exit functions
(schedule_user(), do_notify_resume(), __audit_syscall_exit()) can just call
exception_enter() - exception_exit() if they can be called after syscall_trace_leave().
Then finally we can make user_enter and user_exit non-reentrant after careful audit
of how other archs use it (sounds scary though).

Or better yet: if you rework the syscall exit slow path, lets call user_enter() at the
very end of the syscall.

> 
> The sysret_audit code is still totally screwed up AFAICT.  At the very
> least, the whole mess rather strongly suggests that, if both context
> tracking and audit are on, then __audit_syscall_exit is called *twice*
> on each syscall.  __audit_syscall_exit seems to be idempotent, so
> maybe no one has noticed that little glitch.
> 
> I'll ask the x86 people to include my sysret_audit removal for 3.19,
> since I think that this schedule_user change is a better last-minute
> fix than removing a whole chunk of asm.
> 
> --Andy
> 
> >
> >         Dave
> >
> 
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ