linux-kernel - Re: [PATCH] perf: Fix oops when kthread execs user process


Message-ID: <20190529164407.GA2623@hirez.programming.kicks-ass.net>
Date:   Wed, 29 May 2019 18:44:07 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Will Deacon <will.deacon@....com>
Cc:     Young Xiao <92siuyang@...il.com>, linux@...linux.org.uk,
        mark.rutland@....com, mingo@...hat.com, bp@...en8.de,
        hpa@...or.com, x86@...nel.org, kan.liang@...ux.intel.com,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        ravi.bangoria@...ux.vnet.ibm.com, mpe@...erman.id.au,
        acme@...hat.com, eranian@...gle.com, fweisbec@...il.com,
        jolsa@...hat.com
Subject: Re: [PATCH] perf: Fix oops when kthread execs user process

On Wed, May 29, 2019 at 05:25:28PM +0100, Will Deacon wrote:

> > > > On Wed, May 29, 2019 at 02:05:21PM +0100, Will Deacon wrote:
> > > > > On Wed, May 29, 2019 at 02:55:57PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > >  	if (user_mode(regs)) {
> > > > > 
> > > > > Hmm, so it just occurred to me that Mark's observation is that the regs
> > > > > can be junk in some cases. In which case, should we be checking for
> > > > > kthreads first?

> Sorry, I'm not trying to catch you out! Just trying to understand what the
> semantics are supposed to be.
> 
> I do find the concept of user_mode(regs) bizarre for the idle task. By the
> above, we definitely have a bug on arm64 (user_mode(regs) tends to be
> true for the idle task), and I couldn't figure out how you avoided it on
> x86. I guess it happens to work because the stack is zero-initialised or
> something?

So let's take the whole thing:

static void perf_sample_regs_user(struct perf_regs *regs_user,
				  struct pt_regs *regs,
				  struct pt_regs *regs_user_copy)
{
	if (user_mode(regs)) {
		regs_user->abi = perf_reg_abi(current);
		regs_user->regs = regs;
	} else if (!(current->flags & PF_KTHREAD)) {
		perf_get_regs_user(regs_user, regs, regs_user_copy);
	} else {
		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
		regs_user->regs = NULL;
	}
}

This is called from the perf-generate-a-sample path, which is typically
an exception (IRQ/NMI/whatever) or a software/tracepoint thing.

In the exception case, the @regs argument is the exception register set,
as provided by your entry.S to your exception handlers. In the
software/tracepoint case, it is the result of
perf_arch_fetch_caller_regs().

So @regs is always 'sane' and user_mode(regs) tells us if the exception
came from userspace (and software/tracepoints always fail this, they
'obviously' don't come from userspace). If we're idle, we're not from
userspace, so this branch doesn't matter.
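
To illustrate why the "zero-initialised" guess quoted above can hold on one
architecture and not another, here is a simplified userspace model of the
two user_mode() checks. It is a sketch, not the kernel's code: the struct
and macro names are hypothetical, and the bit layouts are assumptions
(low two bits of CS as the privilege level on x86-64, low four PSTATE bits
as the mode on arm64, with EL0t encoded as zero).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for pt_regs; only the field each check reads. */
struct x86_regs_model   { uint64_t cs; };
struct arm64_regs_model { uint64_t pstate; };

/* Assumed encodings, named MODEL_* to mark them as illustrative. */
#define MODEL_X86_CS_RPL_MASK  0x3u  /* privilege level bits of CS */
#define MODEL_ARM64_MODE_MASK  0xfu  /* mode field of PSTATE */
#define MODEL_ARM64_MODE_EL0t  0x0u  /* user mode is all-zero */

/* x86-64 model: user mode means a non-zero privilege level in CS. */
static bool x86_user_mode_model(const struct x86_regs_model *r)
{
	return (r->cs & MODEL_X86_CS_RPL_MASK) != 0;
}

/* arm64 model: user mode means the mode field equals EL0t (zero). */
static bool arm64_user_mode_model(const struct arm64_regs_model *r)
{
	return (r->pstate & MODEL_ARM64_MODE_MASK) == MODEL_ARM64_MODE_EL0t;
}
```

Under these assumptions an all-zero register frame reads as kernel mode in
the x86-64 model and as user mode in the arm64 model. None of that matters
for the @regs passed in here, which is never such a junk frame.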

Next, we test if there is a userspace part _at_all_; this is the newly
minted '!(current->flags & PF_KTHREAD)'. If that passes, we use
architecture magic -- task_pt_regs() -- to get the user-regs. This can
be crap. But since the idle task will always fail our test (as it would
the old one, since idle->mm is always NULL), we'll never get here for idle.

Then, failing the above two, as we must for idle, we'll default to
ABI_NONE/NULL.
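
The three cases above can be exercised outside the kernel with a small
model. This is only a sketch: every model_* / MODEL_* name is hypothetical,
the from_user flag stands in for user_mode(regs), and saved_user_regs
stands in for whatever task_pt_regs() would return.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stands in for PF_KTHREAD; the value here is arbitrary. */
#define MODEL_PF_KTHREAD 0x1u

enum model_abi { MODEL_ABI_NONE = 0, MODEL_ABI_USER = 1 };

/* from_user models the result of user_mode(regs). */
struct model_regs { bool from_user; };

struct model_task {
	unsigned int flags;
	struct model_regs saved_user_regs; /* models task_pt_regs(current) */
};

struct model_sample {
	enum model_abi abi;
	const struct model_regs *regs;
};

/* Mirrors the three-way decision in perf_sample_regs_user(). */
static void model_sample_regs_user(struct model_sample *out,
				   const struct model_task *cur,
				   const struct model_regs *regs)
{
	if (regs->from_user) {
		/* Exception came from userspace: @regs are the user regs. */
		out->abi = MODEL_ABI_USER;
		out->regs = regs;
	} else if (!(cur->flags & MODEL_PF_KTHREAD)) {
		/* In the kernel, but the task has a userspace part. */
		out->abi = MODEL_ABI_USER;
		out->regs = &cur->saved_user_regs;
	} else {
		/* Kernel thread (idle included): no user regs at all. */
		out->abi = MODEL_ABI_NONE;
		out->regs = NULL;
	}
}
```

Feeding it a task with MODEL_PF_KTHREAD set and kernel-mode regs lands in
the last branch, which is the idle case.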

Does that help?