linux-kernel - Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of handle_riprel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140401163934.GA26272@redhat.com>
Date:	Tue, 1 Apr 2014 18:39:34 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
	Ananth N Mavinakayanahalli <ananth@...ibm.com>,
	David Long <dave.long@...aro.org>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Jim Keniston <jkenisto@...ibm.com>,
	Jonathan Lebon <jlebon@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/7] uprobes/x86: Conditionalize the usage of
	handle_riprel_insn()

On 04/01, Oleg Nesterov wrote:
>
> Good point, I'll send v2.

Masami, et al, I was going to send v2 plus the next short RFC series
which fixes the problem today, but it turns out that I have no time.
Will try to do this tomorrow.

So let me explain the problem, and how (I think) it should be solved.
Unfortunately, I do not even know the terminology, so firstly I have
to explain you the things I recently learned when I investigated the
bug report ;)

Lets only discuss the "call" instruction (the one which starts with
"e8" byte), because (compared to "jmp") the fix is more complicated.
This instruction simply does

	push rip
	rip += offset_encoded_in_insn	// ignoring the length of insn

To simplify, suppose that we probe such an insn at virtual address
0x1000 and offset_encoded_in_insn == 0x10.

When this task hits the probe, the kernel simply copies this (unmodified)
insn into xol area, and the task does single step. Lets suppose that
the virtual address of xol slot == 0x2000. This means that regs->ip
becomes 0x2000 + 0x10 = 0x2010, and we only need to adjust ->ip and
return adrress.

Note that the new ->ip == 0x2010 can point to nowhere, to the unmapped
area or it can point to the kernel memory. This still works because
the task doesn't even try to execute the next insn at this address.

But! this does NOT works if the new ->ip is non-canonical (not sure this
is the common term, I mean the hole starting from 0xffff800000000000,
which I didn't know about until recently ;). In this case CPU generates
#GP and do_general_protection() kills the task which tries to execute
this insn out-of-line.

The test-case I wrote to ensure:

	void probe_func(void), callee(void);

	int failed = 1;

	asm (
		".text\n"
		".align 4096\n"
		".globl probe_func\n"
		"probe_func:\n"
		"call callee\n"
		"ret"
	);

	/*
	 * This assumes that:
	 *
	 *	- &probe_func = 0x401000 + a_bit, aligned = 0x402000
	 *
	 *	- xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
	 *	  as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
	 *
	 * so we can target the non-canonical address from xol_vma using
	 * the simple math below, 100 * 4096 is just the random offset
	 */
	asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1  + 100 * 4096\n");

	void callee(void)
	{
		failed = 0;
	}

	int main(void)
	{
		probe_func();
		return failed;
	}

It dies if you probe "probe_func" (although this test-case is not very
reliable, randomize_va_space/etc can change the placement of xol area).

Now let me describe how I am going to fix this. Please let me know if
you think this can't work or you see the better solution.

To simplify, lets ignore the fact we have lib/insn.c. The first change
will add

	bool call_emulate_op(...)
	{
		// push return adress
		if (put_user(regs->ip + 5, (long __user *)(regs->sp - 8)))
			return false;

		regs->sp -= 8;
		regs->ip += 5 + *(s32*)(auprobe->insn + 1);
		return true;
	}

and change "case 0xe8" in arch_uprobe_analyze_insn() to setup
->ops = call_xol_ops.

But this is obviously incomplete because put_user() can fail. In this case
we could send a signal an restart, but this is not simple. We do not know
which signal (say, SIGSEGV or SIGBUS ?), we do not know what should we put
into siginfo, etc. And we do not want to emulate __do_page_fault ;)

So the 2nd patch will do the following:

	1. Add "long call_offset" into "struct arch_uprobe". (yes, we
	   should also add a "union" into arch_uprobe, but lets ignore
	   the details).

	2. change "case 0xe8" in arch_uprobe_analyze_insn() to also do

		s32 *offset = (void*)(auprobe->insn + 1);

		arch_uprobe->call_offset = *offset;
		/*
		 * Turn this insn into
		 *
		 *	1: call 1b;
		 *
		 * This is only needed to expand the stack if emulate
		 * fails. We do not turn it into, say, "pushf" because
		 * we do not want to change the 1st byte used by
		 * set_orig_insn().
		 *
		*offset = -5;

	3. Change call_emulate_op() to use arch_uprobe->call_offset

	4. Add

		//
		// In practice, this will be almost never called. put_user()
		// in call_emulate_op() failed, single-step can only succeed
		// if another thread expanded our stack.
		//
		int call_post_xol_op(...)
		{
			regs->sp += 8;
			// tell the caller to restart
			return -EAGAIN;
		}

Once again, if this can work we need more changes to handle jmp's/etc. But
lets discuss this later. I am thinking in horror about conditional jmp ;)
In fact this should be simple, just I do not know (yet) to parse such an
insn, and I simply do not know if lib/insn.c can help me to figure out which
flag in regs->flags ->emulate() should check.

So this all looks a bit complicated, but I do not see a simpler solution.
Well, we could avoid ->emulate() altogether, we could just mangle the
problematic instructions and complicate arch_uprobe_post_xol(), but I do
not think this would be better. And if nothing else, ->emulate is a) simple
and b) should likely succeed and make the probing faster.

But perhaps I missed something?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/