Date:	Mon, 22 Mar 2010 21:40:09 -0400
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Masami Hiramatsu <mhiramat@...hat.com>,
	Mel Gorman <mel@....ul.ie>,
	Ananth N Mavinakayanahalli <ananth@...ibm.com>,
	Jim Keniston <jkenisto@...ux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Randy Dunlap <randy.dunlap@...cle.com>
Subject: Re: [PATCH v1 4/10] User Space Breakpoint Assistance Layer

On Sat, 20 Mar 2010 19:55:41 +0530 Srikar Dronamraju <srikar@...ux.vnet.ibm.com> wrote:

> User Space Breakpoint Assistance Layer (USER_BKPT)
> 

A quick scan, just to show I was paying attention ;)

>
> ...
>
> +/*
> + * Read @nbytes at @vaddr from @tsk into @kbuf.  Return number of bytes read.
> + * Not exported, but available for use by arch-specific user_bkpt code.
> + */

You may as well finish off the kerneldoc on some of these functions.
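
Something along these lines, say (just a sketch of the format - I'm
guessing at the exact @arg wording):

	/**
	 * user_bkpt_read_vm - read @nbytes at @vaddr from @tsk into @kbuf
	 * @tsk: the task whose address space is to be read
	 * @vaddr: user virtual address to read from
	 * @kbuf: kernel buffer to copy the bytes into
	 * @nbytes: number of bytes to read
	 *
	 * Returns the number of bytes actually read.
	 *
	 * Not exported, but available for use by arch-specific user_bkpt code.
	 */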

> +int user_bkpt_read_vm(struct task_struct *tsk, unsigned long vaddr,
> +						void *kbuf, int nbytes)
> +{
> +	if (tsk == current) {
> +		int nleft = copy_from_user(kbuf, (void __user *) vaddr,
> +				nbytes);
> +		return nbytes - nleft;
> +	} else
> +		return access_process_vm(tsk, vaddr, kbuf, nbytes, 0);
> +}

copy_from_user() takes and returns an unsigned long arg but this
function is converting these to and from ints.  That's OK if we're 100%
sure that we'll never get or return an arg >2G.  Otherwise things could
get ghastly.  Please have a think.  (Dittoes for some other functions
around here).
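
ie, if a >2G read is ever a possibility then the types want to be
something like this (untested sketch - ssize_t is just one option, and
note that access_process_vm() itself still takes an int len):

	ssize_t user_bkpt_read_vm(struct task_struct *tsk, unsigned long vaddr,
				  void *kbuf, size_t nbytes)
	{
		if (tsk == current) {
			unsigned long nleft = copy_from_user(kbuf,
						(void __user *)vaddr, nbytes);
			return nbytes - nleft;
		}
		/* access_process_vm() still truncates to int internally */
		return access_process_vm(tsk, vaddr, kbuf, nbytes, 0);
	}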

> +
> +/*
> + * Write @nbytes from @kbuf at @vaddr in @tsk.  Return number of bytes written.
> + * Can be used to write to stack or data VM areas, but not instructions.
> + * Not exported, but available for use by arch-specific user_bkpt code.
> + */
> +int user_bkpt_write_data(struct task_struct *tsk, unsigned long vaddr,
> +					const void *kbuf, int nbytes)
> +{
> +	int nleft;
> +
> +	if (tsk == current) {
> +		nleft = copy_to_user((void __user *) vaddr, kbuf, nbytes);
> +		return nbytes - nleft;
> +	} else
> +		return access_process_vm(tsk, vaddr, (void *) kbuf,
> +							nbytes, 1);
> +}

This looks like it has the wrong interface.  It should take a `void
__user *vaddr'.  If any casting is to be done, it should be done at the
highest level so that sparse can check that the thing is used correctly
in as many places as possible.
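
ie, something like

	int user_bkpt_write_data(struct task_struct *tsk, void __user *vaddr,
				 const void *kbuf, int nbytes)

and then any (void __user *) casting happens up in the callers, where
sparse gets to look at it.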

> +/*
> + * Given an address, get its pte. Very similar to get_locked_pte except
> + * there is no spinlock involved.
> + */
> +static inline pte_t *get_pte(struct mm_struct *mm, unsigned long vaddr)

The inline is either unneeded or undesirable, I suspect.

> +{
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	pgd = pgd_offset(mm, vaddr);
> +	pud = pud_alloc(mm, pgd, vaddr);
> +	if (!pud)
> +		return NULL;
> +	pmd = pmd_alloc(mm, pud, vaddr);
> +	if (!pmd)
> +		return NULL;
> +	pte = pte_alloc_map(mm, pmd, vaddr);
> +	return pte;
> +}
> +
> +static int write_opcode(struct task_struct *tsk, unsigned long vaddr,
> +						user_bkpt_opcode_t opcode)
> +{
> +	struct mm_struct *mm;
> +	struct vm_area_struct *vma;
> +	struct page *old_page, *new_page;
> +	void *maddr;
> +	pte_t *old_pte;
> +	int offset;
> +	int ret = -EINVAL;
> +
> +	if (!tsk)
> +		return ret;
> +
> +	mm = get_task_mm(tsk);
> +	if (!mm)
> +		return ret;
> +
> +	down_read(&mm->mmap_sem);
> +
> +	/* Read the page with vaddr into memory */
> +	ret = get_user_pages(tsk, mm, vaddr, 1, 0, 1, &old_page, &vma);
> +	if (ret <= 0) {
> +		up_read(&mm->mmap_sem);
> +		goto mmput_out;
> +	}
> +
> +	/* Allocate a page and copy contents over from the old page */
> +	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vaddr);

It might be smarter to allocate this page outside the mmap_sem region. 
More scalable, less opportunity for weird deadlocks.
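
eg (untested, and plain alloc_page() does give up the NUMA placement
which alloc_page_vma() provides):

	mm = get_task_mm(tsk);
	if (!mm)
		return ret;

	/* allocate before taking mmap_sem */
	new_page = alloc_page(GFP_HIGHUSER_MOVABLE);
	if (!new_page) {
		mmput(mm);
		return -ENOMEM;
	}

	down_read(&mm->mmap_sem);
	ret = get_user_pages(tsk, mm, vaddr, 1, 0, 1, &old_page, &vma);
	...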

> +	if (!new_page) {
> +		ret = -ENOMEM;
> +		goto unlock_out;
> +	}
> +
> +	/*
> +	 * check if the page we are interested is read-only mapped
> +	 * Since we are interested in text pages, Our pages of interest
> +	 * should be mapped read-only.
> +	 */
> +	if ((vma->vm_flags && (VM_READ|VM_WRITE)) != VM_READ) {
> +		ret = -EINVAL;
> +		goto unlock_out;
> +	}
> +
> +	copy_user_highpage(new_page, old_page, vaddr, vma);
> +	maddr = kmap(new_page);
> +	offset = vaddr & (PAGE_SIZE - 1);
> +	memcpy(maddr + offset, &opcode, user_bkpt_opcode_sz);
> +	kunmap(new_page);

kmap_atomic() is preferred - it's faster.  kmap() is still deadlockable
I guess if a process ever kmaps two pages at the same time.  This code
doesn't do that, but kmap is still a bit sucky.
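
ie something like (untested):

	maddr = kmap_atomic(new_page, KM_USER0);
	offset = vaddr & (PAGE_SIZE - 1);
	memcpy(maddr + offset, &opcode, user_bkpt_opcode_sz);
	kunmap_atomic(maddr, KM_USER0);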

> +	old_pte = get_pte(mm, vaddr);
> +	if (!old_pte) {
> +		ret = -EINVAL;
> +		goto unlock_out;
> +	}
> +
> +	/*
> +	 * Now replace the page in vma with the new page.
> +	 * This is a text page; so, setup mapping and index.
> +	 */
> +	new_page->mapping = old_page->mapping;
> +	new_page->index = old_page->index;
> +	ret = replace_page(vma, old_page, new_page, *old_pte);
> +
> +unlock_out:
> +	if (ret != 0)
> +		page_cache_release(new_page);
> +	put_page(old_page); /* we did a get_page in the beginning */
> +	up_read(&mm->mmap_sem);
> +mmput_out:
> +	mmput(mm);
> +	return ret;
> +}
> +
>
> ...
>
> +/**
> + * user_bkpt_validate_insn_addr - Validate if the instruction is an
> + * executable vma.

It used to be the case that the above line break was "wrong" - the
short description after the function name is supposed to fit on a
single line.  (Nobody ever tests their kerneldoc output!)  I have a
vague feeling that this might have been fixed lately.  Randy?
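
For reference, the conventional form keeps the whole summary on the
one line (wording here is my guess at what was meant):

	/**
	 * user_bkpt_validate_insn_addr - check that @vaddr lies in an executable vma
	 * @tsk: the probed task
	 * @vaddr: virtual address of the instruction to be verified.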

> + * Returns 0 if the vaddr is a valid instruction address.
> + * @tsk: the probed task
> + * @vaddr: virtual address of the instruction to be verified.
> + *
> + * Possible errors:
> + * -%EINVAL: Instruction passed is not a valid instruction address.
> + */
> +int user_bkpt_validate_insn_addr(struct task_struct *tsk, unsigned long vaddr)
> +{
> +	int result;
> +
> +	result = check_vma(tsk, vaddr);
> +	if (result != 0)
> +		return result;
> +	if (arch->validate_address)
> +		result = arch->validate_address(tsk, vaddr);
> +	return result;
> +}
> +
>
> ...
>
> +int user_bkpt_insert_bkpt(struct task_struct *tsk, struct user_bkpt *user_bkpt)
> +{
> +	int result, len;
> +
> +	BUG_ON(!tsk || !user_bkpt);

If this BUG_ON triggers, we won't know which of these pointers was NULL,
which makes us sad.
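
If the assertion is worth keeping at all, splitting it up at least
tells us which pointer was bad:

	BUG_ON(!tsk);
	BUG_ON(!user_bkpt);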

> +	if (!validate_strategy(user_bkpt->strategy, USER_BKPT_HNT_MASK))
> +		return -EINVAL;
> +
>
> ...
>
> +int user_bkpt_pre_sstep(struct task_struct *tsk, struct user_bkpt *user_bkpt,
> +		struct user_bkpt_task_arch_info *tskinfo, struct pt_regs *regs)
> +{
> +	int result;
> +
> +	BUG_ON(!tsk || !user_bkpt || !regs);

ditto.

Really, there's never much point in

	BUG_ON(!some_pointer);

Just go ahead and dereference the pointer.  If it's NULL then we'll get
an oops which gives all the information which the BUG_ON would have
given.

> +	if (uses_xol_strategy(user_bkpt->strategy)) {
> +		BUG_ON(!user_bkpt->xol_vaddr);
> +		return arch->pre_xol(tsk, user_bkpt, tskinfo, regs);
> +	}
> +
> +	/*
> +	 * Single-step this instruction inline.  Replace the breakpoint
> +	 * with the original opcode.
> +	 */
> +	result = arch->set_orig_insn(tsk, user_bkpt, false);
> +	if (result == 0)
> +		arch->set_ip(regs, user_bkpt->vaddr);
> +	return result;
> +}
> +
> +/**
> + * user_bkpt_post_sstep - prepare to resume execution after single-step
> + * @tsk: the probed task
> + * @user_bkpt: the probepoint information, as with @user_bkpt_pre_sstep()
> + * @tskinfo: the @user_bkpt_task_arch_info object, if any, passed to
> + *	@user_bkpt_pre_sstep()
> + * @regs: reflects the saved state of @tsk after the single-step
> + *	operation.  @user_bkpt_post_sstep() adjusts @tsk's state as needed,
> + *	including pointing the instruction pointer at the instruction
> + *	following the probed instruction.
> + * Possible errors:
> + * -%EFAULT: Failed to read or write @tsk's address space as needed.
> + */
> +int user_bkpt_post_sstep(struct task_struct *tsk, struct user_bkpt *user_bkpt,
> +		struct user_bkpt_task_arch_info *tskinfo, struct pt_regs *regs)
> +{
> +	BUG_ON(!tsk || !user_bkpt || !regs);

More dittoes.  It's just bloat.

> +	if (uses_xol_strategy(user_bkpt->strategy))
> +		return arch->post_xol(tsk, user_bkpt, tskinfo, regs);
> +
> +	/*
> +	 * Single-stepped this instruction inline.  Put the breakpoint
> +	 * instruction back.
> +	 */
> +	return arch->set_bkpt(tsk, user_bkpt);
> +}
> +
>
> ...
>
