Message-ID: <20120522080513.GC10829@linux.vnet.ibm.com>
Date: Tue, 22 May 2012 13:35:13 +0530
From: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: mingo@...hat.com, a.p.zijlstra@...llo.nl,
torvalds@...ux-foundation.org, peterz@...radead.org,
anton@...hat.com, rostedt@...dmis.org, tglx@...utronix.de,
oleg@...hat.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
hpa@...or.com, jkenisto@...ibm.com, andi@...stfloor.org,
hch@...radead.org, ananth@...ibm.com, vda.linux@...glemail.com,
masami.hiramatsu.pt@...achi.com, acme@...radead.org,
sfr@...b.auug.org.au, roland@...k.frob.com, mingo@...e.hu,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perf/uprobes] uprobes, mm, x86: Add the ability to
install and remove uprobes breakpoints
>
> static void unmap_single_vma(struct mmu_gather *tlb,
> 		struct vm_area_struct *vma, unsigned long start_addr,
> 		unsigned long end_addr,
> 		struct zap_details *details)
> {
> 	unsigned long start = max(vma->vm_start, start_addr);
> 	unsigned long end;
>
> 	if (start >= vma->vm_end)
> 		return;
> 	end = min(vma->vm_end, end_addr);
> 	if (end <= vma->vm_start)
> 		return;
>
> <<<<<<< HEAD
> =======
> 	if (vma->vm_file)
> 		uprobe_munmap(vma, start, end);
>
> 	if (vma->vm_flags & VM_ACCOUNT)
> 		*nr_accounted += (end - start) >> PAGE_SHIFT;
>
> >>>>>>> linux-next/akpm-base
> 	if (unlikely(is_pfn_mapping(vma)))
> 		untrack_pfn_vma(vma, 0, 0);
>
>
> It made me look at uprobes. Noticed a few things...
>
I have explained why I had to add a callback in unmap_single_vma in my
response to Linus.
> > ...
> >
> > +static struct rb_root uprobes_tree = RB_ROOT;
> > +static DEFINE_SPINLOCK(uprobes_treelock); /* serialize rbtree access */
> > +
> > +#define UPROBES_HASH_SZ 13
> > +/* serialize (un)register */
> > +static struct mutex uprobes_mutex[UPROBES_HASH_SZ];
> > +#define uprobes_hash(v) (&uprobes_mutex[((unsigned long)(v)) %\
> > +					UPROBES_HASH_SZ])
> > +
> > +/* serialize uprobe->pending_list */
> > +static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ];
> > +#define uprobes_mmap_hash(v) (&uprobes_mmap_mutex[((unsigned long)(v)) %\
> > +					UPROBES_HASH_SZ])
>
> Presumably these locks were hashed for scalability reasons?
Yes:
uprobes_mmap_mutex is taken on every mmap/munmap operation.
Since we do a per-file operation for each mm operation (walking the rmap
to insert/remove breakpoints), we looked at using i_mutex. However,
Christoph wasn't happy to overload i_mutex. He suggested two options:
1. adding another mutex in the inode structure
2. adding global hash locks (which he recommended)
Adding another mutex to the inode structure is overkill, but having just
one mutex guarding all uprobe_mmap calls would cause contention between
unrelated mmaps. So we narrowed it down to a hash of mutexes.
>
> If so, this won't be terribly effective when we have multiple mutexes
> occupying a single cacheline - the array entries should be padded out.
> Of course, that's all a complete waste of space on uniprocessor
> machines, but nobody seems to think of that any more ;(
>
Okay, I agree that having each mutex in a different cacheline helps.
If everyone agrees to this, I will send an add-on patch that moves the
mutexes into separate cachelines.
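Something along these lines, perhaps (just a sketch of the idea; the
struct name is illustrative, not from an actual patch):

/*
 * Sketch: pad each hashed mutex out to its own cacheline so that
 * contention on one hash bucket does not bounce the cacheline of its
 * neighbours.  ____cacheline_aligned_in_smp compiles away on !SMP
 * builds, so uniprocessor kernels do not pay for the padding.
 */
struct uprobes_hashed_mutex {
	struct mutex mutex;
} ____cacheline_aligned_in_smp;

static struct uprobes_hashed_mutex uprobes_mutex[UPROBES_HASH_SZ];
static struct uprobes_hashed_mutex uprobes_mmap_mutex[UPROBES_HASH_SZ];
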
> There was no need to code the accessor functions as macros. It is, as
> always, better to use a nice C function which takes an argument which
> is as strictly typed as possible. ie, it *could* take a void*, but it
> would be better if it required an inode*.
>
I will add this change as part of the add-on patch.
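For example, a strictly typed accessor could look roughly like this (a
sketch against the current unpadded arrays; with the padded arrays above
it would return &uprobes_mutex[...].mutex instead):

static inline struct mutex *uprobes_hash(struct inode *inode)
{
	/* same hash as the macro, but the argument is now type-checked */
	return &uprobes_mutex[(unsigned long)inode % UPROBES_HASH_SZ];
}

static inline struct mutex *uprobes_mmap_hash(struct inode *inode)
{
	return &uprobes_mmap_mutex[(unsigned long)inode % UPROBES_HASH_SZ];
}
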
> >
> > ...
> >
> > +static int read_opcode(struct mm_struct *mm, unsigned long vaddr,
> > +			uprobe_opcode_t *opcode)
> > +{
[.....]
> > +	vaddr_new = kmap_atomic(page);
> > +	vaddr &= ~PAGE_MASK;
> > +	memcpy(opcode, vaddr_new + vaddr, uprobe_opcode_sz);
> > +	kunmap_atomic(vaddr_new);
>
> This is modifying user memory? flush_dcache_page() needed? Or perhaps
> we will need different primitives to diddle the instruction memory on
> architectures which care.
>
Here we are just reading from user memory. The part that inserts/removes
the breakpoint (write_opcode) does the flush.
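For comparison, the write side is roughly (a heavily trimmed sketch; the
real write_opcode() copies the original page, pokes the opcode into the
copy and then replaces the page via the rmap):

	vaddr_new = kmap_atomic(new_page);

	/* poke the breakpoint opcode into the copied page */
	memcpy(vaddr_new + (vaddr & ~PAGE_MASK), &opcode, uprobe_opcode_sz);
	kunmap_atomic(vaddr_new);

	/* flush so the modified instruction bytes are visible to userspace */
	flush_dcache_page(new_page);
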
> > +int mmap_uprobe(struct vm_area_struct *vma)
> > +{
> > +	struct list_head tmp_list;
> > +	struct uprobe *uprobe, *u;
> > +	struct inode *inode;
> > +	int ret = 0;
> > +
> > +	if (!atomic_read(&uprobe_events) || !valid_vma(vma, true))
> > +		return ret;	/* Bail-out */
> > +
> > +	inode = vma->vm_file->f_mapping->host;
> > +	if (!inode)
> > +		return ret;
> > +
> > +	INIT_LIST_HEAD(&tmp_list);
> > +	mutex_lock(uprobes_mmap_hash(inode));
> > +	build_probe_list(inode, &tmp_list);
> > +	list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
> > +		loff_t vaddr;
> > +
> > +		list_del(&uprobe->pending_list);
> > +		if (!ret) {
> > +			vaddr = vma_address(vma, uprobe->offset);
> > +			if (vaddr < vma->vm_start || vaddr >= vma->vm_end) {
> > +				put_uprobe(uprobe);
> > +				continue;
> > +			}
> > +			ret = install_breakpoint(vma->vm_mm, uprobe, vma,
> > +						 vaddr);
> > +			if (ret == -EEXIST)
> > +				ret = 0;
>
> This now has the comment "Ignore double add:". That is a poor
> comment, because it doesn't tell us *why* a double-add is ignored.
>
We don't actually ignore the double-add.
install_breakpoint() has comments on when we return -EEXIST.
uprobe_mmap() has a comment, added as part of commit 682968e0
(uprobes/core: Optimize probe hits with the help of a counter), on why
-EEXIST should be considered a success:

	/*
	 * Unable to insert a breakpoint, but
	 * breakpoint lies underneath. Increment the
	 * probe count.
	 */

i.e. install_breakpoint() needs to insert a breakpoint, but if a
breakpoint is already there, it doesn't need to do anything.
I will go ahead and remove the "Ignore double-add" comment.
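In other words, the call site effectively becomes (a sketch with a more
descriptive comment; the probe-count handling from 682968e0 is elided):

	ret = install_breakpoint(vma->vm_mm, uprobe, vma, vaddr);
	/*
	 * -EEXIST means a breakpoint instruction already lives at vaddr,
	 * so there is nothing left for install_breakpoint() to do; treat
	 * it as success from the caller's point of view.
	 */
	if (ret == -EEXIST)
		ret = 0;
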
--
thanks and regards
Srikar