Message-ID: <20111003133710.GA28118@redhat.com>
Date: Mon, 3 Oct 2011 15:37:10 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...e.hu>,
Steven Rostedt <rostedt@...dmis.org>,
Linux-mm <linux-mm@...ck.org>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ananth N Mavinakayanahalli <ananth@...ibm.com>,
Hugh Dickins <hughd@...gle.com>,
Christoph Hellwig <hch@...radead.org>,
Jonathan Corbet <corbet@....net>,
Thomas Gleixner <tglx@...utronix.de>,
Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Jim Keniston <jkenisto@...ux.vnet.ibm.com>,
Roland McGrath <roland@...k.frob.com>,
Andi Kleen <andi@...stfloor.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 3.1.0-rc4-tip 4/26] uprobes: Define hooks for
mmap/munmap.
On 09/20, Srikar Dronamraju wrote:
>
> @@ -739,6 +740,10 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> mm->pmd_huge_pte = NULL;
> #endif
> +#ifdef CONFIG_UPROBES
> + atomic_set(&mm->mm_uprobes_count,
> + atomic_read(&oldmm->mm_uprobes_count));
Hmm. Why can't this race with install_breakpoint()/remove_breakpoint()
between the _read and the _set?
And what about VM_DONTCOPY vmas with breakpoints?
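To make the window concrete, here is a minimal userspace sketch, not kernel code. It replays one possible interleaving on a single thread using C11 atomics. The names mm_sketch and replay_stale_copy() are hypothetical, and the atomic_fetch_add() only stands in for whatever install_breakpoint() does to the parent's counter. Whether the child really inherits that breakpoint depends on when the page is copied; the sketch only shows that the value written into the child can already be stale.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the one mm_struct field discussed here. */
struct mm_sketch {
	atomic_int uprobes_count;
};

/*
 * Replays, on a single thread, one interleaving that a separate
 * read followed by a separate set allows. Returns the parent's
 * count minus the child's count afterwards.
 */
static int replay_stale_copy(int initial)
{
	struct mm_sketch parent, child;
	int snapshot;

	assert(initial >= 0);
	atomic_init(&parent.uprobes_count, initial);
	atomic_init(&child.uprobes_count, 0);

	/* atomic_read(&oldmm->mm_uprobes_count) */
	snapshot = atomic_load(&parent.uprobes_count);

	/* Assumed concurrent update, e.g. from install_breakpoint(). */
	atomic_fetch_add(&parent.uprobes_count, 1);

	/* atomic_set(&mm->mm_uprobes_count, ...) with the old snapshot */
	atomic_store(&child.uprobes_count, snapshot);

	return atomic_load(&parent.uprobes_count) -
	       atomic_load(&child.uprobes_count);
}
```

Each operation is atomic on its own, but the pair is not, so the two counters can disagree by whatever happened in between.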
> -static int match_uprobe(struct uprobe *l, struct uprobe *r)
> +static int match_uprobe(struct uprobe *l, struct uprobe *r, int *match_inode)
> {
> + /*
> + * if match_inode is non NULL then indicate if the
> + * inode atleast match.
> + */
> + if (match_inode)
> + *match_inode = 0;
> +
> if (l->inode < r->inode)
> return -1;
> if (l->inode > r->inode)
> return 1;
> else {
> + if (match_inode)
> + *match_inode = 1;
> +
It is very possible I missed something, but imho this looks confusing.
This close_match logic is only needed for build_probe_list() and
dec_mm_uprobes_count(), and neither of them actually needs the returned
uprobe.
Instead of complicating match_uprobe() and __find_uprobe(), perhaps
it makes sense to add "struct rb_node *__find_close_rb_node(inode)" ?
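A rough userspace sketch of what such a helper could look like, assuming the tree is ordered by inode first and offset second as in match_uprobe(). This uses a plain unbalanced binary tree rather than the kernel rbtree API, and probe_node, probe_insert() and find_first_for_inode() are hypothetical names. It is also a variant of the idea: it returns the leftmost node for the inode rather than any "close" one, so the caller would only need to walk forward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct probe_node {
	uintptr_t inode;	/* stands in for struct inode * */
	uint64_t offset;
	struct probe_node *left, *right;
};

/* Plain (unbalanced) insert, ordered by inode and then by offset. */
static void probe_insert(struct probe_node **root, struct probe_node *n)
{
	assert(n->left == NULL && n->right == NULL);
	while (*root) {
		struct probe_node *cur = *root;
		int before = n->inode < cur->inode ||
			     (n->inode == cur->inode && n->offset < cur->offset);

		root = before ? &cur->left : &cur->right;
	}
	*root = n;
}

/*
 * Returns the node with the smallest offset among those that belong
 * to @inode, or NULL if the inode has no nodes at all. No offset
 * comparison and no reference counting is involved.
 */
static struct probe_node *find_first_for_inode(struct probe_node *root,
					       uintptr_t inode)
{
	struct probe_node *first = NULL;

	while (root) {
		if (inode < root->inode) {
			root = root->left;
		} else if (inode > root->inode) {
			root = root->right;
		} else {
			first = root;		/* candidate; look for a smaller offset */
			root = root->left;
		}
	}
	return first;
}
```

With something of this shape, match_uprobe() and __find_uprobe() could keep their original signatures.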
> +static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe)
> {
> /* Placeholder: Yet to be implemented */
> + if (!uprobe->consumers)
> + return 0;
How is it possible to see ->consumers == NULL?
OK, afaics it _is_ possible, but only because unregister does del_consumer()
without ->i_mutex, and this is a bug afaics (see the previous email).
The other user is mmap_uprobe(), and it checks ->consumers != NULL itself (but
see below).
> +int mmap_uprobe(struct vm_area_struct *vma)
> +{
> + struct list_head tmp_list;
> + struct uprobe *uprobe, *u;
> + struct inode *inode;
> + int ret = 0;
> +
> + if (!valid_vma(vma))
> + return ret; /* Bail-out */
> +
> + inode = igrab(vma->vm_file->f_mapping->host);
> + if (!inode)
> + return ret;
> +
> + INIT_LIST_HEAD(&tmp_list);
> + mutex_lock(&uprobes_mmap_mutex);
> + build_probe_list(inode, &tmp_list);
> + list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) {
> + loff_t vaddr;
> +
> + list_del(&uprobe->pending_list);
> + if (!ret && uprobe->consumers) {
> + vaddr = vma->vm_start + uprobe->offset;
> + vaddr -= vma->vm_pgoff << PAGE_SHIFT;
> + if (vaddr < vma->vm_start || vaddr >= vma->vm_end)
> + continue;
> + ret = install_breakpoint(vma->vm_mm, uprobe);
So. We are adding the new mapping, and we should find all the breakpoints
this file has in the [start, end) range.
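For reference, the offset-to-address arithmetic in the quoted hunk can be sketched in userspace as follows. This is an illustration only: vma_sketch, SKETCH_PAGE_SHIFT (assumed to be 12) and offset_to_vaddr() are hypothetical names, and unsigned arithmetic is used with an explicit check instead of relying on a signed loff_t going below vm_start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_sketch {
	uint64_t vm_start;	/* first mapped address */
	uint64_t vm_end;	/* one past the last mapped address */
	uint64_t vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Translates a file offset into an address inside @vma.
 * Returns false if the offset is not covered by this mapping.
 */
static bool offset_to_vaddr(const struct vma_sketch *vma, uint64_t offset,
			    uint64_t *vaddr)
{
	uint64_t file_start = vma->vm_pgoff << SKETCH_PAGE_SHIFT;
	uint64_t len;

	assert(vma->vm_start <= vma->vm_end);
	len = vma->vm_end - vma->vm_start;

	if (offset < file_start || offset - file_start >= len)
		return false;

	*vaddr = vma->vm_start + (offset - file_start);
	return true;
}
```

For example, a two-page mapping at 0x400000 with vm_pgoff == 1 covers file offsets 0x1000 to 0x2fff, so a probe at offset 0x1010 lands at 0x400010.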
We are holding ->mmap_sem... this seems enough to protect against the
races with register/unregister. Except, what if __register_uprobe()
fails? In this case __unregister_uprobe() does delete_uprobe() at the
very end. What if mmap_uprobe() is called right before delete_uprobe()?
> +static void dec_mm_uprobes_count(struct vm_area_struct *vma,
> + struct inode *inode)
> +{
> + struct uprobe *uprobe;
> + struct rb_node *n;
> + unsigned long flags;
> +
> + n = uprobes_tree.rb_node;
> + spin_lock_irqsave(&uprobes_treelock, flags);
> + uprobe = __find_uprobe(inode, 0, &n);
> +
> + /*
> + * If indeed there is a probe for the inode and with offset zero,
> + * then lets release its reference. (ref got thro __find_uprobe)
> + */
> + if (uprobe)
> + put_uprobe(uprobe);
> + for (; n; n = rb_next(n)) {
> + loff_t vaddr;
> +
> + uprobe = rb_entry(n, struct uprobe, rb_node);
> + if (uprobe->inode != inode)
> + break;
> + vaddr = vma->vm_start + uprobe->offset;
> + vaddr -= vma->vm_pgoff << PAGE_SHIFT;
> + if (vaddr < vma->vm_start || vaddr >= vma->vm_end)
> + continue;
> + atomic_dec(&vma->vm_mm->mm_uprobes_count);
So, this does atomic_dec() for each bp in this vma?
And the caller is
> @@ -1337,6 +1338,9 @@ unsigned long unmap_vmas(struct mmu_gather *tlb,
> if (unlikely(is_pfn_mapping(vma)))
> untrack_pfn_vma(vma, 0, 0);
>
> + if (vma->vm_file)
> + munmap_uprobe(vma);
Doesn't look right...
munmap_uprobe() assumes that the whole region goes away. This is
true in the munmap() case afaics, it does __split_vma() if necessary.
But what about truncate()? In this case the vma is not unmapped,
but unmap_vmas() is called anyway and [start, end) can be different.
IOW, unless I missed something (this is very possible) we can do
more atomic_dec's than needed.
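A small userspace sketch of the difference, not kernel code. It counts the probes that fall inside the intersection of the vma and the range actually being unmapped, so it can be compared with a count over the whole vma. vma_sketch, SKETCH_PAGE_SHIFT (assumed 12) and probes_in_range() are hypothetical names, and the probe offsets are passed as a flat array instead of being looked up in the uprobes tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_area_struct fields used here. */
struct vma_sketch {
	uint64_t vm_start, vm_end, vm_pgoff;
};

/*
 * Counts the probes (given as file offsets) whose address lies both
 * inside @vma and inside [start, end).
 */
static size_t probes_in_range(const struct vma_sketch *vma,
			      const uint64_t *offsets, size_t n,
			      uint64_t start, uint64_t end)
{
	uint64_t file_start = vma->vm_pgoff << SKETCH_PAGE_SHIFT;
	size_t i, hits = 0;

	assert(vma->vm_start <= vma->vm_end && start <= end);

	/* Clamp the unmapped range to the vma itself. */
	if (start < vma->vm_start)
		start = vma->vm_start;
	if (end > vma->vm_end)
		end = vma->vm_end;

	for (i = 0; i < n; i++) {
		uint64_t vaddr;

		if (offsets[i] < file_start)
			continue;	/* before the mapped part of the file */
		vaddr = vma->vm_start + (offsets[i] - file_start);
		if (vaddr >= start && vaddr < end)
			hits++;
	}
	return hits;
}
```

With a three-page vma and one probe per page, passing the whole vma gives 3 while passing only the last page gives 1. Decrementing by the first number when only the second range went away is the over-decrement described above.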
Also, truncate() obviously changes ->i_size. Doesn't this mean
unregister_uprobe() should return if offset > i_size ? We need to
free uprobes anyway.
MADV_DONTNEED? It calls unmap_vmas() too. And an application can call
madvise(MADV_DONTNEED) in a loop.
Oleg.