[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.10.1409121812550.4178@nanos>
Date: Fri, 12 Sep 2014 19:34:06 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Dave Hansen <dave.hansen@...el.com>
cc: Qiaowei Ren <qiaowei.ren@...el.com>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
x86@...nel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v8 08/10] x86, mpx: add prctl commands PR_MPX_REGISTER,
PR_MPX_UNREGISTER
On Fri, 12 Sep 2014, Dave Hansen wrote:
> On 09/12/2014 02:24 AM, Thomas Gleixner wrote:
> > On Fri, 12 Sep 2014, Thomas Gleixner wrote:
> >> On Thu, 11 Sep 2014, Dave Hansen wrote:
> >>> Well, we use it to figure out whether we _potentially_ need to tear down
> >>> an VM_MPX-flagged area. There's no guarantee that there will be one.
> >>
> >> So what you are saying is, that if user space sets the pointer to NULL
> >> via the unregister prctl, kernel can safely ignore vmas which have the
> >> VM_MPX flag set. I really can't follow that logic.
> >>
> >> mmap_mpx();
> >> prctl(enable mpx);
> >> do lots of crap which uses mpx;
> >> prctl(disable mpx);
> >>
> >> So after that point the previous use of MPX is irrelevant, just
> >> because we set a pointer to NULL? Does it just look like crap because
> >> I do not get the big picture how all of this is supposed to work?
> >
> > do_bounds() will happily map new BTs no matter whether the prctl was
> > invoked or not. So what's the value of the prctl at all?
>
> The behavior as it stands is wrong. We should at least have the kernel
> refuse to map new BTs if the prctl() hasn't been issued. We'll fix it up.
>
> > The mapping is flagged VM_MPX. Why is this not sufficient?
>
> The comment is confusing and only speaks to half of what the if() in
> question is doing. We'll get a better comment in there. But, for the
> sake of explaining it fully:
>
> There are two mappings in play:
> 1. The mapping with the actual data, which userspace is munmap()ing or
> brk()ing away, etc... (never tagged VM_MPX)
It's not tagged that way because it is mapped by user space. This is
the directory, right?
> 2. The mapping for the bounds table *backing* the data (is tagged with
> VM_MPX)
That's the stuff, which gets magically allocated from do_bounds(). And
the reason you do that from the #BR is that user space would have to
allocate a gazillion of bound tables to make sure that every corner
case is covered. With the allocation from #BR you make that behaviour
dynamic and you just provide an empty "no bounds" table to make the
bound checker happy.
> The code ends up looking like this:
>
> vm_munmap()
> {
> do_unmap(vma); // #1 above
> if (mm->bd_addr && !(vma->vm_flags & VM_MPX))
> // lookup the backing vma (#2 above)
> vm_munmap(vma2)
> }
>
> The bd_addr check is intended to say "could the kernel have possibly
> created some VM_MPX vmas?" As you noted above, we will happily go
> creating VM_MPX vmas without mm->bd_addr being set. That's will get fixed.
>
> The VM_MPX _flags_ check on the VMA is there simply to prevent
> recursion. vm_munmap() of the VM_MPX vma is called _under_ vm_munmap()
> of the data VMA, and we've got to ensure it doesn't recurse. *This*
> part of the if() in question is not addressed in the comment. That's
> something we can fix up in the next version.
Ok, slowly I get the puzzle together :)
Now, the question is whether this magic fragile fixup is the right
thing to do in the context of unmap/brk.
So if the directory is unmapped, you want to free the bounds tables
which are referenced from the directory, i.e. those which you
allocated in do_bounds().
So you call arch_unmap() at the very end of do_unmap(). This walks the
directory to look at the entries and unmaps the bounds table which is
referenced from the directory and then clears the directory entry.
Now, I have a hard time to see how that is supposed to work.
do_unmap()
detach_vmas_to_be_unmapped()
unmap_region()
free_pgtables()
arch_unmap()
mpx_unmap()
So at the point where you try to access the directory to gather the
information about the entries which might be affected, that stuff is
unmapped already and the page tables are gone.
Brilliant idea, really. And if you run into the fault in mpx_unmap()
you plan to delegate the fixup to a work queue. How is that thing
going to find what belonged to the unmapped directory?
Even if the stuff would be accessible at that point, it is a damned
stupid idea to rely on anything userspace is providing to you. I
learned that the hard way in futex.c
The proper solution to this problem is:
do_bounds()
bd_addr = get_bd_addr_from_xsave();
bd_entry = bndstatus & ADDR_MASK:
bt = mpx_mmap(bd_addr, bd_entry, len);
set_bt_entry_in_bd(bd_entry, bt);
And in mpx_mmap()
.....
vma = find_vma();
vma->bd_addr = bd_addr;
vma->bd_entry = bd_entry;
Now on mpx_unmap()
for_each_vma()
if (is_affected(vma->bd_addr, vma->bd_entry))
unmap(vma);
That does not require a prctl, no fault handling in the unmap path, it
just works and is robust by design because it does not rely on any
user space crappola. You store the directory context at allocation
time and free it when that context goes away. It's that simple, really.
So you can still think about a prctl in order to enable/disable the
automatic mapping stuff, but that's a completely different story.
Thanks,
tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists