Date:	Tue, 10 Oct 2006 12:30:07 -0700
From:	"Chen, Kenneth W" <kenneth.w.chen@...el.com>
To:	"'Hugh Dickins'" <hugh@...itas.com>
Cc:	"'David Gibson'" <david@...son.dropbear.id.au>,
	"Andrew Morton" <akpm@...l.org>, <linux-kernel@...r.kernel.org>
Subject: RE: Hugepage regression

Hugh Dickins wrote on Tuesday, October 10, 2006 12:18 PM
> On Tue, 10 Oct 2006, Chen, Kenneth W wrote:
> > 
> > With the pending shared page table patch for hugetlb currently sitting
> > in -mm, we serialize all hugetlb unmaps with a per-file i_mmap_lock.
> > This race could well be solved by that pending patch?
> > 
> > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc1/2.6.19-rc1-mm1/broken-out/shared-page-table-for-hugetlb-page-v4.patch
> 
> Hey, nice try, Ken!  But I don't think we can let you sneak shared
> pagetables into 2.6.19 that way ;)

It wasn't my intention to sneak in shared page tables, though it does
sort of look that way.


> Yes, I'd expect your i_mmap_lock to solve the problem: and since
> you're headed in that direction anyway, it makes most sense to use
> that solution rather than get into defining arrays, or sacrificing
> the lazy flush, or risking page_count races.
> 
> So please extract the __unmap_hugepage_range mods from your shared
> pagetable patch, and use that to fix the bug.

OK, I was about to do so too.
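
Roughly what I have in mind -- an untested sketch only, keeping
__unmap_hugepage_range() as the real worker and making the
unmap_hugepage_range() wrapper take the per-file i_mmap_lock around it
(setting aside for a moment the vm_file question you raise below):

void unmap_hugepage_range(struct vm_area_struct *vma,
                          unsigned long start, unsigned long end)
{
        /*
         * Serialize against truncation and other unmaps of the same
         * hugetlbfs file via the per-file i_mmap_lock, so the lazy
         * TLB flush in __unmap_hugepage_range() stays safe.
         */
        spin_lock(&vma->vm_file->f_mapping->i_mmap_lock);
        __unmap_hugepage_range(vma, start, end);
        spin_unlock(&vma->vm_file->f_mapping->i_mmap_lock);
}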


> But again, I protest
> the "if (vma->vm_file)" in your unmap_hugepage_range - how would a
> hugepage area ever have NULL vma->vm_file?

It comes from do_mmap_pgoff(): file->f_op->mmap can fail with an error
code (e.g. not enough hugetlb pages), and in the error recovery path it
nulls out vma->vm_file before calling down to unmap_region().  I asked
that question before: can we reverse that order (call unmap_region()
first, and only then null out vma->vm_file and fput)?

unmap_and_free_vma:
        if (correct_wcount)
                atomic_inc(&inode->i_writecount);
        vma->vm_file = NULL;
        fput(file);

        /* Undo any partial mapping done by a device driver. */
        unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);
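
Reversed, that would look something like the following (illustrative
only, not a tested patch), so vma->vm_file is still valid when
unmap_region() tears down the partial mapping:

unmap_and_free_vma:
        /*
         * Undo any partial mapping done by a device driver, while
         * vma->vm_file is still set up for the unmap path.
         */
        unmap_region(mm, vma, prev, vma->vm_start, vma->vm_end);

        if (correct_wcount)
                atomic_inc(&inode->i_writecount);
        vma->vm_file = NULL;
        fput(file);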

