lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 29 Nov 2022 16:19:52 -0500
From:   Peter Xu <peterx@...hat.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        James Houghton <jthoughton@...gle.com>,
        Jann Horn <jannh@...gle.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Rik van Riel <riel@...riel.com>,
        Nadav Amit <nadav.amit@...il.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Muchun Song <songmuchun@...edance.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for
 pmd unshare

Hi, Andrew,

On Tue, Nov 29, 2022 at 12:49:44PM -0800, Andrew Morton wrote:
> On Tue, 29 Nov 2022 14:35:16 -0500 Peter Xu <peterx@...hat.com> wrote:
> 
> > Based on latest mm-unstable (9ed079378408).
> > 
> > This can be seen as a follow-up series to Mike's recent hugetlb vma lock
> > series for pmd unsharing, but majorly covering safe use of huge_pte_offset.
> 
> We're at -rc7 (a -rc8 appears probable this time) and I'm looking to
> settle down and stabilize things...

I targeted this series for the next release not current, because there's no
known report for it per my knowledge.

The reproducer needs explicit kernel delays to trigger as mentioned in the
cover letter.  So far I didn't try to reproduce with a generic kernel yet
but just to verify the existance of the problem.

> 
> > 
> > ...
> >
> > huge_pte_offset() is always called with mmap lock held with either read or
> > write.  It was assumed to be safe but it's actually not.  One race
> > condition can easily trigger by: (1) firstly trigger pmd share on a memory
> > range, (2) do huge_pte_offset() on the range, then at the meantime, (3)
> > another thread unshare the pmd range, and the pgtable page is prone to lost
> > if the other shared process wants to free it completely (by either munmap
> > or exit mm).
> 
> That sounds like a hard-to-hit memory leak, but what we have here is a
> user-triggerable use-after-free and an oops.  Ugh.

IIUC it's not a leak, but it's just that huge_pte_offset() can walk the
(pmd-shared) pgtable page and also trying to take the pgtable lock even
though the page can already be freed in parallel, hence accessing either
the page or the pgtable lock after the pgtable page got freed.

E.g., the 1st warning was trigger by:

static inline struct lock_class *hlock_class(struct held_lock *hlock)
{
	unsigned int class_idx = hlock->class_idx;

	/* Don't re-read hlock->class_idx, can't use READ_ONCE() on bitfield */
	barrier();

	if (!test_bit(class_idx, lock_classes_in_use)) {
		/*
		 * Someone passed in garbage, we give up.
		 */
		DEBUG_LOCKS_WARN_ON(1);  <----------------------------
		return NULL;
	}
        ...
}

I think it's because the spin lock got freed along with the pgtable page,
so when we want to lock the pgtable lock we see weird lock state in
dep_map, as the lock pointer is not valid at all.

-- 
Peter Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ