Date:   Thu, 22 Dec 2016 19:34:39 -0800 (PST)
From:   Hugh Dickins <hughd@...gle.com>
To:     Dashi DS1 Cao <caods1@...ovo.com>
cc:     Hugh Dickins <hughd@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Michal Hocko <mhocko@...nel.org>,
        Sasha Levin <alexander.levin@...izon.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: A small window for a race condition in
 mm/rmap.c:page_lock_anon_vma_read

On Fri, 23 Dec 2016, Dashi DS1 Cao wrote:

> The kernel version is "RELEASE: 3.10.0-327.36.3.el7.x86_64". It was the latest kernel release of CentOS 7.2 at that time, or maybe still now.

Okay, thanks: so, basically a v3.10 kernel, with lots of added patches,
but also lacking many more recent fixes.

> I've tried to print the value of anon_vma from other three dumps, but the content is not available in the dump. 
> "gdb: page excluded: kernel virtual address: ffff882b47ddadc0"
> I guess it is not copied out because it has already been put into some unused list.

Useful info: that suggests that the anon_vma was rightly freed, and that
it's the page->_mapcount that's wrong.  The page isn't really mapped
anywhere now, but appearing to be still page_mapped(), it has tricked
page_lock_anon_vma_read() into thinking the stale anon_vma pointer is
safe to access.
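
To make that failure mode concrete, here is a small userspace sketch. It is
not kernel code: every toy_* name is hypothetical, and the only detail taken
from the kernel is the convention that a mapcount of -1 means "not mapped",
so "mapped" is tested as mapcount >= 0. The freed anon_vma is modelled by a
flag instead of a real free, so the example stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the argument. */
struct toy_anon_vma {
    bool freed;                 /* models "already freed" without a real free */
};

struct toy_page {
    int mapcount;               /* -1 means not mapped, as with page->_mapcount */
    struct toy_anon_vma *anon_vma;
};

static bool toy_page_mapped(const struct toy_page *p)
{
    return p->mapcount >= 0;
}

/*
 * Loosely modelled on the guard described above: the pointer is handed
 * out only while the page still looks mapped.
 */
static struct toy_anon_vma *toy_lock_anon_vma(struct toy_page *p)
{
    assert(p != NULL);
    if (!toy_page_mapped(p))
        return NULL;            /* correct count: stale pointer never used */
    return p->anon_vma;         /* wrong count: stale pointer escapes */
}

/* True if the lookup hands back an anon_vma that was already freed. */
static bool toy_stale_lookup_hits_freed(int leftover_mapcount)
{
    struct toy_anon_vma av = { .freed = true };
    struct toy_page page = { .mapcount = leftover_mapcount, .anon_vma = &av };
    struct toy_anon_vma *got = toy_lock_anon_vma(&page);

    return got != NULL && got->freed;
}
```

With a leftover count of 0 the guard passes and the freed object is reached;
with the correct -1 the lookup returns NULL and nothing is touched.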

That can happen if there's a race, and a page gets mapped with one pte
on top of another: only one of them will be unmapped later.  That would
mean page table entries being handled incorrectly somewhere.  But I
cannot remember anywhere that
was shown to happen - beyond a project of my own, which never reached
the tree.
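
The counting argument can be sketched in userspace as follows. This is a toy
model under stated assumptions, not the kernel's rmap code: the toy_* names
are hypothetical, and the only kernel convention borrowed is that the count
starts at -1 and the page counts as mapped while it is >= 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one field that matters here. */
struct toy_page {
    int mapcount;               /* -1 means not mapped */
};

static void toy_map(struct toy_page *p)   { p->mapcount++; }
static void toy_unmap(struct toy_page *p) { p->mapcount--; }

static bool toy_page_mapped(const struct toy_page *p)
{
    return p->mapcount >= 0;
}

/*
 * Account for `maps` pte installs and `unmaps` pte removals, then report
 * whether the page still looks mapped.  A pte written on top of another
 * is counted twice on the way in, but only one pte remains to be found
 * and counted on the way out.
 */
static bool toy_still_mapped_after(int maps, int unmaps)
{
    struct toy_page page = { .mapcount = -1 };
    int i;

    assert(maps >= unmaps && unmaps >= 0);
    for (i = 0; i < maps; i++)
        toy_map(&page);
    for (i = 0; i < unmaps; i++)
        toy_unmap(&page);
    return toy_page_mapped(&page);
}
```

Calling toy_still_mapped_after(2, 1) models the double mapping followed by
the single unmap, and the page is left looking mapped; the balanced case
toy_still_mapped_after(1, 1) ends up unmapped as expected.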

If it's a file page, that usually ends up as BUG_ON(page_mapped(page))
in __delete_from_page_cache() (in v3.10, changed a little later on),
when truncating or unlinking the file or unmounting the filesystem.
Those have been seen in the past, on rare occasions, but I don't
remember actually root-causing any of them.  If it's an anon page,
there is no equivalent place for such a BUG_ON.

mremap move has a tricky job to do, and might cause such a problem
if its locking were inadequate: but the only example I see since
v3.10 was dd18dbc2d42a "mm, thp: close race between mremap() and
split_huge_page()", and that used to crash in __split_huge_page().

Or see c0d73261f5c1 "mm/memory.c: use entry = ACCESS_ONCE(*pte)
in handle_pte_fault()", which brings us back to Peter's topic of
over-imaginative compilers; but none of us believed that change
really made a difference in practice.
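
For readers unfamiliar with the idiom, a userspace sketch of the pattern that
commit title refers to: load the entry once through a volatile-qualified
access, then make every decision on the local copy, so the compiler may not
quietly re-read memory that another thread could be changing. The macro
below follows the usual volatile-cast form and relies on the GCC/Clang
__typeof__ extension; toy_pte_t, the "present" bit in bit 0, and the function
names are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Force exactly one load of x through a volatile-qualified lvalue. */
#define TOY_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

typedef unsigned long toy_pte_t;    /* hypothetical pte representation */

/* Assumption of this toy: bit 0 marks the entry as present. */
static int toy_pte_present(toy_pte_t pte)
{
    return (pte & 1UL) != 0;
}

/*
 * Snapshot the entry once, then test and return only the snapshot, so
 * the value tested and the value acted upon cannot differ.
 */
static toy_pte_t toy_handle_pte(toy_pte_t *ptep, int *was_present)
{
    toy_pte_t entry;

    assert(ptep != NULL && was_present != NULL);
    entry = TOY_ACCESS_ONCE(*ptep);
    *was_present = toy_pte_present(entry);
    return entry;
}
```

This only illustrates the single-load idiom; it says nothing about whether
the real change made a difference in practice.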

Cc'ing Sasha Levin, long-time trinity-runner, just in case he might
remember any time when a BUG_ON(page_mapped(page)) was really solved:
if so, there's a chance the explanation might also apply to anonymous
pages, and be responsible for your page_lock_anon_vma_read() crashes.

Hugh
