lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 12 Dec 2013 20:00:54 +0100
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Dave Jones <davej@...hat.com>, Oleg Nesterov <oleg@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Darren Hart <dvhart@...ux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Mel Gorman <mgorman@...e.de>
Subject: Re: process 'stuck' at exit.

Hello,

On Tue, Dec 10, 2013 at 01:57:49PM -0800, Linus Torvalds wrote:
> On Tue, Dec 10, 2013 at 1:41 PM, Dave Jones <davej@...hat.com> wrote:
> >
> > http://codemonkey.org.uk/junk/trace
> 
> Hmm. Ok, so something is calling [__]get_user_pages_fast() and
> put_page() in a loop, but the trace doesn't show what that "something"
> is, because it is itself not ever called.
> 
> However, that pattern does seem to imply that the loop is in
> get_futex_key(), because all the other loops I see seem to be calling
> other things as well.
> 
> And the __get_user_pages_fast() call implies that it's the THP case
> that triggers the "unlikely(PageTail(page))" case. And anyway,
> otherwise we'd see lock_page()/unlock_page() too.
> 
> So it looks like __get_user_pages_fast() fails, and keeps failing.
> Andrea, this is your code, any ideas? Commit a5b338f2b0b1f ("thp:
> update futex compound knowledge") to be exact.

I can only agree that the __get_user_pages_fast "write" parameter
should have been !ro instead of 1.

However it wasn't me introducing the bug, my code when patched in
early 2011 would work fine. The bug was introduced half a year later
in commit 9ea71503a8ed9184d2d0b8ccc4d269d05f7940ae .

@@ -262,8 +264,18 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key)
 
 again:
        err = get_user_pages_fast(address, 1, 1, &page);
+       /*
+        * If write access is not required (eg. FUTEX_WAIT), try
+        * and get read-only access.
+        */
+       if (err == -EFAULT && rw == VERIFY_READ) {
+               err = get_user_pages_fast(address, 1, 0, &page);
+               ro = 1;
+       }
        if (err < 0)
                return err;
+       else
+               err = 0;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
        page_head = page;
@@ -305,6 +317,13 @@ again:
        if (!page_head->mapping) {
                unlock_page(page_head);
                put_page(page_head);
+               /*
+               * ZERO_PAGE pages don't have a mapping. Avoid a busy
        loop
+               * trying to find one. RW mapping would have COW'd (and
        thus
+               * have a mapping) so this page is RO and won't ever
        change.
+               */
+               if ((page_head == ZERO_PAGE(address)))
+                       return -EFAULT;
                goto again;
        }

[..]

This commit didn't update the __get_user_pages_fast to use !ro instead
of 1, as it would have been required after the above change.

I'll now review Oleg proposed cleanup.

Thanks for tracking this down!

Andrea
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ