Date:   Thu, 17 Aug 2017 13:44:40 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     "Liang, Kan" <kan.liang@...el.com>, Mel Gorman <mgorman@...e.de>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Cc:     Tim Chen <tim.c.chen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...e.hu>, Andi Kleen <ak@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>, Jan Kara <jack@...e.cz>,
        linux-mm <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk

On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan <kan.liang@...el.com> wrote:
>
> Here is the call stack of wait_on_page_bit_common
> when the queue is long (entries >1000).
>
> # Overhead  Trace output
> # ........  ..................
> #
>    100.00%  (ffffffff931aefca)
>             |
>             ---wait_on_page_bit
>                __migration_entry_wait
>                migration_entry_wait
>                do_swap_page
>                __handle_mm_fault
>                handle_mm_fault
>                __do_page_fault
>                do_page_fault
>                page_fault

Hmm. Ok, so it does seem to very much be related to migration. Your
wake_up_page_bit() profile made me suspect that, but this one seems to
pretty much confirm it.

So it looks like it's that wait_on_page_locked() call in
__migration_entry_wait(). What probably happens is that your load
ends up triggering a lot of migration (or just migration of a very hot
page), and then *every* thread ends up waiting for whatever page
got migrated.

And so the wait queue for that page grows hugely long.

Looking at the other profile, the thing that is locking the page (that
everybody then ends up waiting on) would seem to be
migrate_misplaced_transhuge_page(), so this is _presumably_ due to
NUMA balancing.

Does the problem go away if you disable the NUMA balancing code?

Adding Mel and Kirill to the participants, just to make them aware of
the issue, and just because their names show up when I look at blame.

              Linus
