linux-kernel - RE: [PATCH 1/2] sched/wait: Break up long wake list walk

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <37D7C6CF3E00A74B8858931C1DB2F07753787920@SHSMSX103.ccr.corp.intel.com>
Date:   Fri, 18 Aug 2017 13:06:04 +0000
From:   "Liang, Kan" <kan.liang@...el.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
CC:     Tim Chen <tim.c.chen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...e.hu>, Andi Kleen <ak@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Johannes Weiner" <hannes@...xchg.org>, Jan Kara <jack@...e.cz>,
        linux-mm <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 1/2] sched/wait: Break up long wake list walk



> On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan <kan.liang@...el.com> wrote:
> >
> > Here is the call stack of wait_on_page_bit_common when the queue is
> > long (entries >1000).
> >
> > # Overhead  Trace output
> > # ........  ..................
> > #
> >    100.00%  (ffffffff931aefca)
> >             |
> >             ---wait_on_page_bit
> >                __migration_entry_wait
> >                migration_entry_wait
> >                do_swap_page
> >                __handle_mm_fault
> >                handle_mm_fault
> >                __do_page_fault
> >                do_page_fault
> >                page_fault
> 
> Hmm. Ok, so it does seem to very much be related to migration. Your
> wake_up_page_bit() profile made me suspect that, but this one seems to
> pretty much confirm it.
> 
> So it looks like that wait_on_page_locked() thing in __migration_entry_wait(),
> and what probably happens is that your load ends up triggering a lot of
> migration (or just migration of a very hot page), and then *every* thread
> ends up waiting for whatever page that ended up getting migrated.
> 
> And so the wait queue for that page grows hugely long.
> 
> Looking at the other profile, the thing that is locking the page (that everybody
> then ends up waiting on) would seem to be
> migrate_misplaced_transhuge_page(), so this is _presumably_ due to NUMA
> balancing.
> 
> Does the problem go away if you disable the NUMA balancing code?
> 

Yes, the problem goes away when NUMA balancing is disabled.


Thanks,
Kan