Date:   Mon, 8 Jul 2019 12:35:44 +0200
From:   Max Kellermann <max@...rg.de>
To:     linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Kernel 5.1.15 stuck in compaction

Hi,

One of our web servers repeatedly got stuck in the memory compaction
code; two PHP processes have been spinning at 100% CPU inside memory
compaction after a page fault:

   100.00%     0.00%  php-cgi7.0  [kernel.vmlinux]  [k] page_fault
            |
            ---page_fault
               __do_page_fault
               handle_mm_fault
               __handle_mm_fault
               do_huge_pmd_anonymous_page
               __alloc_pages_nodemask
               __alloc_pages_slowpath
               __alloc_pages_direct_compact
               try_to_compact_pages
               compact_zone_order
               compact_zone
               |          
               |--61.30%--isolate_migratepages_block
               |          |          
               |          |--20.44%--node_page_state
               |          |          
               |          |--5.88%--compact_unlock_should_abort.isra.33
               |          |          
               |           --3.28%--_cond_resched
               |                     |          
               |                      --2.19%--rcu_all_qs
               |          
                --3.37%--pageblock_skip_persistent

ftrace:

           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: _cond_resched <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: rcu_all_qs <-_cond_resched
           <...>-962300 [033] .... 236536.493919: compact_unlock_should_abort.isra.33 <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: pageblock_skip_persistent <-compact_zone
           <...>-962300 [033] .... 236536.493919: isolate_migratepages_block <-compact_zone
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493920: _cond_resched <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493920: rcu_all_qs <-_cond_resched
           <...>-962300 [033] .... 236536.493920: compact_unlock_should_abort.isra.33 <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493920: pageblock_skip_persistent <-compact_zone
           <...>-962300 [033] .... 236536.493920: isolate_migratepages_block <-compact_zone
           <...>-962300 [033] .... 236536.493920: node_page_state <-isolate_migratepages_block
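A rough way to check that the task is looping rather than making progress is to tally the (function, caller) pairs over a longer capture of the function tracer. The sketch below is illustrative only: `tally`, `LINE_RE` and `TRACE_SAMPLE` are hypothetical names, and the regex assumes the default `TASK-PID [CPU] FLAGS TIMESTAMP: FUNC <-CALLER` layout seen in the trace above.

```python
import re
from collections import Counter

# Matches one line of the ftrace "function" tracer, e.g.
#   <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
# Assumed layout: TASK-PID [CPU] FLAGS TIMESTAMP: FUNC <-CALLER
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<func>\S+)\s+<-(?P<caller>\S+)"
)


def tally(lines):
    """Count (function, caller) pairs; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group("func"), m.group("caller"))] += 1
    return counts


# A few lines copied from the trace above, as a smoke test.
TRACE_SAMPLE = """\
           <...>-962300 [033] .... 236536.493919: compact_unlock_should_abort.isra.33 <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: pageblock_skip_persistent <-compact_zone
           <...>-962300 [033] .... 236536.493919: isolate_migratepages_block <-compact_zone
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
           <...>-962300 [033] .... 236536.493919: node_page_state <-isolate_migratepages_block
"""

if __name__ == "__main__":
    # Print the pairs, most frequent first.
    for (func, caller), n in tally(TRACE_SAMPLE.splitlines()).most_common():
        print(f"{n:6d}  {func} <- {caller}")
```

Fed a full capture instead of the sample, a count of `isolate_migratepages_block <- compact_zone` that keeps growing while the timestamps advance would suggest `compact_zone` is rescanning without terminating.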

Nothing useful in /proc/PID/{stack,wchan,syscall}.

The kmalloc-16 and kmalloc-32 slab caches (see /proc/slabinfo) are
going through the roof (~15 GB each); this apparent memory leak, which
keeps triggering the OOM killer, is what drew our attention to this
server.
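To track the growth of those two caches over time, the object memory of a cache can be estimated from /proc/slabinfo as `num_objs * objsize`. The helper below is a hypothetical sketch assuming the version 2.1 column layout; it ignores per-slab overhead, so it gives a lower bound, and the sample numbers are made up rather than taken from the affected server.

```python
# Hypothetical helper: estimates how many bytes the objects of selected slab
# caches occupy, from text in the /proc/slabinfo (version 2.1) format.
def slab_bytes(text, names=("kmalloc-16", "kmalloc-32")):
    """Return {cache name: num_objs * objsize} for the requested caches.

    Per-slab overhead is ignored, so this is a lower bound on the memory
    the cache actually pins.
    """
    result = {}
    for line in text.splitlines():
        # Skip the version banner, the column-header comment and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if fields[0] in names:
            # Columns: name, active_objs, num_objs, objsize, ...
            num_objs, objsize = int(fields[2]), int(fields[3])
            result[fields[0]] = num_objs * objsize
    return result


# Made-up numbers for illustration only (NOT from the affected server).
SLABINFO_SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-32          1024   1024     32  128    1 : tunables    0    0    0 : slabdata      8      8      0
kmalloc-16          2048   2048     16  256    1 : tunables    0    0    0 : slabdata      8      8      0
dentry               420    420    192   21    1 : tunables    0    0    0 : slabdata     20     20      0
"""

if __name__ == "__main__":
    for name, size in sorted(slab_bytes(SLABINFO_SAMPLE).items()):
        print(f"{name}: {size / 2**30:.6f} GiB")
```

On a live system one would pass `open("/proc/slabinfo").read()` instead of the sample; reading that file typically requires root.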

Right now, the server is still stuck, and I can attempt to collect
more information on request.

Max
