linux-kernel - Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

Message-ID: <29c02986-f065-d3be-f176-0c190a72bc58@I-love.SAKURA.ne.jp>
Date:   Tue, 2 May 2017 19:44:47 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     Marc MERLIN <marc@...lins.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Michal Hocko <mhocko@...nel.org>, Vlastimil Babka <vbabka@...e.cz>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Tejun Heo <tj@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of
 RAM that should be free

On 2017/05/02 13:12, Marc MERLIN wrote:
> Well, sadly, the problem is more or less back in 4.11.0. The system doesn't really
> crash but it goes into an infinite loop with
> [34776.826800] BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 stuck for 33s!

Wow, two of the workqueues have reached their max_active limit.

[34777.202267] workqueue btrfs-endio-write: flags=0xe
[34777.218313]   pwq 16: cpus=0-7 flags=0x4 nice=0 active=8/8
[34777.236548]     in-flight: 15168:btrfs_endio_write_helper, 13855:btrfs_endio_write_helper, 3360:btrfs_endio_write_helper, 14241:btrfs_endio_write_helper, 27092:btrfs_endio_write_helper, 15194:btrfs_endio_write_helper, 15169:btrfs_endio_write_helper, 27093:btrfs_endio_write_helper
[34777.316225]     delayed: btrfs_endio_write_helper, btrfs_endio_write_helper, btrfs_endio_write_helper, btrfs_endio_write_helper, btrfs_endio_write_helper, btrfs_endio_write_helper

[34777.450684] workqueue bcache: flags=0x8
[34779.956462]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=256/256
[34779.978283]     in-flight: 15320:cached_dev_read_done [bcache], 23385:cached_dev_read_done [bcache], 23371:cached_dev_read_done [bcache], 15321:cached_dev_read_done [bcache], 15395:cached_dev_read_done [bcache], 11101:cached_dev_read_done [bcache], 15300:cached_dev_read_done [bcache], 23349:cached_dev_read_done [bcache], 23425:cached_dev_read_done [bcache], 23399:cached_dev_read_done [bcache], 15293:cached_dev_read_done [bcache], 20529:cached_dev_read_done [bcache], 15402:cached_dev_read_done [bcache], 23422:cached_dev_read_done [bcache], 23417:cached_dev_read_done [bcache], 23409:cached_dev_read_done [bcache], 20539:cached_dev_read_done [bcache], 23431:cached_dev_read_done [bcache], 20544:cached_dev_read_done [bcache], 15355:cached_dev_read_done [bcache], 11085:cached_dev_read_done [bcache], 6511:cached_dev_read_done [bcache]   
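
To pick out saturated workqueues from a longer dump without reading it by eye,
something like the following might help. It is only a rough sketch: it assumes
the dump has the same "workqueue NAME: flags=" and "pwq N: ... active=A/B"
line shapes as the excerpt above, and the function name saturated_pwqs is
made up for illustration.

```python
import re

# Line shapes assumed from the excerpt above, e.g.
#   "workqueue btrfs-endio-write: flags=0xe"
#   "  pwq 16: cpus=0-7 flags=0x4 nice=0 active=8/8"
WQ_RE = re.compile(r"workqueue (\S+): flags=")
PWQ_RE = re.compile(r"pwq (\d+): .*\bactive=(\d+)/(\d+)")


def saturated_pwqs(lines):
    """Return (workqueue, pwq, active, limit) for every pwq at its limit."""
    current = None  # name of the workqueue whose pwq lines we are reading
    hits = []
    for line in lines:
        m = WQ_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = PWQ_RE.search(line)
        if m:
            pwq, active, limit = (int(g) for g in m.groups())
            if active >= limit:
                hits.append((current, pwq, active, limit))
    return hits
```

Feeding it a saved log, e.g. saturated_pwqs(open("dmesg.txt")) (the file name
is hypothetical), returns one tuple per pwq whose active count has reached the
limit shown after the slash.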

Googling for btrfs_endio_write_helper turns up a stuck report against 4.8-rc5,
but it seems to have received no response
( https://www.spinics.net/lists/linux-btrfs/msg58633.html ).

> Any idea what I should do next?

Maybe you can try collecting a list of all in-flight allocations with backtraces
using the kmallocwd patches at
http://lkml.kernel.org/r/1489578541-81526-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
and http://lkml.kernel.org/r/201704272019.JEH26057.SHFOtMLJOOVFQF@I-love.SAKURA.ne.jp
which also tracks mempool allocations.
(Well, the

-	cond_resched();
+	//cond_resched();

change in the latter patch is not really desirable.)