linux-kernel - Re: INFO: rcu detected stall in shmem

Date:   Tue, 9 Oct 2018 21:11:48 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
cc:     syzbot <syzbot+77e6b28a7a7106ad0def@...kaller.appspotmail.com>,
        hannes@...xchg.org, mhocko@...nel.org, akpm@...ux-foundation.org,
        guro@...com, kirill.shutemov@...ux.intel.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        syzkaller-bugs@...glegroups.com, yang.s@...baba-inc.com
Subject: Re: INFO: rcu detected stall in shmem_fault

On Wed, 10 Oct 2018, Tetsuo Handa wrote:

> syzbot is hitting RCU stall due to memcg-OOM event.
> https://syzkaller.appspot.com/bug?id=4ae3fff7fcf4c33a47c1192d2d62d2e03efffa64
> 
> What should we do if memcg-OOM found no killable task because the allocating task
> had oom_score_adj == -1000? Flooding printk() until the RCU stall watchdog fires
> (which seems to be caused by commit 3100dab2aa09dc6e ("mm: memcontrol: print proper
> OOM header when no eligible victim left") because syzbot was terminating the test
> upon WARN(1) removed by that commit) is not a good behavior.
> 

Not printing anything would be the obvious solution, but the ideal solution
would probably involve:

 - adding feedback to the memcg oom killer that there are no killable 
   processes,

 - adding complete coverage for memcg_oom_recover() in all uncharge paths
   where the oom memcg's page_counter is decremented, and

 - having all processes stall until memcg_oom_recover() is called so 
   looping back into try_charge() has a reasonable expectation to succeed.
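
To make the three points above concrete, here is a rough userspace sketch of
the control flow I have in mind. It is not kernel code: struct toy_memcg and
every toy_* function are hypothetical names invented for illustration, the
"stall" is modelled as a return value rather than a real wait queue, and the
one-report-per-OOM-episode counter is only my assumption about how the printk()
flood could be bounded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum toy_charge_result {
	TOY_CHARGE_OK,               /* charge fit under the limit */
	TOY_CHARGE_RETRY,            /* a victim was killed, try again now */
	TOY_CHARGE_WAIT_FOR_RECOVER, /* nothing killable, wait for an uncharge */
};

struct toy_memcg {
	unsigned long usage;          /* pages currently charged */
	unsigned long limit;          /* hard limit in pages */
	unsigned long killable;       /* tasks the OOM killer may select */
	unsigned long victim_pages;   /* pages each victim is assumed to hold */
	bool under_oom;               /* set while chargers are stalled */
	unsigned long oom_reports;    /* how many times OOM was reported */
	unsigned long recover_events; /* how many times stalled chargers were woken */
};

/* Second point: every path that decrements the counter also signals recovery. */
static void toy_uncharge(struct toy_memcg *memcg, unsigned long nr_pages)
{
	assert(memcg->usage >= nr_pages);
	memcg->usage -= nr_pages;
	if (memcg->under_oom) {
		memcg->under_oom = false;
		memcg->recover_events++;
	}
}

/* First point: the OOM killer tells its caller whether anything was killable. */
static bool toy_oom_kill(struct toy_memcg *memcg)
{
	unsigned long freed;

	if (memcg->killable == 0)
		return false;
	memcg->killable--;
	freed = memcg->victim_pages < memcg->usage ?
		memcg->victim_pages : memcg->usage;
	toy_uncharge(memcg, freed);
	return true;
}

/*
 * Third point: with no killable task the charger stalls instead of looping,
 * and the condition is reported once per OOM episode rather than per attempt.
 */
static enum toy_charge_result toy_try_charge(struct toy_memcg *memcg,
					     unsigned long nr_pages)
{
	if (memcg->usage + nr_pages <= memcg->limit) {
		memcg->usage += nr_pages;
		return TOY_CHARGE_OK;
	}
	if (toy_oom_kill(memcg))
		return TOY_CHARGE_RETRY;
	if (!memcg->under_oom) {
		memcg->under_oom = true;
		memcg->oom_reports++;
	}
	return TOY_CHARGE_WAIT_FOR_RECOVER;
}
```

In this model a caller that gets TOY_CHARGE_WAIT_FOR_RECOVER would only retry
after recover_events has changed, so the retry has a reasonable chance of
succeeding instead of spinning and printing on every pass.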