linux-kernel - Re: WARNING in try

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACT4Y+Ystnwv4M6Uh+HBKbdADAnJ6otfR0GoA20crzqV+b2onQ@mail.gmail.com>
Date:   Mon, 6 Aug 2018 11:30:37 +0200
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     syzbot <syzbot+bab151e82a4e973fa325@...kaller.appspotmail.com>,
        cgroups@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>
Subject: Re: WARNING in try_charge

On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko <mhocko@...nel.org> wrote:
> On Sat 04-08-18 06:33:02, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    d1e0b8e0cb7a Add linux-next specific files for 20180725
>> git tree:       linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c770400000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325
>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+bab151e82a4e973fa325@...kaller.appspotmail.com
>>
>> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB,
>> file-rss:32768kB, shmem-rss:0kB
>> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB,
>> file-rss:32000kB, shmem-rss:0kB
>
> More interesting stuff is higher in the kernel log
> : [  366.435015] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
> : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>
> Are you sure you want to have hard limit set to 0?

syzkaller really does not mind to have it.

> : [  366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
>
> There are only 3 pages charged to this memcg!
>
> : [  366.487490] Tasks state (memory values in pages):
> : [  366.492349] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> : [  366.501237] [  23766]     0 23766    17620     8221   126976        0             0 syz-executor3
> : [  366.510367] [  23767]     0 23767    17618     8218   126976        0             0 syz-executor2
> : [  366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) score 8252000 or sacrifice child
> : [  366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
> : [  366.540456] oom_reaper: reaped process 23766 (syz-executor3), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
>
> The oom reaper cannot reclaim file backed memory  from a large part. I
> assume this is are shared mappings which are living outside of memcg
> because of the counter.
>
> : [...]
> : [  367.085870] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0
> : [  367.100073] memory: usage 112kB, limit 0kB, failcnt 1615
> : [  367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
> : [  367.138136] Tasks state (memory values in pages):
> : [  367.142986] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
> : [  367.151889] [  23766]     0 23766    17620     8002   126976        0             0 syz-executor3
> : [  367.160946] [  23767]     0 23767    17618     8218   126976        0             0 syz-executor2
> : [  367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) score 8249000 or sacrifice child
> : [  367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
> : [  367.192101] oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
> : [  367.202986] ------------[ cut here ]------------
> : [  367.207845] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug.
> : [  367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680
> : [  367.227540] Kernel panic - not syncing: panic_on_warn set ...
>
> This is unexpected though. We have killed a task (23767) which is trying
> to charge the memory which means it should
> trigger the charge retry and that one should force the charge
>
>         /*
>          * Unlike in global OOM situations, memcg is not in a physical
>          * memory shortage.  Allow dying and OOM-killed tasks to
>          * bypass the last charges so that they can exit quickly and
>          * free their memory.
>          */
>         if (unlikely(tsk_is_oom_victim(current) ||
>                      fatal_signal_pending(current) ||
>                      current->flags & PF_EXITING))
>                 goto force;
>
> There doesn't seem to be any other sign of OOM killer invocation which
> could then indeed lead to the warning as there is no other task to kill
> (both syz-executor[23] have been killed and oom_reaped already). So I
> would be curious what happened between 367.180119 which was the last
> successful oom invocation and 367.207845. An additional printk in
> mem_cgroup_out_of_memory might tell us more.
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..852cd3dbdcd9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>         bool ret;
>
>         mutex_lock(&oom_lock);
> +       pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
> +                       current->comm, current->pid, tsk_is_oom_victim(current));
>         ret = out_of_memory(&oc);
>         mutex_unlock(&oom_lock);
>         return ret;
>
> Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
> and basically no memory charged by existing tasks is not going to fly
> and the warning is exactly to call that out.


Please-please-please do not mix kernel bugs and notices to user into
the same bucket:

https://lore.kernel.org/patchwork/patch/949071/