Date:   Thu, 29 Aug 2019 08:48:59 -0700
From:   Edward Chron <echron@...sta.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>,
        Johannes Weiner <hannes@...xchg.org>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Ivan Delalande <colona@...sta.com>
Subject: Re: [PATCH 00/10] OOM Debug print selection and additional information

On Thu, Aug 29, 2019 at 7:09 AM Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
>
> On 2019/08/29 20:56, Michal Hocko wrote:
> >> But please be aware that, I REPEAT AGAIN, I don't think neither eBPF nor
> >> SystemTap will be suitable for dumping OOM information. OOM situation means
> >> that even single page fault event cannot complete, and temporary memory
> >> allocation for reading from kernel or writing to files cannot complete.
> >
> > And I repeat that no such reporting is going to write to files. This is
> > an OOM path afterall.
>
> The process that fetches from e.g. an eBPF event cannot involve a page fault.
> The front-end for iovisor/bcc is a Python userspace process, but I think
> that such a process can't run under an OOM situation.
>
> >
> >> Therefore, we will need to hold all information in kernel memory (without
> >> allocating any memory when OOM event happened). Dynamic hooks could hold
> >> a few lines of output, but not all lines we want. The only possible buffer
> >> which is preallocated and large enough would be printk()'s buffer. Thus,
> >> I believe that we will have to use printk() in order to dump OOM information.
> >> At that point,
> >
> > Yes, this is what I've had in mind.
>
> Probably I took an incorrect shortcut.
>
> Dynamic hooks could hold a few lines of output, but dynamic hooks can not hold
> all lines when dump_tasks() reports 32000+ processes. We have to buffer all output
> in kernel memory because we can't complete even a page fault event triggered by
> the python process monitoring eBPF event (and writing the result to some log file
> or something) while out_of_memory() is in flight.
>
> And "set /proc/sys/vm/oom_dump_tasks to 0" is not the right reaction. What I'm
> saying is "we won't be able to hold output from dump_tasks() if output from
> dump_tasks() goes to buffer preallocated for dynamic hooks". We have to find
> a way that can handle the worst case.

With the patch series we sent, adding the vmalloc entries print required a
small piece of code in vmalloc.c, but we thought it should be a core OOM
reporting function. However, you want to limit which vmalloc entries you
print, probably to only the very large memory users. For us this generates
just a few entries and has proven useful.

Changes to limit how many processes get printed, so you don't have the
all-or-nothing behavior, would be nice to have. It would be easiest if there
were a standard mechanism to specify which entries to print, probably by a
minimum size, which is what we did. We used debugfs to set the controls, but
sysctl or some other mechanism could be used.

The rest of what we did might be implemented with hooks, as those parts only
output a line or two, and I've already removed the information we had that
was redundant.
