Message-ID: <CAM3twVRbhGL8pj0oa9NOu4pO2FWx3tTu928pW0g5CiE-K-meYw@mail.gmail.com>
Date:   Wed, 28 Aug 2019 13:04:00 -0700
From:   Edward Chron <echron@...sta.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Michal Hocko <mhocko@...nel.org>, Qian Cai <cai@....pw>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Roman Gushchin <guro@...com>,
        Johannes Weiner <hannes@...xchg.org>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Ivan Delalande <colona@...sta.com>
Subject: Re: [PATCH 00/10] OOM Debug print selection and additional information

On Wed, Aug 28, 2019 at 3:12 AM Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
>
> On 2019/08/28 16:08, Michal Hocko wrote:
> > On Tue 27-08-19 19:47:22, Edward Chron wrote:
> >> For production systems installing and updating EBPF scripts may someday
> >> be very common, but I wonder how data center managers feel about it now?
> >> Developers are very excited about it and it is a very powerful tool but can I
> >> get permission to add or replace an existing EBPF on production systems?
> >
> > I am not sure I understand. There must be somebody trusted to take care
> > of systems, right?
> >
>
> Speak of my cases, those who take care of their systems are not developers.
> And they afraid changing code that runs in kernel mode. They unlikely give
> permission to install SystemTap/eBPF scripts. As a result, in many cases,
> the root cause cannot be identified.

+1. Exactly. The only thing we could think of, Tetsuo, is that if Linux OOM
reporting uses an eBPF script, then systems have to load it to get any kind of
meaningful report. Frankly, if eBPF is the route to go, then essentially
the whole OOM report should go there. We can adjust it as we need and
have a precedent for wanting to load the script. That's the best we could come
up with.

>
> Moreover, we are talking about OOM situations, where we can't expect userspace
> processes to work properly. We need to dump information we want, without
> counting on userspace processes, before sending SIGKILL.

+1. We've tried, and as you point out, for best results the kernel
has to provide the state.

Again a full system dump would be wonderful, but taking a full dump for
every OOM event on production systems? I am not nearly a good enough salesman
to sell that one. So we need an alternate mechanism.

If we can't agree on some sort of extensible, configurable approach, then put
the standard OOM Report in eBPF and make it mandatory to load it, so we can
justify having to do that. Linux should load it automatically.
We'll just make a few changes and additions as needed.

Sounds like a plan that we could live with.
Would be interested if this works for others as well.
