Message-ID: <CAA1_w3_zGTXQd2dyr1EH51_avHXr5KresgM44B9bASF2CgTAcQ@mail.gmail.com>
Date:   Thu, 11 Apr 2019 09:51:21 +0300
From:   Juha-Matti Tilli <juha-matti.tilli@...eca.com>
To:     David Miller <davem@...emloft.net>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Juha-Matti Tilli <juha-matti.tilli@...eca.com>,
        LKML <linux-kernel@...r.kernel.org>,
        netdev <netdev@...r.kernel.org>,
        Rafael Aquini <aquini@...hat.com>,
        Murphy Zhou <xzhou@...hat.com>,
        Yongcheng Yang <yoyang@...hat.com>,
        Jianhong Yin <jiyin@...hat.com>
Subject: Re: [PATCH] net: add big honking pfmemalloc OOM warning

On Wed, Apr 10, 2019 at 10:11 PM David Miller <davem@...emloft.net> wrote:
> > SNMP counters are per netns, and more useful in the modern computing
> > era,  where a host is shared by many different containers.
>
> +1  There is no way I am applying this patch.
>
> The kernel should not "big honking" anything in the logs.

Just to check, is the opposition to the patch related to the
expectation that it will log the condition too often despite the rate
limit, if many packets are dropped? Because if it is, that might be
possible to fix.

I think it might be possible to check the SNMP counter value, and if
zero, log the first instance of pfmemalloc drop, and then omit logging
afterwards. There could be race conditions, so in the absolute worst
case, you could have let's say 2 or 3 of these log lines instead of 1,
but I don't see that as an issue, because 99% of the time there would
be just one, and 2 or 3 lines won't fill the logs.
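
As a rough illustration of the log-once idea, here is a minimal userspace
sketch in C. It is not kernel code and not the patch under discussion: the
counter variable, the function name and the message text are all
hypothetical stand-ins. It uses a single C11 atomic counter, so exactly one
caller sees the previous value 0 and logs. A plain "read the counter, then
increment it" sequence would not be atomic, which is where the 2 or 3 lines
in the worst case would come from.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the pfmemalloc drop counter (not a kernel symbol). */
static atomic_ulong pfmemalloc_drops;

/* Number of warnings actually emitted, kept only so the behaviour can be checked. */
static atomic_ulong warnings_logged;

/*
 * Record one pfmemalloc drop (hypothetical helper).
 * Returns true if this call emitted the one-time warning.
 */
static bool record_pfmemalloc_drop(void)
{
	/* atomic_fetch_add() returns the value held before the increment. */
	unsigned long prev = atomic_fetch_add(&pfmemalloc_drops, 1);

	/* Only the caller that moved the counter from 0 to 1 logs. */
	if (prev != 0)
		return false;

	atomic_fetch_add(&warnings_logged, 1);
	fprintf(stderr,
		"pfmemalloc drop seen: consider raising vm.min_free_kbytes\n");
	return true;
}
```

Every drop is still counted; only the first one produces a log line, so the
log volume stays bounded no matter how many packets are dropped afterwards.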

In our case, the existence of such a log message and the helpful
suggestion to bump up vm.min_free_kbytes would have saved us
approximately one month of debugging (or 2-3 weeks if the SNMP counter
had been present in this kernel version). Even one such log message
would have been enough. Our production systems were hanging daily
while this debugging was going on.

In my opinion, the ideal count of pfmemalloc drops is exactly 0, and
the interesting event is the first instance of pfmemalloc drop
occurring.

If there's a bug in the kernel, I think the user should be notified,
so I see this as similar to a WARN_ON line -- which is an even more
"big honking" log event because it comes with a backtrace.

BR, Juha-Matti
