linux-kernel - Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

Message-ID: <08f3d2e7-4410-8cb2-351f-99a6d28836cc@suse.cz>
Date:   Mon, 2 Jan 2017 08:39:23 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Michal Hocko <mhocko@...nel.org>,
        Michal Nazarewicz <mina86@...a86.com>
Cc:     Eric Anholt <eric@...olt.net>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        linux-stable <stable@...r.kernel.org>,
        "Robin H. Johnson" <robbat2@...is-terrarum.net>,
        Marek Szyprowski <m.szyprowski@...sung.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

On 12/30/2016 11:52 AM, Michal Hocko wrote:
> On Thu 29-12-16 23:22:20, Michal Nazarewicz wrote:
>> On Thu, Dec 29 2016, Eric Anholt wrote:
>>> Michal Hocko <mhocko@...nel.org> writes:
>>>
>>>> This has been already brought up
>>>> http://lkml.kernel.org/r/20161130092239.GD18437@dhcp22.suse.cz and there
>>>> was a proposed patch for that which ratelimited the output
>>>> http://lkml.kernel.org/r/20161130132848.GG18432@dhcp22.suse.cz resp.
>>>> http://lkml.kernel.org/r/robbat2-20161130T195244-998539995Z@orbis-terrarum.net
>>>>
>>>> then the email thread just died out because the issue turned out to be a
>>>> configuration issue. Michal indicated that the message might be useful
>>>> so dropping it completely seems like a bad idea. I do agree that
>>>> something has to be done about that though. Can we reconsider the
>>>> ratelimit thing?

Agree about ratelimiting.
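
To make the ratelimiting idea concrete, here is a minimal userspace model
in Python of interval/burst limiting, in the spirit of what the kernel's
printk_ratelimited() family provides. It is only a sketch: the RateLimit
class, the report_busy() helper, the message text and the 5 second / 10
message parameters are made up for illustration and are not the kernel
implementation.

```python
import time


class RateLimit:
    """Let through at most `burst` events per `interval` seconds."""

    def __init__(self, interval=5.0, burst=10, clock=time.monotonic):
        self.interval = interval
        self.burst = burst
        self.clock = clock          # injectable so the behaviour can be tested
        self.window_start = None
        self.printed = 0
        self.missed = 0             # events suppressed in the current window

    def allow(self):
        now = self.clock()
        if self.window_start is None or now - self.window_start >= self.interval:
            # A new window begins: forget the old counters.
            self.window_start = now
            self.printed = 0
            self.missed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.missed += 1
        return False


def report_busy(limiter, start_pfn, end_pfn, log):
    """Append a 'PFNs busy' style line to `log` unless the limiter says no."""
    if limiter.allow():
        log.append("[%x, %x) PFNs busy" % (start_pfn, end_pfn))
```

The point of keeping a `missed` counter is that a sudden jump in
suppressed messages would still be visible, so a regression is not hidden
while the log stays readable.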

>>> I agree that the rate of the message has gone up during 4.9 -- it used
>>> to be a few per second.
>>
>> Sounds like a regression which should be fixed.
>>
>> This is why I don’t think removing the message is a good idea.  If you
>> suddenly see a lot of those messages, something changed for the worse.
>> If you remove this message, you will never know.
> 
> I agree, that removing the message completely is not going to help to
> find out regressions. Swamping logs with zillions of messages is,
> however, not acceptable. It just causes even more problems. See the
> previous report.
> 
>>> However, if this is an expected path during normal operation,
>>
>> This depends on your definition of ‘expected’ and ‘normal’.
>>
>> In general, I would argue that the fact those ever happen is a bug
>> somewhere in the kernel – if memory is allocated as movable, it should
>> be movable damn it!
> 
> Yes, it should be movable but there is no guarantee it is movable
> immediately. Those pages might be pinned for some time. This is
> unavoidable AFAICS.

There was a VM_PINNED patchset some years ago from PeterZ where
long-term pins would use wrappers over get_page() that would e.g.
migrate the page out of CMA blocks or movable zones. That's a possible
solution, but it would always be a bit of a whack-a-mole with code that
holds pins longer than expected without using the VM_PINNED API.

> So while this might be a regression which should be investigated there
> should be another fix to prevent from swamping the logs as well.

Yeah, the logs indicated rather static PFNs being logged, so either
really long-term pins or maybe an outright wrong migratetype used by the
allocation, possibly as a regression. page_owner functionality would make
it possible to confirm the wrong migratetype and dump the allocating
stacktrace. Perhaps we can enhance the printk's here to do exactly that
automatically if page_owner is enabled, which would make it easier for
bug reporters.
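
As a rough illustration of the page_owner check described above, the
Python sketch below parses page_owner style text and flags pages whose
allocation migratetype is not Movable although they sit in a CMA or
Movable pageblock. The record layout shown in the comment is an
assumption (it differs between kernel versions), and parse_page_owner()
and wrong_migratetype() are hypothetical helper names.

```python
import re

# Assumed record layout, one record per blank-line-separated chunk:
#   Page allocated via order 0, mask 0x...(...)
#   PFN 4097 type Unmovable Block 8 type CMA Flags 0x...(...)
#    <stack trace lines>
PFN_LINE = re.compile(r"^PFN (\d+) type (\S+) Block (\d+) type (\S+)")


def parse_page_owner(text):
    """Return one dict per record: pfn, page_type, block_type, stack."""
    records = []
    for chunk in text.split("\n\n"):
        lines = [line for line in chunk.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            m = PFN_LINE.match(line)
            if m:
                records.append({
                    "pfn": int(m.group(1)),
                    "page_type": m.group(2),
                    "block_type": m.group(4),
                    # Everything after the PFN line is taken as the stack.
                    "stack": [s.strip() for s in lines[i + 1:]],
                })
                break
    return records


def wrong_migratetype(records, movable_blocks=("CMA", "Movable")):
    """Pages in movable-only blocks that were not allocated as Movable."""
    return [r for r in records
            if r["block_type"] in movable_blocks
            and r["page_type"] != "Movable"]
```

Each returned record carries the allocating stack trace, which is the
part a bug reporter would want to include.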

If it's pinning, then it's trickier. Joonsoo added relevant tracepoints
recently, but it's easy to flood the system with tracing output,
especially when one would want backtraces of the pins.

It should also be possible to check for such problematic pages
periodically (outside of CMA attempts) via some script that would
combine kpagecount and page_owner output.
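
A minimal sketch of such a script in Python, under several assumptions:
/proc/kpagecount is read as one native-endian 64-bit count per PFN
(the kernel's pagemap documentation describes it as the number of times
each page is mapped), the `owners` mapping of PFN to a page_owner
description has already been built elsewhere, and the `threshold` that
makes a count "suspicious" is left to the caller. The read_counts() and
suspicious() names are hypothetical.

```python
import struct

ENTRY = struct.Struct("=Q")  # one native-endian unsigned 64-bit count per PFN


def read_counts(path, pfns):
    """Read the count for each PFN from a kpagecount-style file."""
    counts = {}
    with open(path, "rb") as f:
        for pfn in pfns:
            f.seek(pfn * ENTRY.size)      # the file is indexed by PFN
            data = f.read(ENTRY.size)
            if len(data) == ENTRY.size:   # skip PFNs past the end of the file
                counts[pfn] = ENTRY.unpack(data)[0]
    return counts


def suspicious(counts, owners, threshold):
    """Pair each PFN whose count reaches `threshold` with its owner info."""
    return [(pfn, count, owners.get(pfn, "<no page_owner record>"))
            for pfn, count in sorted(counts.items())
            if count >= threshold]
```

On a real system this would be run as root with path="/proc/kpagecount"
and the PFNs of the CMA area of interest, periodically, and the results
compared between runs to spot pages that never go away.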