lists.openwall.net
Open Source and information security mailing list archives
 
Date:   Fri, 30 Dec 2016 08:11:12 +0100
From:   Michal Nazarewicz <mina86@...a86.com>
To:     Eric Anholt <eric@...olt.net>, Michal Hocko <mhocko@...nel.org>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-stable <stable@...r.kernel.org>,
        "Robin H. Johnson" <robbat2@...is-terrarum.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        Marek Szyprowski <m.szyprowski@...sung.com>
Subject: Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

On Thu, Dec 29 2016, Eric Anholt wrote:
> Michal Nazarewicz <mina86@...a86.com> writes:
>
>> On Thu, Dec 29 2016, Eric Anholt wrote:
>>> Michal Hocko <mhocko@...nel.org> writes:
>>>
>>>> This has been already brought up
>>>> http://lkml.kernel.org/r/20161130092239.GD18437@dhcp22.suse.cz and there
>>>> was a proposed patch for that which ratelimited the output
>>>> http://lkml.kernel.org/r/20161130132848.GG18432@dhcp22.suse.cz resp.
>>>> http://lkml.kernel.org/r/robbat2-20161130T195244-998539995Z@orbis-terrarum.net
>>>>
>>>> then the email thread just died out because the issue turned out to be a
>>>> configuration issue. Michal indicated that the message might be useful
>>>> so dropping it completely seems like a bad idea. I do agree that
>>>> something has to be done about that though. Can we reconsider the
>>>> ratelimit thing?
>>>
>>> I agree that the rate of the message has gone up during 4.9 -- it used
>>> to be a few per second.
>>
>> Sounds like a regression which should be fixed.
>>
>> This is why I don’t think removing the message is a good idea.  If you
>> suddenly see a lot of those messages, something changed for the worse.
>> If you remove this message, you will never know.
>>
>>> However, if this is an expected path during normal operation,
>>
>> This depends on your definition of ‘expected’ and ‘normal’.
>>
>> In general, I would argue that the fact that those ever happen is a bug
>> somewhere in the kernel – if memory is allocated as movable, it should
>> be movable damn it!
>
> I was taking "expected" from dae803e165a11bc88ca8dbc07a11077caf97bbcb --
> if this is actually a bug, how do we go about debugging it?

That’s why I’ve pointed out that this depends on the definition.  In my
opinion it’s a design bug which is now nearly impossible to fix in an
efficient way.

The most likely issue is that some subsystem allocates movable memory
but then either does not provide a way to actually move it (that’s an
obvious bug in the code IMO) or pins the memory while some transaction
is in progress, just as CMA tries to move it.

The latter case is really unavoidable at this point, which is why this
message is ‘expected’.

But if the rate of the messages suddenly increases dramatically, you
have yourself a performance regression.
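The two failure modes described above can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not the kernel's page migration code: `toy_page`, `can_migrate`, `migrate_range` and `alloc_range` are invented names. A contiguous allocation succeeds only if every page in the range can be moved right now. A page with no migration path (the first case) or one that is temporarily pinned (the second case) makes the whole range busy, which is roughly the situation in which the kernel reports "PFNs busy".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a page in a CMA area. */
struct toy_page {
    bool can_migrate; /* allocated movable AND has a way to be moved */
    int pin_count;    /* > 0 while a subsystem holds it mid-transaction */
};

/* Try to move every page in [start, end). Returns the index of the first
 * page that cannot be moved, or `end` if the whole range can be moved. */
static size_t migrate_range(const struct toy_page *pages, size_t start,
                            size_t end)
{
    assert(start <= end);
    for (size_t i = start; i < end; i++) {
        /* Case 1: "movable" but no migration path (a bug).
         * Case 2: movable but pinned right now (transient). */
        if (!pages[i].can_migrate || pages[i].pin_count > 0)
            return i;
        /* A real implementation would copy the page and fix up its
         * mappings here; the model just treats it as moved. */
    }
    return end;
}

/* Contiguous allocation over [start, end): fails if any page is stuck. */
static bool alloc_range(const struct toy_page *pages, size_t start,
                        size_t end)
{
    size_t bad = migrate_range(pages, start, end);
    if (bad != end) {
        printf("toy: [%zu, %zu) busy at page %zu\n", start, end, bad);
        return false;
    }
    return true;
}
```

In the model, the pinned case clears on its own: once the pin count drops to zero a retry succeeds, whereas a page with no migration path fails every time. That is why an occasional message can be tolerated while a sustained, rising rate points at a real problem.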

> I've had Raspbian carrying a patch downstream to remove the error
> message for 2 years now, and I either need to get this fixed or get this
> patch merged to Fedora and Debian as well, now that they're shipping
> some support for Raspberry Pi.
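The ratelimit approach raised earlier in the thread is a middle ground between carrying a removal patch and printing every occurrence. In-kernel, this would typically mean `pr_info_ratelimited()`, which is built on `struct ratelimit_state`. What follows is only a userspace sketch of the interval/burst idea, not the kernel implementation and not necessarily what the proposed patch does. `rl_state` and `rl_allow` are hypothetical names, and time is passed in as explicit ticks (an assumption; the kernel uses jiffies) so the behaviour is deterministic. Suppressed messages are counted and reported when a new window opens, so a sudden jump in the rate still shows up in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace analogue of an interval/burst ratelimiter. */
struct rl_state {
    long interval;  /* window length, in caller-defined ticks */
    int burst;      /* messages allowed per window */
    long begin;     /* tick at which the current window started */
    int printed;    /* messages allowed in the current window */
    int missed;     /* messages suppressed in the current window */
    bool started;   /* false until the first call opens a window */
};

/* Returns true if a message at tick `now` may be printed. When a new
 * window opens, the number suppressed in the previous one is reported,
 * so a rising message rate stays visible even while output is capped. */
static bool rl_allow(struct rl_state *rs, long now)
{
    assert(rs->interval > 0 && rs->burst > 0);

    if (!rs->started || now - rs->begin >= rs->interval) {
        if (rs->missed > 0)
            fprintf(stderr, "%d messages suppressed\n", rs->missed);
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
        rs->started = true;
    }

    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }
    rs->missed++;
    return false;
}
```

With `interval = 5` and `burst = 2`, for example, calls at ticks 0, 0, 0 and 1 allow the first two and suppress the other two. A call at tick 5 then opens a new window, reports "2 messages suppressed" on stderr, and is allowed.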

-- 
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»
