Date:   Wed, 23 Oct 2019 15:48:36 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Mel Gorman <mgorman@...e.de>, Waiman Long <longman@...hat.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <guro@...com>,
        Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
        Jann Horn <jannh@...gle.com>, Song Liu <songliubraving@...com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Rafael Aquini <aquini@...hat.com>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 2/2] mm, vmstat: reduce zone->lock holding time by
 /proc/pagetypeinfo

On 10/23/19 3:37 PM, Michal Hocko wrote:
> On Wed 23-10-19 15:32:05, Vlastimil Babka wrote:
>> On 10/23/19 12:27 PM, Michal Hocko wrote:
>>> From: Michal Hocko <mhocko@...e.com>
>>>
>>> pagetypeinfo_showfree_print is called with zone->lock held and IRQs disabled.
>>> This is not really nice because it blocks both any interrupts on that
>>> cpu and the page allocator. On large machines this might even trigger
>>> the hard lockup detector.
>>>
>>> Considering the pagetypeinfo is a debugging tool we do not really need
>>> exact numbers here. The primary reason to look at the output is to see
>>> how pageblocks are spread among different migratetypes, therefore putting
>>> a bound on the number of pages on the free_list sounds like a reasonable
>>> tradeoff.
>>>
>>> The new output will simply tell
>>> [...]
>>> Node    6, zone   Normal, type      Movable >100000 >100000 >100000 >100000  41019  31560  23996  10054   3229    983    648
>>>
>>> instead of
>>> Node    6, zone   Normal, type      Movable 399568 294127 221558 102119  41019  31560  23996  10054   3229    983    648
>>>
>>> The limit has been chosen arbitrarily and is subject to future
>>> change should there be a need for that.
>>>
>>> Suggested-by: Andrew Morton <akpm@...ux-foundation.org>
>>> Signed-off-by: Michal Hocko <mhocko@...e.com>
>>
>> Hmm dunno, I would rather e.g. hide the file behind some config or boot
>> option than do this. Or move it to /sys/kernel/debug ?
> 
> But those wouldn't really help prevent the lockup, right?

No, but it would perhaps help ensure that only people who know what they
are doing (or have been told so by a developer, e.g. on linux-mm) will try to
collect the data, and not some automatic monitoring tools taking
periodic snapshots of stuff in /proc that looks interesting.

> Besides that, who would enable that config and how much of a difference
> would root only vs. debugfs make?

I would hope those tools don't scrape debugfs as much as /proc, but I
might be wrong of course :)

> Is the incomplete value a real problem?

Hmm perhaps not. If the overflow happens only for one migratetype, one
can also use /proc/buddyinfo to get the exact count, as was proposed
in this thread for the Movable migratetype.
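
To illustrate the /proc/buddyinfo workaround: buddyinfo reports the total
number of free blocks per order across all migratetypes, so when exactly one
migratetype is capped in pagetypeinfo, its count can be recovered as the
buddyinfo total minus the counts of the other migratetypes. Below is a
minimal user-space sketch, assuming the numbers for one zone and one order
have already been parsed; the names recover_capped_count and CAPPED are
hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical sentinel for a pagetypeinfo entry printed as ">100000". */
#define CAPPED (-1L)

/*
 * buddy_total: free block count for one order taken from /proc/buddyinfo.
 * per_type:    per-migratetype counts for the same zone and order taken
 *              from /proc/pagetypeinfo, with capped entries set to CAPPED.
 *
 * Returns the recovered count of the single capped migratetype, or -1 if
 * the number of capped entries is not exactly one or the inputs are
 * inconsistent (known counts exceed the total).
 */
long recover_capped_count(long buddy_total, const long *per_type, size_t ntypes)
{
	long known = 0;
	size_t capped = 0;

	for (size_t i = 0; i < ntypes; i++) {
		if (per_type[i] == CAPPED)
			capped++;
		else
			known += per_type[i];
	}

	if (capped != 1 || known > buddy_total)
		return -1;

	return buddy_total - known;
}
```

Since the two files are read at different times, the result is only as
accurate as the free lists are stable between the two reads.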

>>> ---
>>>  mm/vmstat.c | 19 ++++++++++++++++++-
>>>  1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>>> index 4e885ecd44d1..762034fc3b83 100644
>>> --- a/mm/vmstat.c
>>> +++ b/mm/vmstat.c
>>> @@ -1386,8 +1386,25 @@ static void pagetypeinfo_showfree_print(struct seq_file *m,
>>>  
>>>  			area = &(zone->free_area[order]);
>>>  
>>> -			list_for_each(curr, &area->free_list[mtype])
>>> +			list_for_each(curr, &area->free_list[mtype]) {
>>>  				freecount++;
>>> +				/*
>>> +				 * Cap the free_list iteration because it might
>>> +				 * be really large and we are under a spinlock
>>> +				 * so a long time spent here could trigger a
>>> +				 * hard lockup detector. Anyway this is a
>>> +				 * debugging tool so knowing there is a handful
>>> +				 * of pages in this order should be more than
>>> +				 * sufficient
>>> +				 */
>>> +				if (freecount > 100000) {
>>> +					seq_printf(m, ">%6lu ", freecount);
>>> +					spin_unlock_irq(&zone->lock);
>>> +					cond_resched();
>>> +					spin_lock_irq(&zone->lock);
>>> +					continue;
>>> +				}
>>> +			}
>>>  			seq_printf(m, "%6lu ", freecount);
>>>  		}
>>>  		seq_putc(m, '\n');
>>>
> 
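
For reference, the bounded-count idea from the quoted patch can be sketched
outside the kernel like this. It is a user-space illustration over a
hypothetical singly linked list (struct node, count_capped), not the
mm/vmstat.c code: the walk stops as soon as the cap is reached and more
entries remain, and the caller is told that the count overflowed so it can
print something like ">cap" instead of an exact number.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node standing in for a free_list entry. */
struct node {
	struct node *next;
};

/*
 * Count the nodes reachable from head, but never visit more than cap + 1
 * of them. If the list holds more than cap nodes, *overflow is set to
 * true and cap is returned; otherwise the exact length is returned and
 * *overflow is false.
 */
unsigned long count_capped(const struct node *head, unsigned long cap,
			   bool *overflow)
{
	unsigned long n = 0;

	*overflow = false;
	for (const struct node *p = head; p; p = p->next) {
		if (n == cap) {
			/* At least one more node exists past the cap. */
			*overflow = true;
			break;
		}
		n++;
	}
	return n;
}
```

The point of stopping early is that the time spent walking the list, and so
the time any lock around the walk would be held, is bounded by the cap
rather than by the list length.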
