Date:	Tue, 6 Oct 2015 08:42:32 +0000
From:	Vineet Gupta <Vineet.Gupta1@...opsys.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	"arc-linux-dev@...opsys.com" <arc-linux-dev@...opsys.com>,
	Robin Holt <robin.m.holt@...il.com>,
	Nathan Zimmer <nzimmer@....com>, Jiang Liu <liuj97@...il.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
	lkml <linux-kernel@...r.kernel.org>, Mel Gorman <mgorman@...e.de>
Subject: Re: [arc-linux-dev] Re: New helper to free highmem pages in larger
 chunks

On Tuesday 06 October 2015 11:06 AM, Vineet Gupta wrote:
> On Tuesday 06 October 2015 03:40 AM, Andrew Morton wrote:
>> On Sat, 3 Oct 2015 18:25:13 +0530 Vineet Gupta <Vineet.Gupta1@...opsys.com> wrote:
>>
>>> Hi,
>>>
>>> I noticed increased boot time when enabling highmem for ARC. Turns out that
>>> freeing highmem pages into buddy allocator is done page at a time, while it is
>>> batched for low mem pages. Below is call flow.
>>>
>>> I'm thinking of writing free_highmem_pages() which takes start and end pfn and
>>> want to solicit some ideas whether to write it from scratch or preferably call
>>> existing __free_pages_memory() to reuse the logic to convert a pfn range into
>>> {pfn, order} tuples.
>>>
>>> For latter however there are semantical differences as you can see below which I'm
>>> not sure of:
>>>   -highmem page->count is set to 1, while 0 for low mem
>> That would be weird.
>>
>> Look more closely at __free_pages_boot_core() - it uses
>> set_page_refcounted() to set the page's refcount to 1.  Those
>> set_page_count() calls look superfluous to me.
> If you look closer still, set_page_refcounted() is called outside the loop for the
> first page only. For all pages, loop iterator sets them to 1. Turns out there's
> more fun here....
>
> I ran this under a debugger and much earlier in boot process, there's existing
> setting of page count to 1 for *all* pages of *all* zones (including highmem pages).
> See call flow below.
>
> free_area_init_node
>     free_area_init_core
>         loops thru all zones
>             memmap_init_zone
>                loops thru all pages of zones
>                __init_single_page
>
> This means the subsequent setting of page count to 0 (or 1 for the special first
> page) is superfluous - actually buggy at best. I will send a patch to fix that. I
> hope I don't break some obscure init path which doesn't hit the above init.

So I took a stab at it and broke it royally; I was too naive about this to begin
with. The explicit setting of the refcount to 1 for highmem pages, and to 0 for all
lowmem pages except the first page in @order (which gets 1), is all by design.

__free_pages(), called by both code paths, always decrements the refcount of
struct page. For a page batch (order != 0) it only decrements the first page's
refcount. This was my find of the month, though you have probably known it for
the longest time! Live and learn.

The current highmem path only uses order == 0, so an initial refcount of 1 is needed
(although the one set in __init_single_page is sufficient; there is no need to set
it again in free_highmem_page()). The lowmem path, though, typically calls
__free_pages() with order > 0, so the caller carefully sets up the first page in
@order with refcount 1 (using set_page_refcounted()), while the rest of the pages
are set to refcount 0 in the loop.

Thus the seemingly redundant setting to 0 seems to be fine IMHO. It is perhaps worth
documenting, assuming I have got it right so far.
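
To make the refcount convention above concrete, here is a small userspace
sketch. It is only a toy model, not kernel code: struct toy_page and every
toy_* function are hypothetical names invented for illustration, and the real
__free_pages() path does far more (page flags, zone locking, bad-page
reporting). The one behaviour it tries to mirror is that only the first
page's count is dropped, and the tail pages are expected to be at 0 already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the refcount matters here. */
struct toy_page {
    int count;      /* models the page refcount */
    bool in_buddy;  /* set once the page is handed to the allocator */
};

/* Models memmap_init_zone()/__init_single_page(): every page starts at 1. */
static void toy_init_pages(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pages[i].count = 1;
        pages[i].in_buddy = false;
    }
}

/*
 * Models __free_pages(): drop one reference on the FIRST page only and,
 * if that reaches zero, release the whole 2^order block. Returns true
 * if the block was released.
 */
static bool toy_free_pages(struct toy_page *page, unsigned int order)
{
    size_t n = (size_t)1 << order;

    assert(page->count > 0);        /* caller must hold a reference */
    if (--page->count != 0)
        return false;               /* someone else still holds it */

    for (size_t i = 1; i < n; i++)
        if (page[i].count != 0)
            return false;           /* "bad page": tail still referenced */

    for (size_t i = 0; i < n; i++)
        page[i].in_buddy = true;
    return true;
}

/* Models the lowmem boot path: all pages to 0, then first page to 1. */
static bool toy_free_lowmem_block(struct toy_page *page, unsigned int order)
{
    size_t n = (size_t)1 << order;

    for (size_t i = 0; i < n; i++)
        page[i].count = 0;
    page[0].count = 1;              /* the set_page_refcounted() step */
    return toy_free_pages(page, order);
}

/* Models the highmem path: a single page, refcount 1, order 0. */
static bool toy_free_highmem_page(struct toy_page *page)
{
    page->count = 1;
    return toy_free_pages(page, 0);
}
```

In this model, skipping the "set to 0" loop and calling toy_free_pages()
directly on a freshly initialised order-1 block fails, because the tail page
still carries the count of 1 left by toy_init_pages(). That is roughly the
way my attempted cleanup broke.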


>>>   -atomic clearing of page reserved flag vs. non atomic
>> I doubt if the atomic is needed - who else can be looking at this page
>> at this time?
> I'll send another one to separately fix that as well. Seems like boot mem setup is
> a relatively neglected part of the kernel.
>
> -Vineet
>
>
