Message-Id: <20201123120432.3c0cb9b7e2f46150f132d592@linux-foundation.org>
Date: Mon, 23 Nov 2020 12:04:32 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Lin Feng <linf@...gsu.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
mgorman@...hsingularity.net
Subject: Re: [PATCH] [RFC] init/main: fix broken buffer_init when
DEFERRED_STRUCT_PAGE_INIT set
On Mon, 23 Nov 2020 19:05:00 +0800 Lin Feng <linf@...gsu.com> wrote:
> In the booting phase if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set,
> we have following callchain:
>
> start_kernel
> ...
>   mm_init
>     mem_init
>       memblock_free_all
>         reset_all_zones_managed_pages
>         free_low_memory_core_early
> ...
>   buffer_init
>     nr_free_buffer_pages
>       zone->managed_pages
> ...
>   rest_init
>     kernel_init
>       kernel_init_freeable
>         page_alloc_init_late
>           kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid);
>           wait_for_completion(&pgdat_init_all_done_comp);
> ...
>         files_maxfiles_init
>
> buffer_init clearly depends on zone->managed_pages, which is zeroed in
> reset_all_zones_managed_pages and then rebuilt as pages are re-added to
> zone->managed_pages. But when buffer_init runs, this rebuilding is only
> half done; most of the pages are added later, once deferred_init_memmap
> finishes. On large-memory machines the count returned by
> nr_free_buffer_pages therefore drifts considerably, and it also varies
> from kernel to kernel on the same hardware.
>
> The fix is simple: delay buffer_init until deferred_init_memmap has
> completed.
>
> But with this patch applied, max_buffer_heads becomes very large,
> roughly 4 times totalram_pages, per the formula:
> max_buffer_heads = nrpages * (10%) * (PAGE_SIZE / sizeof(struct buffer_head));
>
> Say a 64GB memory box has 16777216 pages; max_buffer_heads then comes
> out to roughly 67,108,864. In the common case each buffer_head maps
> one page/block (4KB), so the actual number of buffer_heads never
> exceeds totalram_pages. IMO that likely makes the
> buffer_heads_over_limit bool always false, and the
> 'if (buffer_heads_over_limit)' test in vmscan unnecessary.
> Correct me if this is not true.
I agree - seems that on such a system we'll allow enough buffer_heads
to manage about 250GB worth of pagecache, for a 4kb filesystem
blocksize.
Perhaps this code is all a remnant of highmem systems, where
ZONE_NORMAL is considerably smaller than ZONE_HIGHMEM, and we don't
want to be consuming all of ZONE_NORMAL for highmem-attached
buffer_heads.
I'm not sure that it's all very harmful - we don't *need* to be
trimming away at the buffer_heads on a 64GB 64-bit system, so the code
is really only functional on highmem machines. And as far as I know,
it works OK on such machines.