[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c3e3117d-e752-9a34-0000-05811c7327e2@google.com>
Date: Mon, 19 Feb 2024 21:10:13 -0800 (PST)
From: Hugh Dickins <hughd@...gle.com>
To: Charan Teja Kalla <quic_charante@...cinc.com>
cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>,
willy@...radead.org, rientjes@...gle.com, surenb@...gle.com,
fvdl@...gle.com, quic_pkondeti@...cinc.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED
for shmem
On Wed, 14 Feb 2024, Charan Teja Kalla wrote:
> Hello Hugh,
>
> Based on offline discussion with some folks in the list, it seems that
> this syscall can be helpful. This patch might have forgotten and I hope
> this ping helps in resurrecting this thread.
Charan, it's not forgotten, but it was relayed to you through another
channel a month ago, that I did not expect to have time to think about
this for 3 months. Countdown says 2 months to go now.
I realize that it's frustrating for you; it's unpleasant for me too.
>
> On 5/18/2023 6:16 PM, Charan Teja Kalla wrote:
> > On 5/17/2023 5:02 PM, Hugh Dickins wrote:
> >>> Sure, will include those range calculations for shmem pages too.
> >> Oh, I forgot this issue, you would have liked me to look at V8 by now,
> >> to see whether I agree with your resolution there. Sorry, no, I've
> >> not been able to divert my concentration to it yet.
> >>
> >> And it's quite likely that I shall disagree, because I've a history of
> >> disagreeing even with myself on such range widening/narrowing issues -
> >> reconciling conflicting precedents is difficult 🙁
> >>
> > If you can at least help by commenting which part of the patch you
> > disagree with, I can try hard to convince you there:) .
> >
> >>> Please let me know if I'm missing something where I should be counting
> >>> these as NR_ISOLATED.
> >> Please grep for NR_ISOLATED, to see where and how they get manipulated
> >> already, and follow the existing examples. The case that sticks in my
> >> mind is in mm/mempolicy.c, where the migrate_pages() syscall can build
> >> up a gigantic quantity of transiently isolated pages: your syscall can
> >> do the same, so should account for itself in the same way.
>
> Based on the grep, it seems almost all the call stacks that isolates the
> folios is for migrating the pages where after migration the NR_ISOLATED
> is decremented (in migrate_folio_done()). The call paths are(compaction,
> memory hotplug, mempolicy).
>
> The another call path is reclaim where we isolate 'nr' pages belongs to
> a pgdat, account/unaccount them in NR_ISOLATED across the reclaim.
>
> I think it is easy to account for the above call paths as we know "which
> folio corresponds to which pgdat".
>
> Where as in this patch, we are isolating a set of folios(can corresponds
> to different nodes) and relying on the reclaim_pages() to do the swap
> out. It is straightforward to account NR_ISOLATED while isolating, but
> it requires unaccounting changes in the shrink_folio_list() where folio
> is being freed after swap out. Doing so requires changes in all the
> code places(eg: shrink_inactive_list()), where it now requires to
> account NR_ISOLATED while isolating and the shrink_folio_list()
> unaccounts it.
>
> So, accounting NR_ISOLATED requires changes in other code places where
> this patch has not touched.
That surprises me, though I do recall that there's an irritating
asymmetry in where NR_ISOLATED is accounted and unaccounted.
I have not checked what you say there, may do so in 2 months.
>
> If isolating a large amount of pages and not being recorded in
> NR_ISOLATED is really a problem, then may I please know your opinion on
> isolating(with out accounting) and reclaiming in small batches? The
> batch size can be considered as SWAP_CLUSTER_MAX of pages.
In most circumstances, omitting to account NR_ISOLATED wouldn't show
up as a problem; in low memory it would. Splitting into small batches
without accounting might be an easier and better way; but whatever I
say in a hurried unthoughtful reply is likely to be wrong.
I am not convinced that isolating is even appropriate: I think I
hinted before that I would want to compare what you do here with what
shmem_swapin_range() does in mm/madvise.c, and the shmem_collapse_swapin()
I'll be proposing to avoid swapin while building up THP in collapse_file().
But it may well be that you've found the switching of LRUs essential:
I'm not prejudging, just saying I cannot rush to judgment. And this
is also a new UAPI for tmpfs, so should not be rushed then regretted.
But if you can find another champion to force this into mm/shmem.c
for you faster than I can manage, well, I don't own any Linux source.
It's not unusual for me to limp along later and rearrange things to
suit my preference.
Hugh
>
> > I had a V8 posted without this into accounting. Let me make the changes
> > to account for the NR_ISOLATED too.
>
> Thanks,
> Charan
Powered by blists - more mailing lists