lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c3e3117d-e752-9a34-0000-05811c7327e2@google.com>
Date: Mon, 19 Feb 2024 21:10:13 -0800 (PST)
From: Hugh Dickins <hughd@...gle.com>
To: Charan Teja Kalla <quic_charante@...cinc.com>
cc: Hugh Dickins <hughd@...gle.com>, Andrew Morton <akpm@...ux-foundation.org>, 
    willy@...radead.org, rientjes@...gle.com, surenb@...gle.com, 
    fvdl@...gle.com, quic_pkondeti@...cinc.com, linux-mm@...ck.org, 
    linux-kernel@...r.kernel.org, Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH V7 2/2] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED
 for shmem

On Wed, 14 Feb 2024, Charan Teja Kalla wrote:

> Hello Hugh,
> 
> Based on offline discussion with some folks in the list, it seems that
> this syscall can be helpful. This patch might have forgotten and I hope
> this ping helps in resurrecting this thread.

Charan, it's not forgotten, but it was relayed to you through another
channel a month ago, that I did not expect to have time to think about
this for 3 months.  Countdown says 2 months to go now.

I realize that it's frustrating for you; it's unpleasant for me too.

> 
> On 5/18/2023 6:16 PM, Charan Teja Kalla wrote:
> > On 5/17/2023 5:02 PM, Hugh Dickins wrote:
> >>> Sure, will include those range calculations for shmem pages too.
> >> Oh, I forgot this issue, you would have liked me to look at V8 by now,
> >> to see whether I agree with your resolution there.  Sorry, no, I've
> >> not been able to divert my concentration to it yet.
> >>
> >> And it's quite likely that I shall disagree, because I've a history of
> >> disagreeing even with myself on such range widening/narrowing issues -
> >> reconciling conflicting precedents is difficult 🙁
> >>
> > If you can at least help by commenting which part of the patch you
> > disagree with, I can try hard to convince you there:) .
> > 
> >>> Please let me know if I'm missing something where I should be counting
> >>> these as NR_ISOLATED.
> >> Please grep for NR_ISOLATED, to see where and how they get manipulated
> >> already, and follow the existing examples.  The case that sticks in my
> >> mind is in mm/mempolicy.c, where the migrate_pages() syscall can build
> >> up a gigantic quantity of transiently isolated pages: your syscall can
> >> do the same, so should account for itself in the same way.
> 
> Based on the grep, it seems almost all the call stacks that isolates the
> folios is for migrating the pages where after migration the NR_ISOLATED
> is decremented (in migrate_folio_done()). The call paths are(compaction,
> memory hotplug, mempolicy).
> 
> The another call path is reclaim where we isolate 'nr' pages belongs to
> a pgdat, account/unaccount them in NR_ISOLATED across the reclaim.
> 
> I think it is easy to account for the above call paths as we know "which
> folio corresponds to which pgdat".
> 
> Where as in this patch, we are isolating a set of folios(can corresponds
> to different nodes) and relying on the reclaim_pages() to do the swap
> out. It is straightforward to account NR_ISOLATED while isolating, but
> it requires unaccounting changes in the shrink_folio_list() where folio
> is being freed after swap out.  Doing so requires changes in all the
> code places(eg: shrink_inactive_list()), where it now requires to
> account NR_ISOLATED while isolating and the shrink_folio_list()
> unaccounts it.
> 
> So, accounting NR_ISOLATED requires changes in other code places where
> this patch has not touched.

That surprises me, though I do recall that there's an irritating
asymmetry in where NR_ISOLATED is accounted and unaccounted.
I have not checked what you say there, may do so in 2 months.

> 
> If isolating a large amount of pages and not being recorded in
> NR_ISOLATED is really a problem, then may I please know your opinion on
> isolating(with out accounting) and reclaiming in small batches? The
> batch size can be considered as SWAP_CLUSTER_MAX of pages.

In most circumstances, omitting to account NR_ISOLATED wouldn't show
up as a problem; in low memory it would.  Splitting into small batches
without accounting might be an easier and better way; but whatever I
say in a hurried unthoughtful reply is likely to be wrong.

I am not convinced that isolating is even appropriate: I think I
hinted before that I would want to compare what you do here with what
shmem_swapin_range() does in mm/madvise.c, and the shmem_collapse_swapin()
I'll be proposing to avoid swapin while building up THP in collapse_file().

But it may well be that you've found the switching of LRUs essential:
I'm not prejudging, just saying I cannot rush to judgment.  And this
is also a new UAPI for tmpfs, so should not be rushed then regretted.

But if you can find another champion to force this into mm/shmem.c
for you faster than I can manage, well, I don't own any Linux source.
It's not unusual for me to limp along later and rearrange things to
suit my preference.

Hugh

> 
> > I had a V8 posted without this into accounting. Let me make the changes
> > to account for the NR_ISOLATED too.
> 
> Thanks,
> Charan

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ