Message-ID: <Zo5v_hefrYFImqBC@localhost.localdomain>
Date: Wed, 10 Jul 2024 13:26:54 +0200
From: Oscar Salvador <osalvador@...e.de>
To: David Hildenbrand <david@...hat.com>
Cc: Peter Xu <peterx@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Muchun Song <muchun.song@...ux.dev>, SeongJae Park <sj@...nel.org>,
Miaohe Lin <linmiaohe@...wei.com>, Michal Hocko <mhocko@...e.com>,
Matthew Wilcox <willy@...radead.org>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Jason Gunthorpe <jgg@...dia.com>
Subject: Re: [PATCH 00/45] hugetlb pagewalk unification
On Wed, Jul 10, 2024 at 05:52:43AM +0200, David Hildenbrand wrote:
> I understand that. And it would all be easier and more straightforward if we
> didn't have that hugetlb CONT-PTE / CONT-PMD stuff in there that works
> similarly to, but differently from, "ordinary" cont-pte for thp.
>
> I'm sure you stumbled over the set_huge_pte_at() on arm64 for example. As
> things stand, if at any point we *don't* use these hugetlb functions to
> modify hugetlb entries, we might be in trouble.
>
> That's why I think we should maybe invest our time and effort in having a
> new pagewalker that will just batch such things naturally, and users that
> can operate on that naturally. For example: a hugetlb cont-pte-mapped folio
> will just naturally be reported as a "fully mapped folio", just like a THP
> would be if mapped in a compatible way.
>
> Yes, this requires more work, but as raised in some patches here, working on
> individual PTEs/PMDs for hugetlb is problematic.
>
> You have to batch every operation, to essentially teach ordinary code to do
> what the hugetlb_* special code would have done on cont-pte/cont-pmd things.
>
>
> (as a side note, cont-pte/cont-pmd should primarily be a hint from arch code
> on how many entries we can batch, like we do in folio_pte_batch(); point is
> that we want to batch also on architectures where we don't have such bits,
> and prepare for architectures that implement various sizes of batching;
> IMHO, having cont-pte/cont-pmd checks in common code is likely the wrong
> approach. Again, folio_pte_batch() is where we tackled the problem
> differently from the THP perspective)

I must say I have not looked at folio_pte_batch() yet, so I do not know
what it does or how it does it.
I will have a look.

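As a rough illustration of the batching idea in the side note above, here
is a userspace toy model. It is only a sketch under stated assumptions and
is not the kernel's folio_pte_batch(): struct toy_pte, its fields, and
toy_pte_batch() are hypothetical names made up for this example. The point
it tries to show is that the batch size is derived from the entries
themselves (consecutive frames, matching protection), bounded by a caller
supplied maximum, rather than from an arch-specific "contiguous" bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy page-table entry; not a kernel type. */
struct toy_pte {
	uint64_t pfn;	/* page frame number the entry maps */
	uint32_t prot;	/* protection bits that must match within a batch */
	bool present;	/* whether the entry maps anything at all */
};

/*
 * Count how many entries, starting at ptes[0], can be handled as one batch.
 * An entry joins the batch if it is present, has the same protection bits
 * as the first entry, and maps the frame directly following the previous
 * one. At most max_nr entries are inspected. Returns 0 if max_nr is 0 or
 * the first entry is not present.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr)
{
	size_t nr;

	if (max_nr == 0 || !ptes[0].present)
		return 0;

	for (nr = 1; nr < max_nr; nr++) {
		if (!ptes[nr].present)
			break;
		if (ptes[nr].prot != ptes[0].prot)
			break;
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
	}

	return nr;
}
```

In this model a caller would pass the number of entries left in the current
page table as max_nr and then operate on the returned count in one go. An
architecture-provided hint could only ever cap or shortcut that count, which
is roughly the role the side note suggests for cont-pte/cont-pmd.
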
> I have an idea for a better page table walker API that would try batching
> most entries (under one PTL), and walkers can just register for the types
> they want. Hoping I will find some time to at least sketch the user
> interface soon.
>
> That doesn't mean that this should block your work, but the
> cont-pte/cont-pmd hugetlb stuff is really nasty to handle here, and I don't
> particularly like where this is going.

Ok, let me take a step back then.
Previous versions of this RFC did not handle cont-{pte,pmd} out in the
open, so let me go back to the drawing board and come up with something
that does not fiddle with the cont- stuff in that way.
I might post a small diff here just to check that we are on the same page.

As usual, thanks a lot for your comments, David!

--
Oscar Salvador
SUSE Labs