[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <79c5513b-b3f2-4fbb-a3c7-a09894d54d22@redhat.com>
Date: Mon, 8 Apr 2024 12:24:10 +0200
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>, Huang Ying <ying.huang@...el.com>,
Gao Xiang <xiang@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
Yang Shi <shy828301@...il.com>, Michal Hocko <mhocko@...e.com>,
Kefeng Wang <wangkefeng.wang@...wei.com>, Barry Song <21cnbao@...il.com>,
Chris Li <chrisl@...nel.org>, Lance Yang <ioworker0@...il.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 2/6] mm: swap: free_swap_and_cache_nr() as batched
free_swap_and_cache()
On 08.04.24 12:07, Ryan Roberts wrote:
> On 08/04/2024 10:43, David Hildenbrand wrote:
>>
>>>>> +
>>>>> +/**
>>>>> + * swap_pte_batch - detect a PTE batch for a set of contiguous swap entries
>>>>> + * @start_ptep: Page table pointer for the first entry.
>>>>> + * @max_nr: The maximum number of table entries to consider.
>>>>> + * @entry: Swap entry recovered from the first table entry.
>>>>> + *
>>>>> + * Detect a batch of contiguous swap entries: consecutive (non-present) PTEs
>>>>> + * containing swap entries all with consecutive offsets and targeting the same
>>>>> + * swap type.
>>>>> + *
>>>>
>>>> Likely you should document that any swp pte bits are ignored? ()
>>>
>>> Sorry I don't understand this comment. I thought any non-none, non-present PTE
>>> was always considered to contain only a "swap entry" and a swap entry consists
>>> of a "type" and an "offset" only. (and its a special "non-swap" swap entry if
>>> type > SOME_CONSTANT) Are you saying there are additional fields in the PTE that
>>> are not part of the swap entry?
>>
>>
>> pte_swp_soft_dirty()
>> pte_swp_clear_exclusive()
>> pte_swp_uffd_wp()
>>
>> Are PTE bits used for swp PTE.
>
> Ahh wow. mind blown. Looks like a massive hack... why not store them in the
> arch-independent swap entry, rather than have them squat independently in the PTE?
I think that was discussed at some point, but it not only requires quite
some churn to change it (all that swp entry code is a mess), these bits
are conceptually really per PTE and not something you would want to pass
into actual swap handling code that couldn't care less about all of these.
I looked at this when I added SWP exclusive, but accidentally losing the
SWP exclusive marker when converting back and forth made me go the PTE
route instead.
Then, the available PTE bits are a bit scattered on some architectures,
and converting entry<->PTE gets even uglier if we don't want to "lose"
available bits.
Probably the whole "unsigned long swp_entry" stuff should be replaced by
a proper struct where we could more easily add flags and have the arch
code handle the conversion to the PTE just once. The arch-specific
swp_entry stuff is another nasty thing IMHO.
>
> OK, my implementation is buggy. I'll re-spin to fix this.
>
>
>>
>> There is also dirty/young for migration entries, but that's not of a concern
>> here, because we stop for non_swap_entry().
>
> Looks like these are part of the offset field in the arch-independent swap entry
> - much cleaner ;-).
Note that it only applies to migration entries, and only when we have
some spare bits due to PFN < offset.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists