lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c12fbea5-0d5d-4c06-b063-dab3dd00e704@redhat.com>
Date: Fri, 1 Mar 2024 18:00:00 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Matthew Wilcox <willy@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
 Huang Ying <ying.huang@...el.com>, Gao Xiang <xiang@...nel.org>,
 Yu Zhao <yuzhao@...gle.com>, Yang Shi <shy828301@...il.com>,
 Michal Hocko <mhocko@...e.com>, Kefeng Wang <wangkefeng.wang@...wei.com>,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v3 1/4] mm: swap: Remove CLUSTER_FLAG_HUGE from
 swap_cluster_info:flags

On 01.03.24 17:44, Ryan Roberts wrote:
> On 01/03/2024 16:31, Matthew Wilcox wrote:
>> On Fri, Mar 01, 2024 at 04:27:32PM +0000, Ryan Roberts wrote:
>>> I've implemented the batching as David suggested, and I'm pretty confident it's
>>> correct. The only problem is that during testing I can't provoke the code to
>>> take the path. I've been pouring through the code but struggling to figure out
>>> under what situation you would expect the swap entry passed to
>>> free_swap_and_cache() to still have a cached folio? Does anyone have any idea?
>>>
>>> This is the original (unbatched) function, after my change, which caused David's
>>> concern that we would end up calling __try_to_reclaim_swap() far too much:
>>>
>>> int free_swap_and_cache(swp_entry_t entry)
>>> {
>>> 	struct swap_info_struct *p;
>>> 	unsigned char count;
>>>
>>> 	if (non_swap_entry(entry))
>>> 		return 1;
>>>
>>> 	p = _swap_info_get(entry);
>>> 	if (p) {
>>> 		count = __swap_entry_free(p, entry);
>>> 		if (count == SWAP_HAS_CACHE)
>>> 			__try_to_reclaim_swap(p, swp_offset(entry),
>>> 					      TTRS_UNMAPPED | TTRS_FULL);
>>> 	}
>>> 	return p != NULL;
>>> }
>>>
>>> The trouble is, whenever its called, count is always 0, so
>>> __try_to_reclaim_swap() never gets called.
>>>
>>> My test case is allocating 1G anon memory, then doing madvise(MADV_PAGEOUT) over
>>> it. Then doing either a munmap() or madvise(MADV_FREE), both of which cause this
>>> function to be called for every PTE, but count is always 0 after
>>> __swap_entry_free() so __try_to_reclaim_swap() is never called. I've tried for
>>> order-0 as well as PTE- and PMD-mapped 2M THP.
>>
>> I think you have to page it back in again, then it will have an entry in
>> the swap cache.  Maybe.  I know little about anon memory ;-)
> 
> Ahh, I was under the impression that the original folio is put into the swap
> cache at swap out, then (I guess) its removed once the IO is complete? I'm sure
> I'm miles out... what exactly is the lifecycle of a folio going through swap out?

I thought with most (disk) backends you will add it to the swapcache and 
leave it there until there is actual memory pressure. Only then, under 
memory pressure, you'd actually reclaim the folio.

You can fault it back in from the swapcache without having to go to disk.

That's how you can today end up with a THP in the swapcache: during 
swapin from disk (after the folio was reclaimed) you'd currently only 
get order-0 folios.

At least that was my assumption with my MADV_PAGEOUT testing so far :)

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ