linux-kernel - Re: [PATCH] memcg: charge before adding to swapcache on swapin

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YDBZFY8WnLewRqLg@cmpxchg.org>
Date:   Fri, 19 Feb 2021 19:34:29 -0500
From:   Johannes Weiner <hannes@...xchg.org>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Hugh Dickins <hughd@...gle.com>, Roman Gushchin <guro@...com>,
        Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] memcg: charge before adding to swapcache on swapin

On Fri, Feb 19, 2021 at 02:44:05PM -0800, Shakeel Butt wrote:
> Currently the kernel adds the page, allocated for swapin, to the
> swapcache before charging the page. This is fine but now we want a
> per-memcg swapcache stat which is essential for folks who wants to
> transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> swap counters.
> 
> To correctly maintain the per-memcg swapcache stat, one option which
> this patch has adopted is to charge the page before adding it to
> swapcache. One challenge in this option is the failure case of
> add_to_swap_cache() on which we need to undo the mem_cgroup_charge().
> Specifically undoing mem_cgroup_uncharge_swap() is not simple.
> 
> This patch circumvent this specific issue by removing the failure path
> of  add_to_swap_cache() by providing __GFP_NOFAIL. Please note that in
> this specific situation ENOMEM was the only possible failure of
> add_to_swap_cache() which is removed by using __GFP_NOFAIL.
> 
> Another option was to use __mod_memcg_lruvec_state(NR_SWAPCACHE) in
> mem_cgroup_charge() but then we need to take of the do_swap_page() case
> where synchronous swap devices bypass the swapcache. The do_swap_page()
> already does hackery to set and reset PageSwapCache bit to make
> mem_cgroup_charge() execute the swap accounting code and then we would
> need to add additional parameter to tell to not touch NR_SWAPCACHE stat
> as that code patch bypass swapcache.
> 
> This patch added memcg charging API explicitly foe swapin pages and
> cleaned up do_swap_page() to not set and reset PageSwapCache bit.
> 
> Signed-off-by: Shakeel Butt <shakeelb@...gle.com>

The patch makes sense to me. While it extends the charge interface, I
actually quite like that it charges the page earlier - before putting
it into wider circulation. It's a step in the right direction.

But IMO the semantics of mem_cgroup_charge_swapin_page() are a bit too
fickle: the __GFP_NOFAIL in add_to_swap_cache() works around it, but
having a must-not-fail-after-this line makes the code tricky to work
on and error prone.

It would be nicer to do a proper transaction sequence.

> @@ -497,16 +497,15 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  	__SetPageLocked(page);
>  	__SetPageSwapBacked(page);
>  
> -	/* May fail (-ENOMEM) if XArray node allocation failed. */
> -	if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) {
> -		put_swap_page(page, entry);
> +	if (mem_cgroup_charge_swapin_page(page, NULL, gfp_mask, entry))
>  		goto fail_unlock;
> -	}
>  
> -	if (mem_cgroup_charge(page, NULL, gfp_mask)) {
> -		delete_from_swap_cache(page);
> -		goto fail_unlock;
> -	}
> +	/*
> +	 * Use __GFP_NOFAIL to not worry about undoing the changes done by
> +	 * mem_cgroup_charge_swapin_page() on failure of add_to_swap_cache().
> +	 */
> +	add_to_swap_cache(page, entry,
> +			  (gfp_mask|__GFP_NOFAIL) & GFP_RECLAIM_MASK, &shadow);

How about:

	mem_cgroup_charge_swapin_page()
	add_to_swap_cache()
	mem_cgroup_finish_swapin_page()

where finish_swapin_page() only uncharges the swap entry (on cgroup1)
once the swap->memory transition is complete?

Otherwise the patch looks good to me.