linux-kernel - Re: [PATCH] mm, memcg: sync allocation and memcg charge gfp flags for THP


Message-ID: <20150318150257.GL17241@dhcp22.suse.cz>
Date:	Wed, 18 Mar 2015 16:02:57 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	Vlastimil Babka <vbabka@...e.cz>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>, linux-mm@...ck.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm, memcg: sync allocation and memcg charge gfp flags
 for THP

On Wed 18-03-15 15:34:50, Vlastimil Babka wrote:
> On 03/16/2015 03:08 PM, Michal Hocko wrote:
> >memcg currently uses hardcoded GFP_TRANSHUGE gfp flags for all THP
> >charges. THP allocations, however, might be using different flags
> >depending on /sys/kernel/mm/transparent_hugepage/{,khugepaged/}defrag
> >and the current allocation context.
> >
> >The primary difference is that defrag configured to "madvise" value will
> >clear __GFP_WAIT flag from the core gfp mask to make the allocation
> >lighter for all mappings which are not backed by VM_HUGEPAGE vmas.
> >If the memcg charge path ignores this fact, we get a light allocation
> >but a potential memcg reclaim would defeat the whole point of the
> >configuration.
> >
> >Fix the mismatch by providing the same gfp mask used for the
> >allocation to the charge functions. This is quite easy for all
> >paths except for the khugepaged kernel thread with !CONFIG_NUMA, which
> >does a pre-allocation long before the allocated page is used in
> >collapse_huge_page via khugepaged_alloc_page. To avoid cluttering
> >the whole code path from khugepaged_do_scan we simply return the current
> >flags as per the khugepaged_defrag() value, which might have changed since
> >the preallocation. If somebody changed the value of the knob we would
> >charge differently but this shouldn't happen often and it is definitely
> >not critical because it would only lead to a reduced success rate of
> >one-off THP promotion.
> >
> >Signed-off-by: Michal Hocko <mhocko@...e.cz>
> 
> Acked-by: Vlastimil Babka <vbabka@...e.cz>

Thanks!

[...]
> >@@ -1080,6 +1080,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  	unsigned long haddr;
> >  	unsigned long mmun_start;	/* For mmu_notifiers */
> >  	unsigned long mmun_end;		/* For mmu_notifiers */
> >+	gfp_t huge_gfp = GFP_TRANSHUGE;	/* for allocation and charge */
> 
> This value is actually never used. Is it here because the compiler emits a
> spurious non-initialized value warning otherwise? It should be easy for it
> to prove that setting new_page to something non-null implies initializing
> huge_gfp (in the hunk below), and NULL new_page means it doesn't reach the
> mem_cgroup_try_charge() call?

No, I haven't tried to work around the compiler. It just made the code
more obvious to me. I can remove the initialization if you prefer, of
course.
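
For illustration, here is a minimal user-space sketch of the control flow
in question. It is not the kernel code: the model_* names, the struct and
the flag values are all made up, and only the shape of the branches follows
the hunk. It shows why the initializer is never read: the charge call is
reached only when new_page is non-NULL, and on that path huge_gfp has
already been overwritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int gfp_t;

/* Made-up values for illustration; the real kernel definitions differ. */
#define MODEL_GFP_TRANSHUGE 0x11u
#define MODEL_GFP_LIGHT     0x01u

struct model_page { int unused; };

/* Records what the stand-in charge function was called with. */
static bool charge_called;
static gfp_t last_charge_gfp;

static void model_try_charge(struct model_page *page, gfp_t gfp)
{
	assert(page != NULL);	/* the charge is never attempted without a page */
	charge_called = true;
	last_charge_gfp = gfp;
}

/* Hypothetical stand-in for the alloc: ... charge part of the function. */
static void model_wp_page(bool thp_enabled, gfp_t computed_gfp)
{
	static struct model_page storage;
	struct model_page *new_page;
	gfp_t huge_gfp = MODEL_GFP_TRANSHUGE;	/* the initializer under discussion */

	if (thp_enabled) {
		huge_gfp = computed_gfp;	/* always set before new_page */
		new_page = &storage;
	} else {
		new_page = NULL;
	}

	if (!new_page)
		return;		/* fallback path: huge_gfp is never read */

	model_try_charge(new_page, huge_gfp);
}
```

In this model the initial value is observable nowhere, so the initializer
only documents intent, which matches what both of us said above.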

> >  	ptl = pmd_lockptr(mm, pmd);
> >  	VM_BUG_ON_VMA(!vma->anon_vma, vma);
> >@@ -1106,10 +1107,8 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  alloc:
> >  	if (transparent_hugepage_enabled(vma) &&
> >  	    !transparent_hugepage_debug_cow()) {
> >-		gfp_t gfp;
> >-
> >-		gfp = alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0);
> >-		new_page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
> >+		huge_gfp = alloc_hugepage_gfpmask(transparent_hugepage_defrag(vma), 0);
> >+		new_page = alloc_hugepage_vma(huge_gfp, vma, haddr, HPAGE_PMD_ORDER);
> >  	} else
> >  		new_page = NULL;
> >
> >@@ -1131,7 +1130,7 @@ alloc:
> >  	}
> >
> >  	if (unlikely(mem_cgroup_try_charge(new_page, mm,
> >-					   GFP_TRANSHUGE, &memcg))) {
> >+					   huge_gfp, &memcg))) {
> >  		put_page(new_page);
> >  		if (page) {
> >  			split_huge_page(page);
> 

-- 
Michal Hocko
SUSE Labs