lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150520140353.GC28678@dhcp22.suse.cz>
Date:	Wed, 20 May 2015 16:03:53 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	Mel Gorman <mgorman@...e.de>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>,
	Linux-CGroups <cgroups@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm, memcg: Try charging a page before setting page
 up to date

On Wed 20-05-15 13:50:44, Mel Gorman wrote:
> Historically memcg overhead was high even if memcg was unused. This has
> improved a lot but it still showed up in a profile summary as being a
> problem.
> 
> /usr/src/linux-4.0-vanilla/mm/memcontrol.c                           6.6441   395842
>   mem_cgroup_try_charge                                                        2.950%   175781
>   __mem_cgroup_count_vm_event                                                  1.431%    85239
>   mem_cgroup_page_lruvec                                                       0.456%    27156
>   mem_cgroup_commit_charge                                                     0.392%    23342
>   uncharge_list                                                                0.323%    19256
>   mem_cgroup_update_lru_size                                                   0.278%    16538
>   memcg_check_events                                                           0.216%    12858
>   mem_cgroup_charge_statistics.isra.22                                         0.188%    11172
>   try_charge                                                                   0.150%     8928
>   commit_charge                                                                0.141%     8388
>   get_mem_cgroup_from_mm                                                       0.121%     7184
> 
> That is showing that 6.64% of system CPU cycles were in memcontrol.c and
> dominated by mem_cgroup_try_charge. The annotation shows that the bulk of
> the cost was checking PageSwapCache which is expected to be cache hot but is
> very expensive. The problem appears to be that __SetPageUptodate is called
> just before the check which is a write barrier. It is required to make sure
> struct page and page data is written before the PTE is updated and the data
> visible to userspace. memcg charging does not require or need the barrier
> but gets unfairly hit with the cost so this patch attempts the charging
> before the barrier.  Aside from the accidental cost to memcg there is the
> added benefit that the barrier is avoided if the page cannot be charged.
> When applied the relevant profile summary is as follows.
> 
> /usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c                  3.7907   223277
>   __mem_cgroup_count_vm_event                                                  1.143%    67312
>   mem_cgroup_page_lruvec                                                       0.465%    27403
>   mem_cgroup_commit_charge                                                     0.381%    22452
>   uncharge_list                                                                0.332%    19543
>   mem_cgroup_update_lru_size                                                   0.284%    16704
>   get_mem_cgroup_from_mm                                                       0.271%    15952
>   mem_cgroup_try_charge                                                        0.237%    13982
>   memcg_check_events                                                           0.222%    13058
>   mem_cgroup_charge_statistics.isra.22                                         0.185%    10920
>   commit_charge                                                                0.140%     8235
>   try_charge                                                                   0.131%     7716
> 
> That brings the overhead down to 3.79% and leaves the memcg fault accounting
> to the root cgroup but it's an improvement. The difference in headline
> performance of the page fault microbench is marginal as memcg is such a
> small component of it.
> 
> pft faults
>                                        4.0.0                  4.0.0
>                                      vanilla            chargefirst
> Hmean    faults/cpu-1 1443258.1051 (  0.00%) 1509075.7561 (  4.56%)
> Hmean    faults/cpu-3 1340385.9270 (  0.00%) 1339160.7113 ( -0.09%)
> Hmean    faults/cpu-5  875599.0222 (  0.00%)  874174.1255 ( -0.16%)
> Hmean    faults/cpu-7  601146.6726 (  0.00%)  601370.9977 (  0.04%)
> Hmean    faults/cpu-8  510728.2754 (  0.00%)  510598.8214 ( -0.03%)
> Hmean    faults/sec-1 1432084.7845 (  0.00%) 1497935.5274 (  4.60%)
> Hmean    faults/sec-3 3943818.1437 (  0.00%) 3941920.1520 ( -0.05%)
> Hmean    faults/sec-5 3877573.5867 (  0.00%) 3869385.7553 ( -0.21%)
> Hmean    faults/sec-7 3991832.0418 (  0.00%) 3992181.4189 (  0.01%)
> Hmean    faults/sec-8 3987189.8167 (  0.00%) 3986452.2204 ( -0.02%)
> 
> It's only visible at single threaded. The overhead is there for higher
> threads but other factors dominate.
> 
> Signed-off-by: Mel Gorman <mgorman@...e.de>

Very well spotted and I wouldn't have figured that out from the profiles
posted previously!

The patch makes sense. I am wondering why do we still have both
__SetPageUptodate and SetPageUptodate when they are same. Historically
they were slightly different but this is no longer the case.

Acked-by: Michal Hocko <mhocko@...e.cz>

Thanks!

> ---
>  mm/memory.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 97839f5c8c30..80a03628bd77 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2158,11 +2158,12 @@ gotten:
>  			goto oom;
>  		cow_user_page(new_page, old_page, address, vma);
>  	}
> -	__SetPageUptodate(new_page);
>  
>  	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
>  		goto oom_free_new;
>  
> +	__SetPageUptodate(new_page);
> +
>  	mmun_start  = address & PAGE_MASK;
>  	mmun_end    = mmun_start + PAGE_SIZE;
>  	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
> @@ -2594,6 +2595,10 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	page = alloc_zeroed_user_highpage_movable(vma, address);
>  	if (!page)
>  		goto oom;
> +
> +	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
> +		goto oom_free_page;
> +
>  	/*
>  	 * The memory barrier inside __SetPageUptodate makes sure that
>  	 * preceeding stores to the page contents become visible before
> @@ -2601,9 +2606,6 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	 */
>  	__SetPageUptodate(page);
>  
> -	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
> -		goto oom_free_page;
> -
>  	entry = mk_pte(page, vma->vm_page_prot);
>  	if (vma->vm_flags & VM_WRITE)
>  		entry = pte_mkwrite(pte_mkdirty(entry));
> -- 
> 2.3.5
> 

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ