linux-kernel - Re: [HMM-CDM 4/5] mm/memcontrol: support MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE


Message-ID: <20170615131030.35fe8d57@firefly.ozlabs.ibm.com>
Date:   Thu, 15 Jun 2017 13:10:30 +1000
From:   Balbir Singh <bsingharora@...il.com>
To:     Jerome Glisse <jglisse@...hat.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        John Hubbard <jhubbard@...dia.com>,
        David Nellans <dnellans@...dia.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        cgroups@...r.kernel.org
Subject: Re: [HMM-CDM 4/5] mm/memcontrol: support MEMORY_DEVICE_PRIVATE and
 MEMORY_DEVICE_PUBLIC

On Wed, 14 Jun 2017 22:04:55 -0400
Jerome Glisse <jglisse@...hat.com> wrote:

> On Thu, Jun 15, 2017 at 11:41:59AM +1000, Balbir Singh wrote:
> > On Wed, 14 Jun 2017 16:11:43 -0400
> > Jérôme Glisse <jglisse@...hat.com> wrote:
> >   
> > > HMM pages (private or public device pages) are ZONE_DEVICE pages and
> > > thus need special handling when it comes to the LRU or refcount. This
> > > patch makes sure that memcontrol properly handles those when it faces
> > > them. Those pages are used like regular pages in a process address
> > > space, either as anonymous pages or as file-backed pages. So from the
> > > memcg point of view we want to handle them like regular pages, for now
> > > at least.
> > > 
> > > Signed-off-by: Jérôme Glisse <jglisse@...hat.com>
> > > Cc: Johannes Weiner <hannes@...xchg.org>
> > > Cc: Michal Hocko <mhocko@...nel.org>
> > > Cc: Vladimir Davydov <vdavydov.dev@...il.com>
> > > Cc: cgroups@...r.kernel.org
> > > ---
> > >  kernel/memremap.c |  2 ++
> > >  mm/memcontrol.c   | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
> > >  2 files changed, 55 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/memremap.c b/kernel/memremap.c
> > > index da74775..584984c 100644
> > > --- a/kernel/memremap.c
> > > +++ b/kernel/memremap.c
> > > @@ -479,6 +479,8 @@ void put_zone_device_private_or_public_page(struct page *page)
> > >  		__ClearPageActive(page);
> > >  		__ClearPageWaiters(page);
> > >  
> > > +		mem_cgroup_uncharge(page);
> > > +  
> > 
> > A zone device page could have a mem_cgroup charge if:
> > 
> > 1. The old page was charged to a cgroup and the new page from ZONE_DEVICE then
> > gets the charge, which we need to drop here.
> > 
> > And it should not be charged if:
> > 
> > 2. The driver allowed mmap-based allocation (these pages are not on the LRU).
> > 
> > 
> > Since put_zone_device_private_or_public_page() is called from release_pages(),
> > I think the assumption is that 2 is not a problem? I've not tested the mmap
> > bits yet.  
> 
> Well, that is one of the big questions. Do we care about the memory cgroup
> despite the page not being on the LRU and thus not being reclaimable through
> the usual path?
>
> I believe we do want to keep charging ZONE_DEVICE pages against the memory
> cgroup so that userspace limits are enforced. This is especially important
> for device private memory when migrating back to system memory due to a CPU
> page fault. We do not want the migration back to fail because of a memory
> cgroup limit.
>
> Hence I do want to charge ZONE_DEVICE pages just like regular pages. If
> people run into OOM because of this, then we can start thinking about how to
> account those pages slightly differently inside the memory cgroup.
>
> For now I believe we do want this patch.
> 
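
To make the argument above concrete, here is a minimal toy model in C. It is
not kernel code: toy_memcg, toy_charge() and toy_migrate_back() are
hypothetical names invented for this illustration, and the real charge and
migration paths are considerably more involved. It only shows why a device
page that already carries a charge can come back to system memory without
running into the cgroup limit, while an uncharged one might not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory cgroup: usage and limit, in pages. */
struct toy_memcg {
	long usage;
	long limit;
};

/* Try to charge one page; fails when the group is already at its limit. */
bool toy_charge(struct toy_memcg *cg)
{
	assert(cg);
	if (cg->usage >= cg->limit)
		return false;
	cg->usage++;
	return true;
}

/*
 * Model of migrating one page from device memory back to system memory.
 * If the device page was charged, the existing charge is carried over to
 * the new system page, so usage does not change and the limit cannot make
 * the migration fail. If the device page was never charged, the new system
 * page needs a fresh charge, which can fail.
 */
bool toy_migrate_back(struct toy_memcg *cg, bool device_page_charged)
{
	if (device_page_charged)
		return true;		/* the charge moves with the page */
	return toy_charge(cg);		/* a new charge may hit the limit */
}
```

With a group sitting exactly at its limit, toy_migrate_back(&cg, true)
succeeds and leaves usage untouched, whereas toy_migrate_back(&cg, false)
fails, which is the situation the patch is trying to avoid.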

Yes, we do need the patch. I was just double-checking whether we could end up
trying to uncharge a page that is not charged.
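
The concern can be sketched with a small toy model in C. The types and
functions below (toy_page, toy_charge(), toy_uncharge()) are hypothetical and
exist only for illustration. The sketch assumes the uncharge path returns
early when the page carries no cgroup pointer; whether mem_cgroup_uncharge()
behaves exactly like that should be checked against the tree this series
applies to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: just a page counter. */
struct toy_memcg {
	long usage;
};

/* Hypothetical stand-in for struct page. */
struct toy_page {
	struct toy_memcg *memcg;	/* NULL when the page is not charged */
};

/* Charge a page to a group; the page must not already be charged. */
void toy_charge(struct toy_page *page, struct toy_memcg *cg)
{
	assert(page->memcg == NULL);
	cg->usage++;
	page->memcg = cg;
}

/*
 * Drop the charge of a page, if it has one.
 * Returns 1 when a charge was dropped and 0 when the call was a no-op.
 */
int toy_uncharge(struct toy_page *page)
{
	if (page->memcg == NULL)
		return 0;		/* never charged: nothing to undo */
	assert(page->memcg->usage > 0);
	page->memcg->usage--;
	page->memcg = NULL;
	return 1;
}
```

Under that assumption, calling toy_uncharge() on a page that was never
charged, or calling it twice on the same page, leaves the counter unchanged.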

> 
> [...]
> 
> > > @@ -4610,6 +4637,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
> > >  		 */
> > >  		if (page->mem_cgroup == mc.from) {
> > >  			ret = MC_TARGET_PAGE;
> > > +			if (is_device_private_page(page) ||
> > > +			    is_device_public_page(page))
> > > +				ret = MC_TARGET_DEVICE;
> > >  			if (target)
> > >  				target->page = page;
> > >  		}
> > > @@ -4669,6 +4699,11 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
> > >  
> > >  	ptl = pmd_trans_huge_lock(pmd, vma);
> > >  	if (ptl) {
> > > +		/*
> > > +		 * Note their can not be MC_TARGET_DEVICE for now as we do not  
> >                         there  
> > > +		 * support transparent huge page with MEMORY_DEVICE_PUBLIC or
> > > +		 * MEMORY_DEVICE_PRIVATE but this might change.  
> > 
> > I am trying to remind myself why THP and MEMORY_DEVICE_* pages don't work well
> > together today. The driver could allocate a THP-sized set of pages and migrate
> > it. There are patches to do THP migration, but they are not upstream yet. Could
> > you remind me of any other limitations?  
> 
> No, there is nothing that would be problematic AFAICT. Persistent memory already
> uses huge pages, so we should be in the clear. But I would rather enable that as
> a separate patchset altogether and have proper testing specifically for such
> a scenario.

Agreed
Balbir Singh.
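
For readers following the get_mctgt_type() hunk quoted above, the
classification it adds can be modeled roughly as below. This is a simplified
sketch, not the kernel function: toy_page, toy_target and toy_target_type()
are hypothetical names, the cgroup is reduced to an integer id, and the swap
entry handling of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the move-charge target types in the hunk. */
enum toy_target {
	TOY_TARGET_NONE,
	TOY_TARGET_PAGE,
	TOY_TARGET_DEVICE
};

/* Hypothetical stand-in for struct page. */
struct toy_page {
	int memcg_id;		/* id of the cgroup the page is charged to */
	bool device_private;	/* models is_device_private_page() */
	bool device_public;	/* models is_device_public_page() */
};

/*
 * Classify a page for charge moving out of the cgroup 'from_id'.
 * Pages charged elsewhere are not targets. Pages charged to 'from_id'
 * are regular targets unless they are device pages, in which case they
 * get the dedicated device target type.
 */
enum toy_target toy_target_type(const struct toy_page *page, int from_id)
{
	assert(page);
	if (page->memcg_id != from_id)
		return TOY_TARGET_NONE;
	if (page->device_private || page->device_public)
		return TOY_TARGET_DEVICE;
	return TOY_TARGET_PAGE;
}
```

Keeping device pages in their own target type lets callers treat them
separately from LRU pages, which matches the note in the patch that the huge
page path does not expect MC_TARGET_DEVICE for now.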