Date:	Fri, 4 Dec 2015 22:35:37 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Michal Hocko <mhocko@...nel.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Johannes Weiner <hannes@...xchg.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: memcg uncharge page counter mismatch

On Fri, Dec 04, 2015 at 10:58:15AM +0100, Michal Hocko wrote:
> On Fri 04-12-15 18:16:34, Minchan Kim wrote:
> > On Fri, Dec 04, 2015 at 09:52:27AM +0100, Michal Hocko wrote:
> > > On Fri 04-12-15 14:35:15, Minchan Kim wrote:
> > > > On Thu, Dec 03, 2015 at 04:47:29PM +0100, Michal Hocko wrote:
> > > > > On Thu 03-12-15 15:58:50, Michal Hocko wrote:
> > > > > [....]
> > > > > > Warning, this looks ugly as hell.
> > > > > 
> > > > > I was thinking about it some more and it seems that we should rather not
> > > > > bother with partial thp at all and keep it in the original memcg
> > > > > instead. It is way much less code and I do not think this will be too
> > > > > disruptive. Somebody should be holding the thp head, right?
> > > > > 
> > > > > Minchan, does this fix the issue you are seeing?
> > > > 
> > > > This patch solves the issue, but I am not sure it is the right approach.
> > > > I think it could cause a regression: previously we could charge
> > > > a THP page, but now we can't.
> > > 
> > > The page would still get charged when allocated. It just wouldn't get
> > > moved when mapped only partially. IIUC there will be still somebody
> > > mapping the THP head via pmd, right? That process will move the page to
> > 
> > If I read the code correctly, no. split_huge_pmd splits just the pmd,
> > not the page itself. IOW, !pmd_trans_huge(pmd) && PageTransHuge is
> > possible even though only one process owns the page.
> 
> I am not sure I follow you. I thought there would still be other pmd
> which will hold the THP. Why should we keep the page as huge when all
> processes which map it have already split it up?

I haven't followed Kirill's work closely; I only read part of the code while
implementing MADV_FREE, so this is just a guess.
High-order allocation, compaction, split and collapse are costly operations,
so the new work tries to avoid splitting the page as far as possible.
For example, because it works by splitting the pmd rather than the THP page,
it doesn't split the THP page in the mprotect path.
It could even delay the page split via deferred_split_huge_page
even if the THP page is freed.
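
To make the state I am guessing at concrete, here is a toy user-space
model, not kernel code. The names toy_page, toy_pmd, toy_split_pmd and
toy_pte_mapped_thp are made up for illustration; the only thing taken from
the discussion is that splitting the pmd leaves the compound page intact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a THP; not the kernel's struct page. */
struct toy_page {
	bool compound;		/* models PageTransHuge(page) */
};

/* Hypothetical stand-in for one process's pmd entry. */
struct toy_pmd {
	bool trans_huge;	/* models pmd_trans_huge(pmd) */
	struct toy_page *page;
};

/*
 * Models splitting only the pmd: the mapping drops to pte level,
 * while the page itself stays a compound page.
 */
static void toy_split_pmd(struct toy_pmd *pmd)
{
	assert(pmd->page != NULL);
	pmd->trans_huge = false;
}

/* True in the state discussed above: pte-mapped, but still a THP. */
static bool toy_pte_mapped_thp(const struct toy_pmd *pmd)
{
	return !pmd->trans_huge && pmd->page->compound;
}
```

If this model matches the new code, a single process can end up with
toy_pte_mapped_thp() being true without anyone else mapping the page.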

> 
> On the other hand it is true that the last process which maps the whole
> thp might have exited and left others to map it partially.
>  
> > > the new memcg when moved. Or is it possible that we will end up only
> > > with pte mapped THP from all processes? Kirill?
> > 
> > I'm not Kirill, but I think it's possible.
> > If so, one thing we can use is page_mapcount(page) == 1. That would
> > guarantee that only one process owns the page, so charge 512 instead of 1?
> 
> Alright the exclusive holder should indeed move it. I will think how to
> simplify the previous patch (has it helped in your testing btw.?).

At least your patch doesn't trigger the WARNING, but I didn't check whether
the accounting was right.
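
For checking the accounting, the rule we have been discussing could be
sketched roughly as below. This is a user-space toy, not a patch:
toy_page, toy_nr_pages_to_move and the single mapcount field are
assumptions made for illustration, and the real mapcount of a pte-mapped
THP is more complicated than one counter. Returning 1 for a small page is
also just an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HPAGE_NR 512	/* the "512 instead of 1" from the thread */

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct toy_page {
	bool compound;		/* models PageTransHuge(page) */
	int mapcount;		/* simplified page_mapcount(page) */
};

/*
 * How many pages the moving process would carry to the new memcg:
 *  - small page: 1
 *  - THP mapped by this process's pmd: all 512
 *  - pte-mapped THP with an exclusive owner: all 512
 *  - pte-mapped THP shared with others: 0, it stays in the old memcg
 */
static int toy_nr_pages_to_move(const struct toy_page *page, bool pmd_mapped)
{
	assert(page->mapcount >= 1);

	if (!page->compound)
		return 1;
	if (pmd_mapped)
		return TOY_HPAGE_NR;
	if (page->mapcount == 1)
		return TOY_HPAGE_NR;
	return 0;
}
```

The point of the last two branches is that a THP is either moved as a
whole or not at all, so the charge and the later uncharge use the same
page count.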

Thanks.

> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Kind regards,
Minchan Kim