lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200921141830.GE5962@xz-x1>
Date:   Mon, 21 Sep 2020 10:18:30 -0400
From:   Peter Xu <peterx@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Tejun Heo <tj@...nel.org>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jason Gunthorpe <jgg@...pe.ca>,
        John Hubbard <jhubbard@...dia.com>,
        Leon Romanovsky <leonro@...dia.com>,
        Linux-MM <linux-mm@...ck.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Maya B . Gokhale" <gokhale2@...l.gov>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Marty Mcfadden <mcfadden8@...l.gov>,
        Kirill Shutemov <kirill@...temov.name>,
        Oleg Nesterov <oleg@...hat.com>, Jann Horn <jannh@...gle.com>,
        Jan Kara <jack@...e.cz>, Kirill Tkhai <ktkhai@...tuozzo.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 1/4] mm: Trial do_wp_page() simplification

Hi, Michal,

On Mon, Sep 21, 2020 at 03:42:00PM +0200, Michal Hocko wrote:
> [Cc Tejun and Christian - this is a part of a larger discussion which is
>  not directly related to this particular question so let me trim the
>  original email to the bare minimum.]
> 
> On Fri 18-09-20 12:40:32, Peter Xu wrote:
> [...]
> > One issue is when we charge for cgroup we probably can't do that onto the new
> > mm/task, since copy_namespaces() is called after copy_mm().  I don't know
> > enough about cgroup, I thought the child will inherit the parent's, but I'm not
> > sure.  Or, can we change that order of copy_namespaces() && copy_mm()?  I don't
> > see a problem so far but I'd like to ask first..
> 
> I suspect you are referring to CLONE_INTO_CGROUP, right?

Thanks for raising this question up to a broader audience.

I was not referring to that explicilty, but it would definitely be good to know
this new feature before I post the (probably stupid :) patch. Because I noticed
it's only done until cgroup_can_fork() or later, so it's definitely also later
than copy_mm() even if I do the move.

> I have only now
> learned about this feature so I am not deeply familiar with all the
> details and I might be easily wrong. Normally all the cgroup aware
> resources are accounted to the parent's cgroup. For memcg that includes
> all the page tables, early CoW and other allocations with __GFP_ACCOUNT.
> IIUC CLONE_INTO_CGROUP properly then this hasn't changed as the child is
> associated to its new cgroup (and memcg) only in cgroup_post_fork. If
> that is correct then we might have quite a lot of resources bound to
> child's lifetime but accounted to the parent's memcg which can lead to
> all sorts of interesting problems (e.g. unreclaimable memory - even by
> the oom killer).

Right that's one of the things that I'm confused too, on that if we always
account to the parent, then when the child quits whether we uncharge them or
not, and how..  Not sure whether the accounting of the parent could steadily
grow as it continues the fork()s.

So is it by design that we account all these to the parents?

> 
> Christian, Tejun is this the expected semantic or I am just misreading
> the code?

I'll try to summarize the question here too - what we'd like to do right now is
to do cgroup accounting (e.g., mem_cgroup_charge()) upon the newly created
process within copy_mm().  A quick summary of why we want to do this is to
"trigger early copy-on-write for pinned pages during fork()".

Initially I thought moving copy_mm() to be after copy_namespaces() would be the
right thing to do, but now I highly doubt it..

Thanks,

-- 
Peter Xu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ