lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220103164501.es3oagajgzxnmtmm@revolver>
Date:   Mon, 3 Jan 2022 16:45:08 +0000
From:   Liam Howlett <liam.howlett@...cle.com>
To:     Vlastimil Babka <vbabka@...e.cz>
CC:     "maple-tree@...ts.infradead.org" <maple-tree@...ts.infradead.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Song Liu <songliubraving@...com>,
        Davidlohr Bueso <dave@...olabs.net>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Laurent Dufour <ldufour@...ux.ibm.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Rik van Riel <riel@...riel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Michel Lespinasse <walken.cr@...il.com>,
        Jerome Glisse <jglisse@...hat.com>,
        Minchan Kim <minchan@...gle.com>,
        Joel Fernandes <joelaf@...gle.com>,
        Rom Lemarchand <romlem@...gle.com>
Subject: Re: [PATCH v4 12/66] kernel/fork: Use maple tree for dup_mmap()
 during forking

* Vlastimil Babka <vbabka@...e.cz> [211216 06:09]:
> On 12/1/21 15:29, Liam Howlett wrote:
> > From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
> > 
> > The maple tree was already tracking VMAs in this function by an earlier
> > commit, but the rbtree iterator was being used to iterate the list.
> > Change the iterator to use a maple tree native iterator and switch to
> > the maple tree advanced API to avoid multiple walks of the tree during
> > insert operations.  Unexport the now-unused vma_store() function.
> > 
> > We track whether we need to free the VMAs and tree nodes through RCU
> > (ie whether there have been multiple threads that can see the mm_struct
> > simultaneously; by pthread(), ptrace() or looking at /proc/$pid/maps).
> > This setting is sticky because it's too tricky to decide when it's safe
> > to exit RCU mode.
> 
> I don't immediately see why enabling the RCU tracking in mmget is part of
> the dup_mmap() change?

Enabling the RCU tracking during the dup_mmap() change seems the correct
place to put a change dealing with counting the number of users of the
mmap.  This commit switches to using the maple tree as the primary
method of tracking the mmap, so it seems logical to me.

> 
> > For performance reasons we bulk allocate the maple tree nodes.  The node
> > calculations are done internally to the tree and use the VMA count and
> > assume the worst-case node requirements.  The VM_DONT_COPY flag does
> > not allow for the most efficient copy method of the tree and so a bulk
> > loading algorithm is used.
> > 
> > Signed-off-by: Liam R. Howlett <Liam.Howlett@...cle.com>
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> 
> Acked-by: Vlastimil Babka <vbabka@...e.cz>
> 
> >  static inline bool mmget_not_zero(struct mm_struct *mm)
> >  {
> > +	/*
> > +	 * There is a race below during task tear down that can cause the maple
> 
> What does 'below' refer to here?

If mm_users is zero and we arrive at mmget_not_zero, the maple tree will
enter RCU mode.  The only way mm_users is zero would be task teardown,
so we could have a race with teardown and another thread.. but the
result would be that the maple tree would delay freeing nodes - which is
not tragic.

I had a discussion with Hilf Danton in v2 of this patch set with the
code slightly different, but with the same outcome [1].

So 'below' means below the comment block.

> 
> > +	 * tree to enter rcu mode with only a single user.  If this race
> > +	 * happens, the result would be that the maple tree nodes would remain
> > +	 * active for an extra RCU read cycle.
> > +	 */
> > +	if (!mt_in_rcu(&mm->mm_mt))
> > +		mm_set_in_rcu(mm);
> >  	return atomic_inc_not_zero(&mm->mm_users);
> >  }
> >  
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index cc9bb95c7678..c9f8465d8ae2 100644

1.  https://lore.kernel.org/lkml/20210820174544.cvbdwpp6cfebos2m@revolver/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ