linux-kernel - Re: [PATCH v1 2/2] mm: add priority threshold to __purge_vmap_area

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190129173936.4sscooiybzbhos77@pc636>
Date:   Tue, 29 Jan 2019 18:39:36 +0100
From:   Uladzislau Rezki <urezki@...il.com>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     "Uladzislau Rezki (Sony)" <urezki@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...e.com>,
        Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Garnier <thgarnie@...gle.com>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...e.hu>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v1 2/2] mm: add priority threshold to
 __purge_vmap_area_lazy()

On Mon, Jan 28, 2019 at 05:45:28PM -0500, Joel Fernandes wrote:
> On Thu, Jan 24, 2019 at 12:56:48PM +0100, Uladzislau Rezki (Sony) wrote:
> > commit 763b218ddfaf ("mm: add preempt points into
> > __purge_vmap_area_lazy()")
> > 
> > introduced some preempt points, one of those is making an
> > allocation more prioritized over lazy free of vmap areas.
> > 
> > Prioritizing an allocation over freeing does not work well
> > all the time, i.e. it should be rather a compromise.
> > 
> > 1) Number of lazy pages directly influence on busy list length
> > thus on operations like: allocation, lookup, unmap, remove, etc.
> > 
> > 2) Under heavy stress of vmalloc subsystem i run into a situation
> > when memory usage gets increased hitting out_of_memory -> panic
> > state due to completely blocking of logic that frees vmap areas
> > in the __purge_vmap_area_lazy() function.
> > 
> > Establish a threshold passing which the freeing is prioritized
> > back over allocation creating a balance between each other.
> 
> I'm a bit concerned that this will introduce the latency back if vmap_lazy_nr
> is greater than half of lazy_max_pages(). Which IIUC will be more likely if
> the number of CPUs is large.
> 
The threshold that we establish is two times more than lazy_max_pages(),
i.e. in case of 4 system CPUs lazy_max_pages() is 24576, therefore the
threshold is 49152, if PAGE_SIZE is 4096.

It means that we allow rescheduling if vmap_lazy_nr < 49152. If vmap_lazy_nr 
is higher then we forbid rescheduling and free areas until it becomes lower
again to stabilize the system. By doing that, we will not allow vmap_lazy_nr
to be enormously increased.

>
> In fact, when vmap_lazy_nr is high, that's when the latency will be the worst
> so one could say that that's when you *should* reschedule since the frees are
> taking too long and hurting real-time tasks.
> 
> Could this be better solved by tweaking lazy_max_pages() such that purging is
> more aggressive?
> 
> Another approach could be to detect the scenario you brought up (allocations
> happening faster than free), somehow, and avoid a reschedule?
> 
This is what i am trying to achieve by this change. 

Thank you for your comments.

--
Vlad Rezki
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@...il.com>
> > ---
> >  mm/vmalloc.c | 18 ++++++++++++------
> >  1 file changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index fb4fb5fcee74..abe83f885069 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -661,23 +661,27 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >  	struct llist_node *valist;
> >  	struct vmap_area *va;
> >  	struct vmap_area *n_va;
> > -	bool do_free = false;
> > +	int resched_threshold;
> >  
> >  	lockdep_assert_held(&vmap_purge_lock);
> >  
> >  	valist = llist_del_all(&vmap_purge_list);
> > +	if (unlikely(valist == NULL))
> > +		return false;
> > +
> > +	/*
> > +	 * TODO: to calculate a flush range without looping.
> > +	 * The list can be up to lazy_max_pages() elements.
> > +	 */
> >  	llist_for_each_entry(va, valist, purge_list) {
> >  		if (va->va_start < start)
> >  			start = va->va_start;
> >  		if (va->va_end > end)
> >  			end = va->va_end;
> > -		do_free = true;
> >  	}
> >  
> > -	if (!do_free)
> > -		return false;
> > -
> >  	flush_tlb_kernel_range(start, end);
> > +	resched_threshold = (int) lazy_max_pages() << 1;
> >  
> >  	spin_lock(&vmap_area_lock);
> >  	llist_for_each_entry_safe(va, n_va, valist, purge_list) {
> > @@ -685,7 +689,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
> >  
> >  		__free_vmap_area(va);
> >  		atomic_sub(nr, &vmap_lazy_nr);
> > -		cond_resched_lock(&vmap_area_lock);
> > +
> > +		if (atomic_read(&vmap_lazy_nr) < resched_threshold)
> > +			cond_resched_lock(&vmap_area_lock);
> >  	}
> >  	spin_unlock(&vmap_area_lock);
> >  	return true;
> > -- 
> > 2.11.0
> >