linux-kernel - Re: [RFC PATCH 0/2] improve vmalloc allocation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181023063648.GB22110@joelaf.mtv.corp.google.com>
Date:   Mon, 22 Oct 2018 23:36:48 -0700
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Uladzislau Rezki <urezki@...il.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>,
        Michal Hocko <mhocko@...e.com>,
        Thomas Garnier <thgarnie@...gle.com>,
        Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Joel Fernandes <joelaf@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...e.hu>, Tejun Heo <tj@...nel.org>
Subject: Re: [RFC PATCH 0/2] improve vmalloc allocation

On Mon, Oct 22, 2018 at 04:50:06PM +0200, Uladzislau Rezki wrote:
> On Fri, Oct 19, 2018 at 05:11:45PM -0700, Joel Fernandes wrote:
> > On Fri, Oct 19, 2018 at 07:35:36PM +0200, Uladzislau Rezki (Sony) wrote:
> > > Objective
> > > ---------
> > > Initiative of improving vmalloc allocator comes from getting many issues
> > > related to allocation time, i.e. sometimes it is terribly slow. As a result
> > > many workloads which are sensitive for long (more than 1 millisecond) preemption
> > > off scenario are affected by that slowness(test cases like UI or audio, etc.).
> > > 
> > > The problem is that, currently an allocation of the new VA area is done over
> > > busy list iteration until a suitable hole is found between two busy areas.
> > > Therefore each new allocation causes the list being grown. Due to long list
> > > and different permissive parameters an allocation can take a long time on
> > > embedded devices(milliseconds).
> > 
> > I am not super familiar with the vmap allocation code, it has been some
> > years. But I have 2 comments:
> > 
> > (1) It seems the issue you are reporting is the walking of the list in
> > alloc_vmap_area().
> > 
> > Can we not solve this by just simplifying the following code?
> > 
> > 	/* from the starting point, walk areas until a suitable hole is found
> > 	 */
> > 	while (addr + size > first->va_start && addr + size <= vend) {
> > 		if (addr + cached_hole_size < first->va_start)
> > 			cached_hole_size = first->va_start - addr;
> > 		addr = ALIGN(first->va_end, align);
> > 		if (addr + size < addr)
> > 			goto overflow;
> > 
> > 		if (list_is_last(&first->list, &vmap_area_list))
> > 			goto found;
> > 
> > 		first = list_next_entry(first, list);
> > 	}
> > 
> > Instead of going through the vmap_area_list, can we not just binary search
> > the existing address-sorted vmap_area_root rbtree to find a hole? If yes,
> > that would bring down the linear search overhead. If not, why not?
> >
> vmap_area_root rb-tree is used for fast access to vmap_area knowing
> the address(any va_start). That is why we use the tree. To use that tree
> in order to check holes will require to start from the left most node or
> specified "vstart" and move forward by rb_next(). What is much slower
> than regular(list_next_entry O(1)) access in this case. 

Ah, sorry. Don't know what I was thinking, you are right.  By the way the
binder driver does something similar too for buffer allocations, maintains an
rb tree of free areas:
https://github.com/torvalds/linux/blob/master/drivers/android/binder_alloc.c#L415

> > (2) I am curious, do you have any measurements of how much time
> > alloc_vmap_area() is taking? You mentioned it takes milliseconds but I was
> > wondering if you had more finer grained function profiling measurements. And
> > also any data on how big are the lists at the time you see this issue.
> > 
> Basically it depends on how much or heavily your system uses vmalloc
> allocations. I was using CONFIG_DEBUG_PREEMPT with an extra patch. See it
> here: ftp://vps418301.ovh.net/incoming/0001-tracing-track-preemption-disable-callers.patch
> 
> As for list size. It can be easily thousands.

Understood. I will go through your patches more in the coming days, thanks!

 - Joel