lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACZ9PQWbZ7m1LQLs+bOjtHNsKDmSZmkjAH8vmnc2VBgCLDdhDg@mail.gmail.com>
Date:	Tue, 17 Mar 2015 14:12:14 +0900
From:	Roman Peniaev <r.peniaev@...il.com>
To:	Joonsoo Kim <iamjoonsoo.kim@....com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Nick Piggin <npiggin@...nel.dk>,
	Eric Dumazet <edumazet@...gle.com>,
	David Rientjes <rientjes@...gle.com>,
	WANG Chao <chaowang@...hat.com>,
	Fabian Frederick <fabf@...net.be>,
	Christoph Lameter <cl@...ux.com>, Gioh Kim <gioh.kim@....com>,
	Rob Jones <rob.jones@...ethink.co.uk>, linux-mm@...ck.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space
 caused by vm_map_ram allocator

On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@....com> wrote:
> On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
>> If suitable block can't be found, new block is allocated and put into a head
>> of a free list, so on next iteration this new block will be found first.
>>
>> That's bad, because old blocks in a free list will not get a chance to be fully
>> used, thus fragmentation will grow.
>>
>> Let's consider this simple example:
>>
>>  #1 We have one block in a free list which is partially used, and where only
>>     one page is free:
>>
>>     HEAD |xxxxxxxxx-| TAIL
>>                    ^
>>                    free space for 1 page, order 0
>>
>>  #2 New allocation request of order 1 (2 pages) comes, new block is allocated
>>     since we do not have free space to complete this request. New block is put
>>     into a head of a free list:
>>
>>     HEAD |----------|xxxxxxxxx-| TAIL
>>
>>  #3 Two pages were occupied in a new found block:
>>
>>     HEAD |xx--------|xxxxxxxxx-| TAIL
>>           ^
>>           two pages mapped here
>>
>>  #4 New allocation request of order 0 (1 page) comes.  Block, which was created
>>     on #2 step, is located at the beginning of a free list, so it will be found
>>     first:
>>
>>   HEAD |xxX-------|xxxxxxxxx-| TAIL
>>           ^                 ^
>>           page mapped here, but better to use this hole
>>
>> It is obvious, that it is better to complete request of #4 step using the old
>> block, where free space is left, because in other case fragmentation will be
>> highly increased.
>>
>> But fragmentation is not only the case.  The most worst thing is that I can
>> easily create scenario, when the whole vmalloc space is exhausted by blocks,
>> which are not used, but already dirty and have several free pages.
>>
>> Let's consider this function which execution should be pinned to one CPU:
>>
>>  ------------------------------------------------------------------------------
>> /* Here we consider that our block is equal to 1MB, thus 256 pages */
>> static void exhaust_virtual_space(struct page *pages[256], int iters)
>> {
>>       /* Firstly we have to map a big chunk, e.g. 16 pages.
>>        * Then we have to occupy the remaining space with smaller
>>        * chunks, i.e. 8 pages. At the end small hole should remain.
>>        * So at the end of our allocation sequence block looks like
>>        * this:
>>        *                XX  big chunk
>>        * |XXxxxxxxx-|    x  small chunk
>>        *                 -  hole, which is enough for a small chunk,
>>        *                    but not for a big chunk
>>        */
>>       unsigned big_allocs   = 1;
>>       /* -1 for hole, which should be left at the end of each block
>>        * to keep it partially used, with some free space available */
>>       unsigned small_allocs = (256 - 16) / 8 - 1;
>>       void    *vaddrs[big_allocs + small_allocs];
>>
>>       while (iters--) {
>>               int i = 0, j;
>>
>>               /* Map big chunk */
>>               vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
>>
>>               /* Map small chunks */
>>               for (j = 0; j < small_allocs; j++)
>>                       vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
>>                                                PAGE_KERNEL);
>>
>>               /* Unmap everything */
>>               while (i--)
>>                       vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
>>       }
>> }
>>  ------------------------------------------------------------------------------
>>
>> On every iteration new block (1MB of vm area in my case) will be allocated and
>> then will be occupied, without attempt to resolve small allocation request
>> using previously allocated blocks in a free list.
>>
>> In current patch I simply put newly allocated block to the tail of a free list,
>> thus reduce fragmentation, giving a chance to resolve allocation request using
>> older blocks with possible holes left.
>
> Hello,
>
> I think that if you put newly allocated block to the tail of a free
> list, below example would results in enormous performance degradation.
>
> new block: 1MB (256 pages)
>
> while (iters--) {
>   vm_map_ram(3 or something else not dividable for 256) * 85
>   vm_unmap_ram(3) * 85
> }
>
> On every iteration, it needs newly allocated block and it is put to the
> tail of a free list so finding it consumes large amount of time.
>
> Is there any other solution to prevent your problem?

Hello.

My second patch fixes this problem.
I occupy the block on allocation and avoid jumping to the search loop.

Also the problem is much wider.  Since we allocate a block on one CPU, but
search of a free block can be done on another CPU (preemption was turned on),
then allocation can happen again.  In worst case allocation will happen for
each CPU available on the system.

This scenario also should be fixed by occupying block on allocation.

--
Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ