Message-ID: <d28bc808-0aab-d36a-f401-9925680fd131@virtuozzo.com>
Date: Wed, 5 Apr 2017 13:31:23 +0300
From: Andrey Ryabinin <aryabinin@...tuozzo.com>
To: Michal Hocko <mhocko@...nel.org>
CC: Thomas Hellstrom <thellstrom@...are.com>,
<akpm@...ux-foundation.org>, <penguin-kernel@...ove.SAKURA.ne.jp>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<hpa@...or.com>, <chris@...is-wilson.co.uk>, <hch@....de>,
<mingo@...e.hu>, <jszhang@...vell.com>, <joelaf@...gle.com>,
<joaodias@...gle.com>, <willy@...radead.org>, <tglx@...utronix.de>,
<stable@...r.kernel.org>
Subject: Re: [PATCH 1/4] mm/vmalloc: allow to call vfree() in atomic context
On 04/04/2017 12:41 PM, Michal Hocko wrote:
> On Thu 30-03-17 17:48:39, Andrey Ryabinin wrote:
>> From: Andrey Ryabinin <aryabinin@...tuozzo.com>
>> Subject: mm/vmalloc: allow to call vfree() in atomic context fix
>>
>> Don't spawn a worker if we are already purging.
>>
>> Signed-off-by: Andrey Ryabinin <aryabinin@...tuozzo.com>
>
> I would rather put this into a separate patch. Ideally with some numbers
> as this is an optimization...
>
It's quite a simple optimization and I don't think it deserves to be a separate patch.
But I did some measurements anyway. With VMAP_STACK=y enabled and NR_CACHED_STACKS changed to 0,
running fork() 100000 times gives this:

With optimization:

~ # grep try_purge /proc/kallsyms
ffffffff811d0dd0 t try_purge_vmap_area_lazy
~ # perf stat --repeat 10 -ae workqueue:workqueue_queue_work --filter 'function == 0xffffffff811d0dd0' ./fork

 Performance counter stats for 'system wide' (10 runs):

                15      workqueue:workqueue_queue_work    ( +- 0.88% )

       1.615368474 seconds time elapsed    ( +- 0.41% )

Without optimization:

~ # grep try_purge /proc/kallsyms
ffffffff811d0dd0 t try_purge_vmap_area_lazy
~ # perf stat --repeat 10 -ae workqueue:workqueue_queue_work --filter 'function == 0xffffffff811d0dd0' ./fork

 Performance counter stats for 'system wide' (10 runs):

                30      workqueue:workqueue_queue_work    ( +- 1.31% )

       1.613231060 seconds time elapsed    ( +- 0.38% )

So there is no measurable difference in the run time of the test itself, but we queue twice as many
jobs without this optimization. It should decrease the load on kworkers.
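
The ./fork program used above was not posted. As a rough sketch only, a test of this kind could be built around a helper like the one below. The name fork_many() and the fork-then-reap structure are assumptions for illustration, not the actual test. With VMAP_STACK=y and no cached stacks, each child exit should release a vmalloc'ed kernel stack, which is what would exercise the lazy purge path being measured.

```c
/* Hypothetical stand-in for the ./fork test; the original was not posted. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork `iterations` children one at a time. Each child exits immediately
 * and the parent reaps it before forking again.
 *
 * Returns the number of children that exited cleanly, or -1 on error.
 */
long fork_many(long iterations)
{
	long reaped = 0;

	for (long i = 0; i < iterations; i++) {
		int status;
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			return -1;
		}
		if (pid == 0)
			_exit(0);	/* child: exit without running atexit handlers */

		/* parent: wait for this child so only one exists at a time */
		if (waitpid(pid, &status, 0) != pid) {
			perror("waitpid");
			return -1;
		}
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			reaped++;
	}
	return reaped;
}
```

Wrapping this in a main() that calls fork_many(100000) would give a binary comparable in spirit to the ./fork passed to perf stat above, though the numbers it produces need not match the ones reported here.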
>> ---
>> mm/vmalloc.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index ea1b4ab..88168b8 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -737,7 +737,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
>> /* After this point, we may free va at any time */
>> llist_add(&va->purge_list, &vmap_purge_list);
>>
>> - if (unlikely(nr_lazy > lazy_max_pages()))
>> + if (unlikely(nr_lazy > lazy_max_pages()) &&
>> + !mutex_is_locked(&vmap_purge_lock))
>> schedule_work(&purge_vmap_work);
>> }
>>
>> --
>> 2.10.2
>>
>