Date:	Thu, 18 Feb 2016 16:01:12 +0100
From:	Alexander Potapenko <glider@...gle.com>
To:	Joonsoo Kim <js1304@...il.com>
Cc:	Joonsoo Kim <iamjoonsoo.kim@....com>, kasan-dev@...glegroups.com,
	Christoph Lameter <cl@...ux.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Dmitriy Vyukov <dvyukov@...gle.com>,
	Andrey Ryabinin <ryabinin.a.a@...il.com>,
	Linux Memory Management List <linux-mm@...ck.org>,
	Andrey Konovalov <adech.fo@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v1 5/8] mm, kasan: Stackdepot implementation. Enable
 stackdepot for SLAB

On Thu, Feb 18, 2016 at 9:13 AM, Joonsoo Kim <js1304@...il.com> wrote:
> 2016-02-18 3:29 GMT+09:00 Alexander Potapenko <glider@...gle.com>:
>> On Tue, Feb 16, 2016 at 7:37 PM, Alexander Potapenko <glider@...gle.com> wrote:
>>> On Mon, Feb 1, 2016 at 3:55 AM, Joonsoo Kim <iamjoonsoo.kim@....com> wrote:
>>>> On Thu, Jan 28, 2016 at 02:27:44PM +0100, Alexander Potapenko wrote:
>>>>> On Thu, Jan 28, 2016 at 1:51 PM, Alexander Potapenko <glider@...gle.com> wrote:
>>>>> >
>>>>> > On Jan 28, 2016 8:40 AM, "Joonsoo Kim" <iamjoonsoo.kim@....com> wrote:
>>>>> >>
>>>>> >> Hello,
>>>>> >>
>>>>> >> On Wed, Jan 27, 2016 at 07:25:10PM +0100, Alexander Potapenko wrote:
>>>>> >> > Stack depot will allow KASAN to store allocation/deallocation stack
>>>>> >> > traces for memory chunks. The stack traces are stored in a hash table
>>>>> >> > and referenced by handles which reside in the kasan_alloc_meta and
>>>>> >> > kasan_free_meta structures in the allocated memory chunks.
>>>>> >>
>>>>> >> Looks really nice!
>>>>> >>
>>>>> >> Could it be generalized for use by other features that need to
>>>>> >> store stack traces, such as tracepoint or page owner?
>>>>> > Certainly yes, but see below.
>>>>> >
>>>>> >> If it could be, there is one more requirement.
>>>>> >> I understand that never removing entries from the depot makes things
>>>>> >> much simpler, but, for general use cases, it would be better to use a
>>>>> >> reference count
>>>>> >> and allow removal. Is that possible?
>>>>> > For our use case reference counting is not really necessary, and it would
>>>>> > introduce unwanted contention.
>>>>
>>>> Okay.
>>>>
>>>>> > There are two possible options, each having its advantages and drawbacks: we
>>>>> > can let the clients store the refcounters directly in their stacks (more
>>>>> > universal, but harder to use for the clients), or keep the counters in the
>>>>> > depot but add an API that does not change them (easier for the clients, but
>>>>> > potentially error-prone).
>>>>> > I'd say it's better to actually find at least one more user for the stack
>>>>> > depot in order to understand the requirements, and refactor the code after
>>>>> > that.
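[As a hedged illustration of the second option discussed above (keeping the counters in the depot but offering a non-modifying API), a depot-internal refcount might look like the userspace C sketch below. All names and signatures here are hypothetical, not an existing kernel API.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical depot entry carrying its own refcount. */
struct depot_entry {
    atomic_int refcount;
    int nr_frames;
    uintptr_t frames[16];
};

/* Refcounted lookup path: the caller owns a reference and must
 * eventually drop it with depot_entry_put(). */
static void depot_entry_get(struct depot_entry *e)
{
    atomic_fetch_add_explicit(&e->refcount, 1, memory_order_relaxed);
}

/* Drop a reference; returns the new count so the depot could free
 * the entry when it reaches zero. */
static int depot_entry_put(struct depot_entry *e)
{
    return atomic_fetch_sub_explicit(&e->refcount, 1,
                                     memory_order_acq_rel) - 1;
}

/* Clients such as KASAN, whose entries are immortal, would simply
 * use a lookup API that never touches the refcount, avoiding the
 * atomic traffic (and thus the contention) on hot paths. */
```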
>>>>
>>>> I re-thought the page owner case, and it also may not need a refcount.
>>>> For now, just moving this stuff to /lib would be helpful for other future users.
>>> I agree this code may need to be moved to /lib someday, but I wouldn't
>>> hurry with that.
>>> Right now it is quite KASAN-specific, and it's unclear yet whether
>>> anyone else is going to use it.
>>> I suggest we keep it in mm/kasan for now, and factor the common parts
>>> into /lib when the need arises.
>>>
>>>> BTW, are there any performance numbers? I guess this could affect
>>>> performance.
>>> I've compared the performance of KASAN with SLAB allocator on a small
>>> synthetic benchmark in two modes: with stack depot enabled and with
>>> kasan_save_stack() unconditionally returning 0.
>>> In the former case 8% more time was spent in the kernel than in the latter case.
>>>
>>> If I am not mistaken, for the SLUB allocator the bookkeeping (enabled
>>> with the slub_debug=UZ boot option) takes only 1.5x the time, so the
>>> difference is worth looking into (at least before we switch SLUB to
>>> stack depot).
>>
>> I've made additional measurements.
>> Previously I had been using a userspace benchmark that created and
>> destroyed pipes in a loop
>> (https://github.com/google/sanitizers/blob/master/address-sanitizer/kernel_buildbot/slave/bench_pipes.c).
>>
>> Now I've made a kernel module that allocated and deallocated memory
>> chunks of different sizes in a loop.
>> There were two modes of operation:
>> 1) all the allocations were made from the same function, so all
>> allocation/deallocation stacks were similar and there was always a hit
>> in the stackdepot hashtable;
>> 2) the allocations were made from 2^16 different stacks.
>>
>> In the first case SLAB+stackdepot turned out to be 13% faster than
>> SLUB+slub_debug, in the second SLAB was 11% faster.
>
> I don't know which kernel version you tested, but, until recently,
> slub_debug=UZ had the side effect of bypassing the SLUB fastpath, so
> a comparison between them isn't appropriate. Today's linux-next branch
> has some improvements in this area, so please use it to compare them.
>
That's good to know.
I've been using https://github.com/torvalds/linux.git, which probably
didn't have those improvements.
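[For readers following along: the stack depot described at the top of the thread (stack traces deduplicated in a hash table and referenced by small handles) can be sketched in userspace C roughly as below. Bucket counts, frame limits, and names are illustrative, not the actual mm/kasan code.]

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEPOT_BUCKETS 1024
#define MAX_FRAMES 16

struct depot_entry {
    struct depot_entry *next;   /* hash chain */
    uint32_t handle;            /* returned to the client */
    uint32_t hash;
    int nr_frames;
    uintptr_t frames[MAX_FRAMES];
};

static struct depot_entry *buckets[DEPOT_BUCKETS];
static uint32_t next_handle = 1;

/* FNV-1a over the raw frame addresses. */
static uint32_t hash_trace(const uintptr_t *frames, int n)
{
    uint32_t h = 2166136261u;
    const unsigned char *p = (const unsigned char *)frames;
    size_t len = (size_t)n * sizeof(*frames);
    while (len--) { h ^= *p++; h *= 16777619u; }
    return h;
}

/* Return the handle for a trace, inserting it on a miss.
 * Entries are never removed, so no refcounting is needed. */
static uint32_t depot_save(const uintptr_t *frames, int n)
{
    uint32_t h = hash_trace(frames, n);
    struct depot_entry *e;

    for (e = buckets[h % DEPOT_BUCKETS]; e; e = e->next)
        if (e->hash == h && e->nr_frames == n &&
            !memcmp(e->frames, frames, n * sizeof(*frames)))
            return e->handle;          /* hit: no allocation needed */

    e = calloc(1, sizeof(*e));
    if (!e)
        return 0;                      /* 0 = invalid handle */
    e->hash = h;
    e->nr_frames = n;
    memcpy(e->frames, frames, n * sizeof(*frames));
    e->handle = next_handle++;
    e->next = buckets[h % DEPOT_BUCKETS];
    buckets[h % DEPOT_BUCKETS] = e;
    return e->handle;
}
```

This makes the benchmark results plausible: with identical stacks every save is a hash-table hit, so the hot path is just hashing plus one chain walk.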

>> Note that in both cases and for both allocators most of the time (more
>> than 90%) was spent in the x86 stack unwinder, which is common for
>> both approaches.
>
> If more than 90% of the time is spent in the stack unwinder, which is
> common to both cases, how can one be better than the other by 13%?
On second glance, this number (90%) may be inaccurate, because I
measured the stack unwinding times separately, which could have
introduced deviation (not to mention it was incorrect for SLUB).
Still, we're talking about a significant amount of time spent in the unwinder.
My numbers were 26.111 seconds for 1024K SLAB allocation/deallocation
pairs and 30.278 seconds for 1024K alloc/dealloc pairs with SLUB.
When measured separately in the same routine that did the allocations,
2048K calls to save_stack_trace() took 25.487 seconds.
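[Taking these figures at face value, and assuming "1024K" means 2^20 alloc/dealloc pairs with one save_stack_trace() call on allocation and one on deallocation, the per-operation costs work out as in this back-of-the-envelope C sketch:]

```c
/* Back-of-the-envelope check of the timings quoted above.
 * Assumption: "1024K" = 2^20 alloc/dealloc pairs, and each pair
 * triggers two save_stack_trace() calls (alloc + free). */
static const double PAIRS = 1048576.0;        /* 2^20 */

static double us_per_op(double seconds, double ops)
{
    return seconds / ops * 1e6;
}

/* ~24.9 us per SLAB alloc/free pair (26.111 s total) */
static double slab_us_per_pair(void)
{
    return us_per_op(26.111, PAIRS);
}

/* ~12.2 us per save_stack_trace() call (25.487 s for 2048K calls) */
static double unwind_us_per_call(void)
{
    return us_per_op(25.487, 2.0 * PAIRS);
}

/* Two saves per pair: the unwinder accounts for roughly 97% of the
 * SLAB time, consistent with the "more than 90%" estimate above. */
static double unwinder_share_pct(void)
{
    return 2.0 * unwind_us_per_call() / slab_us_per_pair() * 100.0;
}
```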

>> Yet another observation regarding stackdepot: under a heavy load
>> (running Trinity for an hour, 101M allocations) the depot saturates at
>> around 20K records with a hashtable miss rate of 0.02%.
>> That said, I still cannot explain the results of the userspace
>> benchmark, but the slowdown of the stackdepot approach for SLAB sounds
>> acceptable, especially given the memory gain compared to SLUB
>> bookkeeping (which requires 128 bytes per memory allocation) and the
>> fact that we'll be dealing with the fast path most of the time.
>
> In fact, I don't have much concern about performance, because saving
> memory has enough merit for this to be merged. Anyway, the performance
> looks acceptable as well.
>
>> It will certainly be nice to compare SLUB+slub_debug to
>> SLUB+stackdepot once we start switching SLUB to stackdepot.
>
> Okay.
>
> Thanks.



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
This e-mail is confidential. If you are not the right addressee please
do not forward it, please inform the sender, and please erase this
e-mail including any attachments. Thanks.
