netdev - Re: [PATCH] mm/vmstats: add counters for the page frag cache

Message-ID: <d6120888-344a-4449-4ca6-ac98508bb3cf@yandex-team.ru>
Date:   Mon, 4 Sep 2017 11:30:55 +0300
From:   Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To:     Kyeongdon Kim <kyeongdon.kim@....com>, akpm@...ux-foundation.org,
        sfr@...b.auug.org.au
Cc:     ying.huang@...el.com, vbabka@...e.cz, hannes@...xchg.org,
        xieyisheng1@...wei.com, luto@...nel.org, shli@...com,
        mhocko@...e.com, mgorman@...hsingularity.net,
        hillf.zj@...baba-inc.com, kemi.wang@...el.com, rientjes@...gle.com,
        bigeasy@...utronix.de, iamjoonsoo.kim@....com, bongkyu.kim@....com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] mm/vmstats: add counters for the page frag cache

On 04.09.2017 04:35, Kyeongdon Kim wrote:
> Thanks for your reply,
> But I couldn't find "NR_FRAGMENT_PAGES" in linux-next.git. Is that a vmstat counter, or something else?
> 

I mean that, rather than adding a bunch of vmstat counters for operations, it
might be worth adding a page counter that shows the current number of these
pages. But this seems too low-level for tracking; common counters for all
network buffers would be more useful but much harder to implement.
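
To illustrate the difference between per-operation event counters and a
single "current amount" counter, here is a minimal userspace sketch. It is
not kernel code: struct frag_stats and both helper functions are hypothetical
names made up for this example, and the order argument is assumed to be the
allocation order of the backing page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
struct frag_stats {
    unsigned long alloc_events; /* event counter: pages ever allocated */
    unsigned long free_events;  /* event counter: pages ever freed */
    long nr_pages;              /* gauge: pages currently held */
};

/* Account an allocation of 2^order pages. */
static void frag_account_alloc(struct frag_stats *s, unsigned int order)
{
    assert(s != NULL);
    s->alloc_events += 1UL << order;
    s->nr_pages += 1L << order;
}

/* Account a free of 2^order pages. */
static void frag_account_free(struct frag_stats *s, unsigned int order)
{
    assert(s != NULL);
    s->free_events += 1UL << order;
    s->nr_pages -= 1L << order;
}
```

With the gauge, the current usage is read directly from nr_pages; with event
counters alone, the reader has to subtract free_events from alloc_events to
get the same number.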

As far as I can see, page owner is able to save the stack trace where the
allocation happened, which makes debugging mostly trivial without any counters.
If it adds too much overhead, just track a random 1% of pages; that should be
enough for finding a leak.
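
The "track a random 1% of pages" idea could look roughly like the sketch
below. This is a userspace illustration only: struct sampler and its helpers
are hypothetical names, and the xorshift32 generator is simply an assumption
chosen to keep the example self-contained. It is not how page owner works
today.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler; names are illustrative, not kernel API. */
struct sampler {
    uint32_t state; /* xorshift32 state, must be non-zero */
};

static void sampler_init(struct sampler *s, uint32_t seed)
{
    assert(s != NULL);
    s->state = seed ? seed : 1u; /* zero would lock the generator at zero */
}

/* One step of the xorshift32 pseudo-random generator. */
static uint32_t sampler_next(struct sampler *s)
{
    uint32_t x = s->state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    s->state = x;
    return x;
}

/* Returns 1 for roughly one allocation in a hundred, 0 otherwise. */
static int sampler_should_track(struct sampler *s)
{
    return sampler_next(s) % 100u == 0;
}
```

An allocation path would call sampler_should_track() and record the stack
trace only when it returns 1, so the expensive part runs for about 1% of
pages.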

> As you know, page_frag_alloc() directly calls the __alloc_pages_nodemask() function,
> which makes it too difficult to see memory usage in real time, even though we have "/meminfo or /slabinfo.." information.
> If there were already a way to figure out memory leakage from page_frag_cache in mainline, I would agree with your opinion,
> but I think we don't have it now.
> 
> If those counters are too much in my patch,
> I can say two values (pgfrag_alloc and pgfrag_free) are enough to guess what will happen,
> and I would remove pgfrag_alloc_calls and pgfrag_free_calls.
> 
> Thanks,
> Kyeongdon Kim
> 
> On 2017-09-01 6:12 PM, Konstantin Khlebnikov wrote:
>> IMHO that's too many counters.
>> Per-node NR_FRAGMENT_PAGES should be enough for guessing what's going on.
>> Perf probes provide enough features for further debugging.
>>
>> On 01.09.2017 02:37, Kyeongdon Kim wrote:
>> > There was a memory leak problem when we ran a stress test
>> > on an Android device.
>> > The root cause was page_frag_cache allocation,
>> > and it was very hard to find.
>> >
>> > We add counters for page frag allocations and frees, and for the
>> > corresponding function calls.
>> > The gap between pgfrag_alloc and pgfrag_free is useful for calculating
>> > the number of pages in use.
>> > The gap between pgfrag_alloc_calls and pgfrag_free_calls serves as a
>> > sub-indicator.
>> > Together they show trends in memory usage during the test.
>> > Without them, it's difficult to check page frag usage, so I believe we
>> > should add them.
>> >
>> > Signed-off-by: Kyeongdon Kim <kyeongdon.kim@....com>
>> > ---