[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20191007095658.GF45043@arrakis.emea.arm.com>
Date: Mon, 7 Oct 2019 10:56:58 +0100
From: Catalin Marinas <catalin.marinas@....com>
To: Alexey Kardashevskiy <aik@...abs.ru>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>,
Matthew Wilcox <willy@...radead.org>, Qian Cai <cai@....pw>
Subject: Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct
scan_area allocation failures
On Sat, Oct 05, 2019 at 01:08:43PM +1000, Alexey Kardashevskiy wrote:
> On 03/10/2019 18:41, Catalin Marinas wrote:
> > On Thu, Oct 03, 2019 at 04:13:07PM +1000, Alexey Kardashevskiy wrote:
> >> On 13/08/2019 02:06, Catalin Marinas wrote:
> >>> Object scan areas are an optimisation aimed to decrease the false
> >>> positives and slightly improve the scanning time of large objects known
> >>> to only have a few specific pointers. If a struct scan_area fails to
> >>> allocate, kmemleak can still function normally by scanning the full
> >>> object.
> >>>
> >>> Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
> >>> scan_area allocation fails.
> >>
> >> I came across this one while bisecting sudden drop in throughput of a
> >> 100Gbit Mellanox CX4 ethernet card in a PPC POWER9 system, the speed
> >> dropped from 100Gbit to about 40Gbit. Bisect pointed at dba82d943177,
> >> this are the relevant config options:
> >>
> >> [fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
> >> CONFIG_HAVE_DEBUG_KMEMLEAK=y
> >> CONFIG_DEBUG_KMEMLEAK=y
> >> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
> >> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
> >> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
> >> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y
> >
> > The throughput drop is probably caused caused by kmemleak slowing down
> > all memory allocations (including skb). So that's expected. You may get
> > similar drop with other debug options like lock proving, kasan.
>
> I was not precise. I meant that before dba82d943177 kmemleak would
> work but would not slow network down (at least 100Gbit) and now it
> does which is downgrade so I was wondering if kmemleak just got so
> much better to justify this change or there is a bug somewhere, so
> which one is it? Or "LOG_SIZE=400" never really worked?
I suspect LOG_SIZE=400 never worked on your system (you can check the
log for any kmemleak disabled messages).
Thanks for testing the patch.
--
Catalin
Powered by blists - more mailing lists