Message-ID: <alpine.DEB.2.22.394.2007091429200.24933@www.lameter.com>
Date:   Thu, 9 Jul 2020 14:32:33 +0000 (UTC)
From:   Christopher Lameter <cl@...ux.com>
To:     Pekka Enberg <penberg@...il.com>
cc:     Xunlei Pang <xlpang@...ux.alibaba.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Wen Yang <wenyang@...ux.alibaba.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Roman Gushchin <guro@...com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial
 objects

On Tue, 7 Jul 2020, Pekka Enberg wrote:

> On Fri, Jul 3, 2020 at 12:38 PM xunlei <xlpang@...ux.alibaba.com> wrote:
> >
> > On 2020/7/2 PM 7:59, Pekka Enberg wrote:
> > > On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang <xlpang@...ux.alibaba.com> wrote:
> > >> count_partial() can spend a long time iterating under the node
> > >> list_lock when the partial page lists are large, which can cause a
> > >> thundering-herd effect on list_lock contention; e.g. it causes
> > >> business response-time jitter when "/proc/slabinfo" is accessed
> > >> in our production environments.
> > >
> > > Would you have any numbers to share to quantify this jitter? I have no
> >
> > We have HSF RT (High-speed Service Framework Response-Time) monitors,
> > and the RT figures fluctuated randomly. We then deployed a tool that
> > detects "irq off" and "preempt off" periods and dumps the culprit's
> > calltrace; it captured list_lock being held for up to 100ms with irqs
> > off, triggered by "ss", which also caused network timeouts.
>
> Thanks for the follow-up. This sounds like a good enough motivation
> for this patch, but please include it in the changelog.
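
For reference, the count_partial() walk described in the changelog looks
roughly like this (a simplified sketch of the mm/slub.c pattern at the
time, not a verbatim copy):

static unsigned long count_partial(struct kmem_cache_node *n,
				   int (*get_count)(struct page *))
{
	unsigned long flags;
	unsigned long x = 0;
	struct page *page;

	/*
	 * The whole per-node partial list is walked with list_lock
	 * held and interrupts off, so a long list means a long
	 * holdoff for every other user of the lock.
	 */
	spin_lock_irqsave(&n->list_lock, flags);
	list_for_each_entry(page, &n->partial, slab_list)
		x += get_count(page);	/* e.g. page->objects - page->inuse */
	spin_unlock_irqrestore(&n->list_lock, flags);
	return x;
}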
Well, this is access via sysfs causing a holdoff. A way to access the
same information without adding atomics and counters would be best.
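
The counters in question would be along these lines (a hypothetical
sketch of the kind of change under discussion, not the actual patch;
the field and helper names are made up):

/*
 * Hypothetical: track the free objects sitting on the per-node
 * partial list so a slabinfo read becomes a single load instead
 * of a list walk under list_lock.
 */
struct kmem_cache_node {
	spinlock_t		list_lock;
	unsigned long		nr_partial;
	struct list_head	partial;
	atomic_long_t		partial_free_objs;	/* hypothetical */
	/* ... */
};

/* Call wherever a page enters or leaves n->partial. */
static inline void partial_count_update(struct kmem_cache_node *n,
					struct page *page, long sign)
{
	atomic_long_add(sign * ((long)page->objects - page->inuse),
			&n->partial_free_objs);
}

/* The expensive walk then reduces to a read: */
static unsigned long count_partial_free(struct kmem_cache_node *n)
{
	return atomic_long_read(&n->partial_free_objs);
}

The tradeoff is exactly the one raised above: the read side gets cheap,
but every partial-list add/remove on the hot paths pays for the extra
bookkeeping.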

> > I also have no idea what's the standard SLUB benchmark for the
> > regression test, any specific suggestion?
>
> I don't know what people use these days. When I did benchmarking in
> the past, hackbench and netperf were known to be slab-allocation
> intensive macro-benchmarks. Christoph also had some SLUB
> micro-benchmarks, but I don't think we ever merged them into the tree.

They are still where they have been for the last decade or so: in my git
tree on kernel.org. They were also reworked a couple of times and posted
to linux-mm. There are historical posts going back over the years in
which people have modified them and used them to build other tests.
