Message-ID: <alpine.DEB.2.22.394.2007091429200.24933@www.lameter.com>
Date:   Thu, 9 Jul 2020 14:32:33 +0000 (UTC)
From:   Christopher Lameter <cl@...ux.com>
To:     Pekka Enberg <penberg@...il.com>
cc:     Xunlei Pang <xlpang@...ux.alibaba.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Wen Yang <wenyang@...ux.alibaba.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Roman Gushchin <guro@...com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] mm/slub: Introduce two counters for the partial
 objects

On Tue, 7 Jul 2020, Pekka Enberg wrote:

> On Fri, Jul 3, 2020 at 12:38 PM xunlei <xlpang@...ux.alibaba.com> wrote:
> >
> > On 2020/7/2 PM 7:59, Pekka Enberg wrote:
> > > On Thu, Jul 2, 2020 at 11:32 AM Xunlei Pang <xlpang@...ux.alibaba.com> wrote:
> > >> count_partial() can spend a long time iterating under the node
> > >> list_lock when the partial page lists are large, which can cause a
> > >> thundering-herd effect on list_lock contention; e.g. it causes
> > >> business response-time jitter when "/proc/slabinfo" is accessed
> > >> in our production environments.
> > >
> > > Would you have any numbers to share to quantify this jitter? I have no
> >
> > We have HSF RT (High-speed Service Framework Response-Time) monitors,
> > and the RT figures fluctuated randomly. We then deployed a tool that
> > detects "irq off" and "preempt off" periods and dumps the culprit's
> > calltrace; it captured list_lock being held for up to 100ms with irqs
> > off, triggered by "ss", which also caused network timeouts.
>
> Thanks for the follow-up. This sounds like a good enough motivation
> for this patch, but please include it in the changelog.
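
For reference, the count_partial() walk described in the changelog looks
roughly like this (a simplified sketch of the mm/slub.c pattern at the
time, not a verbatim copy):

static unsigned long count_partial(struct kmem_cache_node *n,
				   int (*get_count)(struct page *))
{
	unsigned long flags;
	unsigned long x = 0;
	struct page *page;

	/*
	 * The whole per-node partial list is walked with list_lock
	 * held and interrupts off, so a long list means a long
	 * holdoff for every other user of the lock.
	 */
	spin_lock_irqsave(&n->list_lock, flags);
	list_for_each_entry(page, &n->partial, slab_list)
		x += get_count(page);	/* e.g. page->objects - page->inuse */
	spin_unlock_irqrestore(&n->list_lock, flags);
	return x;
}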
Well, this is access via sysfs causing a holdoff. A way to access the
same information without adding atomics and counters would be best.
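
The counters in question would be along these lines (a hypothetical
sketch of the kind of change under discussion, not the actual patch;
the field and helper names are made up):

/*
 * Hypothetical: track the free objects sitting on the per-node
 * partial list so a slabinfo read becomes a single load instead
 * of a list walk under list_lock.
 */
struct kmem_cache_node {
	spinlock_t		list_lock;
	unsigned long		nr_partial;
	struct list_head	partial;
	atomic_long_t		partial_free_objs;	/* hypothetical */
	/* ... */
};

/* Call wherever a page enters or leaves n->partial. */
static inline void partial_count_update(struct kmem_cache_node *n,
					struct page *page, long sign)
{
	atomic_long_add(sign * ((long)page->objects - page->inuse),
			&n->partial_free_objs);
}

/* The expensive walk then reduces to a read: */
static unsigned long count_partial_free(struct kmem_cache_node *n)
{
	return atomic_long_read(&n->partial_free_objs);
}

The tradeoff is exactly the one raised above: the read side gets cheap,
but every partial-list add/remove on the hot paths pays for the extra
bookkeeping.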

> > I also have no idea what's the standard SLUB benchmark for the
> > regression test, any specific suggestion?
>
> I don't know what people use these days. When I did benchmarking in
> the past, hackbench and netperf were known to be slab-allocation
> intensive macro-benchmarks. Christoph also had some SLUB
> micro-benchmarks, but I don't think we ever merged them into the tree.

They are still where they have been for the last decade or so: in my git
tree on kernel.org. They were also reworked a couple of times and posted
to linux-mm. There are historical posts going back over the years in
which people have modified them and used them to build other tests.
