Message-ID: <CALvZod48Fwua_VJvnzHatF-J4YRWqfMFnYjYN6W0_ioLtPZEfA@mail.gmail.com>
Date: Wed, 29 Mar 2023 21:51:56 -0700
From: Shakeel Butt <shakeelb@...gle.com>
To: Bagas Sanjaya <bagasdotme@...il.com>
Cc: Feng Tang <feng.tang@...el.com>, Jonathan Corbet <corbet@....net>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
Randy Dunlap <rdunlap@...radead.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
Joe Mario <jmario@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Eric Dumazet <edumazet@...gle.com>, dave.hansen@...el.com,
ying.huang@...el.com, tim.c.chen@...el.com, andi.kleen@...el.com
Subject: Re: [PATCH v2] Documentation: Add document for false sharing
On Wed, Mar 29, 2023 at 9:27 PM Bagas Sanjaya <bagasdotme@...il.com> wrote:
>
> On Wed, Mar 29, 2023 at 03:33:22PM +0800, Feng Tang wrote:
> > +Cases of false sharing hurting performance are seen more frequently as
> > +core counts increase. Because of these detrimental effects, many
> > +patches have been proposed across a variety of subsystems (like
> > +networking and memory management) and merged. Some common mitigations
> > +(with examples) are:
> > +
> > +* Separate hot global data in its own dedicated cache line, even if it
> > + is just a 'short' type. The downside is more consumption of memory,
> > + cache line and TLB entries.
> > +
> > + - Commit 91b6d3256356 ("net: cache align tcp_memory_allocated, tcp_sockets_allocated")
> > +
> > +* Reorganize the data structure, separate the interfering members to
> > + different cache lines. One downside is it may introduce new false
> > + sharing of other members.
> > +
> > + - Commit 802f1d522d5f ("mm: page_counter: re-layout structure to reduce false sharing")
> > +
> > +* Replace 'write' with 'read' when possible, especially in loops.
> > + For some global variables, use compare(read)-then-write instead
> > + of an unconditional write. For example, use::
> > +
> > +     if (!test_bit(XXX))
> > +             set_bit(XXX);
> > +
> > + instead of directly "set_bit(XXX);", similarly for atomic_t data.
> "... The similar technique is also applicable to atomic_t data".
>
> But how?
>
> > +
> > + - Commit 7b1002f7cfe5 ("bcache: fixup bcache_dev_sectors_dirty_add() multithreaded CPU false sharing")
> > + - Commit 292648ac5cf1 ("mm: gup: allow FOLL_PIN to scale in SMP")
> > +
> > +* Turn hot global data to 'per-cpu data + global data' when possible,
> > + or reasonably increase the threshold for syncing per-cpu data to
> > + global data, to reduce or postpone the 'write' to that global data.
> > +
> > + - Commit 520f897a3554 ("ext4: use percpu_counters for extent_status cache hits/misses")
> > + - Commit 56f3547bfa4d ("mm: adjust vm_committed_as_batch according to vm overcommit policy")
> > +
>
> Here's what I mean by bridging conjunctions to example commits as I reviewed
> in v1 [1]:
>
This is too much and unneeded nitpicking. The patch looks good as is.
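
For anyone following the thread, the compare(read)-then-write mitigation
quoted above, and the atomic_t variant that was asked about, can be sketched
in userspace C11 roughly as below. This is only an illustration under stated
assumptions: <stdatomic.h> stands in for the kernel's test_bit()/set_bit()
and atomic_t helpers, and the function names set_bit_if_clear() and
store_if_different() are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Set bit `nr` in a shared word only if it is not already set.
 * Returns true if a write was actually issued.
 * (Hypothetical helper; userspace stand-in for test_bit() + set_bit().)
 */
bool set_bit_if_clear(atomic_ulong *word, unsigned int nr)
{
	unsigned long mask = 1UL << nr;

	/* Read first: a load lets the cache line stay shared across CPUs. */
	if (atomic_load_explicit(word, memory_order_relaxed) & mask)
		return false;

	/* Only now do the read-modify-write, which needs the line exclusive. */
	atomic_fetch_or_explicit(word, mask, memory_order_relaxed);
	return true;
}

/*
 * Same idea for an atomic integer: skip the store when the value
 * already matches. (Hypothetical helper; stand-in for
 * atomic_read() followed by atomic_set().)
 */
bool store_if_different(atomic_int *v, int new_val)
{
	if (atomic_load_explicit(v, memory_order_relaxed) == new_val)
		return false;

	atomic_store_explicit(v, new_val, memory_order_relaxed);
	return true;
}
```

Note that the check and the write are two separate operations, so this only
fits cases where a racing duplicate write is harmless (setting an already-set
bit, storing the same value). It is not a replacement for
test_and_set_bit()-style semantics.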