Message-ID: <CAGudoHFuBwq78nZOJJ8itg0Kj8B2K1z5uRh2VEVNuBM=6wp0Wg@mail.gmail.com>
Date: Wed, 23 Aug 2023 14:01:31 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: David Laight <David.Laight@...lab.com>
Cc: Jan Kara <jack@...e.cz>, Dennis Zhou <dennis@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tj@...nel.org" <tj@...nel.org>, "cl@...ux.com" <cl@...ux.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"shakeelb@...gle.com" <shakeelb@...gle.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
On 8/23/23, David Laight <David.Laight@...lab.com> wrote:
> From: Jan Kara
>> Sent: Wednesday, August 23, 2023 10:49 AM
> ....
>> > --- a/include/linux/mm_types.h
>> > +++ b/include/linux/mm_types.h
>> > @@ -737,7 +737,11 @@ struct mm_struct {
>> >
>> > unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for
>> > /proc/PID/auxv */
>> >
>> > - struct percpu_counter rss_stat[NR_MM_COUNTERS];
>> > + union {
>> > + struct percpu_counter rss_stat[NR_MM_COUNTERS];
>> > + u64 *rss_stat_single;
>> > + };
>> > + bool magic_flag_stuffed_elsewhere;
>
> I wouldn't use a union to save a pointer - it is asking for trouble.
>
I may need to abandon this bit anyway -- counter init adds counters to
a global list and I can't easily call it like that.
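
For reference, an abbreviated paraphrase of upstream
__percpu_counter_init() as I read lib/percpu_counter.c (not a
verbatim copy, lockdep/debug bits dropped) -- the list_add() under a
global lock is what makes lazily flipping a live mm over to percpu
counters awkward:

int __percpu_counter_init(struct percpu_counter *fbc, s64 amount,
			  gfp_t gfp, struct lock_class_key *key)
{
	unsigned long flags __maybe_unused;

	raw_spin_lock_init(&fbc->lock);
	fbc->count = amount;
	fbc->counters = alloc_percpu_gfp(s32, gfp);
	if (!fbc->counters)
		return -ENOMEM;
#ifdef CONFIG_HOTPLUG_CPU
	/* every counter in the system is hung off a global list */
	INIT_LIST_HEAD(&fbc->list);
	spin_lock_irqsave(&percpu_counters_lock, flags);
	list_add(&fbc->list, &percpu_counters);
	spin_unlock_irqrestore(&percpu_counters_lock, flags);
#endif
	return 0;
}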
>> >
>> > struct linux_binfmt *binfmt;
>> >
>> >
>> > Then for the single-threaded case an area is allocated for
>> > NR_MM_COUNTERS counters * 2 -- the first set is updated without any
>> > synchronization by the current thread. The second set is only
>> > modified by others and protected with mm->arg_lock. The lock
>> > protects remote access to the union to begin with.
>>
>> arg_lock seems a bit like a hack. How is it related to rss_stat? The
>> scheme with two counters is clever but I'm not 100% convinced the
>> complexity is really worth it. I'm not sure the overhead of always
>> using an atomic counter would really be measurable as atomic counter
>> ops in local CPU cache tend to be cheap. Did you try to measure the
>> difference?
>
> A separate lock is worse than atomics.
> (Although some 32-bit arches may have issues with 64-bit atomics.)
>
But in my proposal the separate lock is there precisely so that the
most common consumer -- the sole thread -- can avoid atomics
altogether. The lock is only taken for the transition to the
multithreaded state and for updates by remote parties (both rare
compared to updates by the current thread).
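
A minimal userspace sketch of the idea as I mean it (all names
hypothetical, not the actual patch; kernel code would want
READ_ONCE/WRITE_ONCE for the cross-thread accesses to ->local):

#include <pthread.h>
#include <stdint.h>

#define NR_COUNTERS 4

struct split_counters {
	uint64_t local[NR_COUNTERS];	/* owner thread only, no atomics */
	uint64_t remote[NR_COUNTERS];	/* everyone else, under ->lock */
	pthread_mutex_t lock;		/* would also guard the transition
					 * to a multithreaded scheme */
};

/* Fast path: only ever called by the owning thread. */
static inline void counter_add_local(struct split_counters *c,
				     int idx, uint64_t delta)
{
	c->local[idx] += delta;		/* plain add, no lock prefix */
}

/* Slow path: rare updates coming from other tasks. */
static void counter_add_remote(struct split_counters *c,
			       int idx, uint64_t delta)
{
	pthread_mutex_lock(&c->lock);
	c->remote[idx] += delta;
	pthread_mutex_unlock(&c->lock);
}

/* Readers sum both halves; the local half may be concurrently
 * updated by the owner, so the result is only a snapshot. */
static uint64_t counter_read(struct split_counters *c, int idx)
{
	uint64_t sum;

	pthread_mutex_lock(&c->lock);
	sum = c->local[idx] + c->remote[idx];
	pthread_mutex_unlock(&c->lock);
	return sum;
}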
> I think you'll be surprised just how slow atomic ops are, even when
> the line is already present in the local cache.
> (Probably because any other copies have to be invalidated.)
>
Agreed. They have always been super expensive on x86-64 (and continue
to be). I keep running into claims that they are not; I don't know
where that's coming from.
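
A quick single-threaded microbenchmark sketch (mine, not from the
thread) to put numbers on it: a plain increment vs a relaxed atomic
increment of a cache-hot counter. On x86-64 the atomic compiles to a
LOCK-prefixed RMW even with zero contention. Build with -O2:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define ITERS 100000000ULL

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

int main(void)
{
	static volatile uint64_t plain;	/* volatile so the loop survives -O2 */
	static _Atomic uint64_t atom;
	uint64_t t0, t1, t2, i;

	t0 = now_ns();
	for (i = 0; i < ITERS; i++)
		plain++;
	t1 = now_ns();
	for (i = 0; i < ITERS; i++)
		atomic_fetch_add_explicit(&atom, 1, memory_order_relaxed);
	t2 = now_ns();

	printf("plain:  %.2f ns/op\n", (double)(t1 - t0) / ITERS);
	printf("atomic: %.2f ns/op\n", (double)(t2 - t1) / ITERS);
	return 0;
}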
--
Mateusz Guzik <mjguzik gmail.com>