Message-ID: <5ac8c272-6887-5406-50e3-7b87b302498d@huawei.com>
Date: Sat, 23 Dec 2023 16:54:38 +0800
From: Zeng Heng <zengheng4@...wei.com>
To: David Laight <David.Laight@...LAB.COM>, "mingo@...hat.com"
<mingo@...hat.com>, "will@...nel.org" <will@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>, "longman@...hat.com"
<longman@...hat.com>, "boqun.feng@...il.com" <boqun.feng@...il.com>
CC: "xiexiuqi@...wei.com" <xiexiuqi@...wei.com>, "liwei391@...wei.com"
<liwei391@...wei.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] locking/osq_lock: Avoid false sharing in
optimistic_spin_node
On 2023/12/22 20:40, David Laight wrote:
> From: Zeng Heng
>> Sent: 22 December 2023 12:11
>>
>> Using the UnixBench test suite, the perf tool clearly shows that
>> osq_lock() causes extremely high overhead in the File Copy items:
>>
>>   Overhead  Shared Object  Symbol
>>     94.25%  [kernel]       [k] osq_lock
>>      0.74%  [kernel]       [k] rwsem_spin_on_owner
>>      0.32%  [kernel]       [k] filemap_get_read_batch
>>
>> In response to this, we analysed the code and found the following:
>>
>> In the prologue of osq_lock(), the `cpu` member of the percpu struct
>> optimistic_spin_node is set to the local cpu id, and after that the
>> value in fact never changes. Based on that, we can regard the `cpu`
>> member as effectively constant.
>>
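(For reference, the prologue in question looks roughly like this; a
trimmed sketch based on mainline kernel/locking/osq_lock.c, with the
slow path elided:)

    bool osq_lock(struct optimistic_spin_queue *lock)
    {
            struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
            int curr = encode_cpu(smp_processor_id());
            int old;

            node->locked = 0;
            node->next = NULL;
            node->cpu = curr;       /* stored on every call, but always
                                     * the same value for a given cpu */

            /* fast path: take the tail if the queue is empty; the
             * xchg() also publishes the node fields set above */
            old = atomic_xchg(&lock->tail, curr);
            if (old == OSQ_UNLOCKED_VAL)
                    return true;

            /* ... slow path: queue behind the previous tail, spin ... */
    }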
> ...
>> @@ -9,7 +11,13 @@
>> struct optimistic_spin_node {
>> struct optimistic_spin_node *next, *prev;
>> int locked; /* 1 if lock acquired */
>> - int cpu; /* encoded CPU # + 1 value */
>> +
>> + CACHELINE_PADDING(_pad1_);
>> + /*
>> + * Stores an encoded CPU # + 1 value.
>> + * Only read by other cpus, so split into different cache lines.
>> + */
>> + int cpu;
>> };
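(To spell out where the false sharing comes from: each spinner in
osq_lock() polls vcpu_is_preempted(node_cpu(node->prev)), i.e. it
repeatedly reads the `cpu` field of the previous cpu's node, while the
owner of that node writes `next` and `locked` nearby. Annotated:)

    struct optimistic_spin_node {
            struct optimistic_spin_node *next, *prev;
                                    /* written by owner and successor */
            int locked;             /* written by the predecessor on
                                     * handover, reset by the owner */
            int cpu;                /* read by the successor's spin loop */
    };

All four members fit in 24 bytes on 64-bit, hence in a single cache
line, so every local write above invalidates the remote reader of
`cpu`.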
> Isn't this structure embedded in every mutex and rwsem (etc)?
> So that is a significant bloat especially on systems with
> large cache lines.
>
> Did you try just moving the initialisation of the per-cpu 'node'
> below the first fast-path (uncontended) test in osq_lock()?
>
> OTOH if you really have multiple cpus spinning on the same rwsem
> perhaps the test and/or filemap code are really at fault!
>
> David
Hi,
The File Copy items in the UnixBench test suite use one read file and
one write file for the file read/write/copy tests. In a multi-parallel
scenario, that leads to heavy contention on the file locks.

It is just a performance test suite, so it has nothing to do with
whether a user program's design is correct or not.
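As for moving the node initialisation below the fast-path test: `next`
and `locked` have to be reset before the xchg() publishes the node, and
a successor starts reading our `cpu` field as soon as it queues behind
us, so `cpu` must also be valid by then. What does look safe is making
the `cpu` store write-once, since the value never changes; a rough,
untested sketch:

            node->locked = 0;
            node->next = NULL;
            if (unlikely(node->cpu != curr))  /* only true on first use; */
                    node->cpu = curr;         /* encoded value is never 0 */

            /* ... unchanged: atomic_xchg() on lock->tail, etc. ... */

Combined with the cache line padding, the line holding `cpu` then stays
clean after the first lock on each cpu, so remote readers no longer
take invalidations from it.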
B.R.,
Zeng Heng