[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9b5a04e9-39e7-ffa1-b43e-831a4f0c4b73@huaweicloud.com>
Date: Tue, 28 Feb 2023 09:49:07 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: paulmck@...nel.org
Cc: Andrea Parri <parri.andrea@...il.com>,
Alan Stern <stern@...land.harvard.edu>,
Jonas Oberhauser <jonas.oberhauser@...wei.com>,
will@...nel.org, peterz@...radead.org, boqun.feng@...il.com,
npiggin@...il.com, dhowells@...hat.com, j.alglave@....ac.uk,
luc.maranget@...ia.fr, akiyks@...il.com, dlustig@...dia.com,
joel@...lfernandes.org, urezki@...il.com, quic_neeraju@...cinc.com,
frederic@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] tools/memory-model: Make ppo a subrelation of po
On 2/27/2023 11:21 PM, Paul E. McKenney wrote:
> On Mon, Feb 27, 2023 at 09:13:01PM +0100, Jonas Oberhauser wrote:
>>
>> On 2/27/2023 8:40 PM, Andrea Parri wrote:
>>>> The LKMM doesn't believe that a control or data dependency orders a
>>>> plain write after a marked read. Hence in this test it thinks that P1's
>>>> store to u0 can happen before the load of x1. I don't remember why we
>>>> did it this way -- probably we just wanted to minimize the restrictions
>>>> on when plain accesses can execute. (I do remember the reason for
>>>> making address dependencies induce order; it was so RCU would work.)
>>>>
>>>> The patch below will change what the LKMM believes. It eliminates the
>>>> positive outcome of the litmus test and the data race. Should it be
>>>> adopted into the memory model?
>>> (Unpopular opinion I know,) it should drop dependencies ordering, not
>>> add/promote it.
>>>
>>> Andrea
>> Maybe not as unpopular as you think... :)
>> But either way IMHO it should be consistent; either take all the
>> dependencies that are true and add them, or drop them all.
>> In the latter case, RCU should change to an acquire barrier. (also, one
>> would have to deal with OOTA in some yet different way).
>>
>> Generally my position is that unless there's a real-world benchmark with
>> proven performance benefits of relying on dependency ordering, one should
>> use an acquire barrier. I haven't yet met such a case, but maybe one of you
>> has...
> https://www.msully.net/thesis/thesis.pdf page 128 (PDF page 141).
>
> Though this is admittedly for ARMv7 and PowerPC.
>
Thanks for the link.
It's true that on architectures that don't have an acquire load (and
have to use a fence), the penalty will be bigger.
But the more obvious discussion would be what constitutes a real-world
benchmark : )
In my experience you can get a lot of performance benefits out of
optimizing barriers in code if all you execute is that code.
But once you embed that into a real-world application, often 90%-99% of
time spent will be in the business logic, not in the data structure.
And then the benefits suddenly disappear.
Note that a lot of barriers are a lot cheaper as well when there's no
contention.
Because of that, making optimization decisions based on microbenchmarks
can sometimes lead to a very poor "time invested" vs "total product
improvement" ratio.
Best wishes,
jonas
Powered by blists - more mailing lists