[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <mhng-f42f1283-50c1-48f1-ab2e-c5700b352aa1@palmer-ri-x1c9>
Date: Thu, 23 Jun 2022 15:15:32 -0700 (PDT)
From: Palmer Dabbelt <palmer@...belt.com>
To: boqun.feng@...il.com
CC: Daniel Lustig <dlustig@...dia.com>, parri.andrea@...il.com,
guoren@...nel.org, Arnd Bergmann <arnd@...db.de>,
mark.rutland@....com, Will Deacon <will@...nel.org>,
peterz@...radead.org, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-riscv@...ts.infradead.org,
guoren@...ux.alibaba.com
Subject: Re: [PATCH V4 5/5] riscv: atomic: Optimize LRSC-pairs atomic ops with .aqrl annotation
On Thu, 23 Jun 2022 10:55:11 PDT (-0700), boqun.feng@...il.com wrote:
> On Thu, Jun 23, 2022 at 01:09:23PM -0400, Dan Lustig wrote:
>> On 6/22/2022 11:31 PM, Boqun Feng wrote:
>> > Hi,
>> >
>> > On Tue, Jun 14, 2022 at 01:03:47PM +0200, Andrea Parri wrote:
>> > [...]
>> >>> 5ce6c1f3535f ("riscv/atomic: Strengthen implementations with fences")
>> >>> is about fixup wrong spinlock/unlock implementation and not relate to
>> >>> this patch.
>> >>
>> >> No. The commit in question is evidence of the fact that the changes
>> >> you are presenting here (as an optimization) were buggy/incorrect at
>> >> the time in which that commit was worked out.
>> >>
>> >>
>> >>> Actually, sc.w.aqrl is very strong and the same with:
>> >>> fence rw, rw
>> >>> sc.w
>> >>> fence rw,rw
>> >>>
>> >>> So "which do not give full-ordering with .aqrl" is not writen in
>> >>> RISC-V ISA and we could use sc.w/d.aqrl with LKMM.
>> >>>
>> >>>>
>> >>>>>> describes the issue more specifically, that's when we added these
>> >>>>>> fences. There have certainly been complains that these fences are too
>> >>>>>> heavyweight for the HW to go fast, but IIUC it's the best option we have
>> >>>>> Yeah, it would reduce the performance on D1 and our next-generation
>> >>>>> processor has optimized fence performance a lot.
>> >>>>
>> >>>> Definately a bummer that the fences make the HW go slow, but I don't
>> >>>> really see any other way to go about this. If you think these mappings
>> >>>> are valid for LKMM and RVWMO then we should figure this out, but trying
>> >>>> to drop fences to make HW go faster in ways that violate the memory
>> >>>> model is going to lead to insanity.
>> >>> Actually, this patch is okay with the ISA spec, and Dan also thought
>> >>> it was valid.
>> >>>
>> >>> ref: https://lore.kernel.org/lkml/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/raw
>> >>
>> >> "Thoughts" on this regard have _changed_. Please compare that quote
>> >> with, e.g.
>> >>
>> >> https://lore.kernel.org/linux-riscv/ddd5ca34-805b-60c4-bf2a-d6a9d95d89e7@nvidia.com/
>> >>
>> >> So here's a suggestion:
>> >>
>> >> Reviewers of your patches have asked: How come that code we used to
>> >> consider as buggy is now considered "an optimization" (correct)?
>> >>
>> >> Denying the evidence or going around it is not making their job (and
>> >> this upstreaming) easier, so why don't you address it? Take time to
>> >> review previous works and discussions in this area, understand them,
>> >> and integrate such knowledge in future submissions.
>> >>
>> >
>> > I agree with Andrea.
>> >
>> > And I actually took a look into this, and I think I find some
>> > explanation. There are two versions of RISV memory model here:
>> >
>> > Model 2017: released at Dec 1, 2017 as a draft
>> >
>> > https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/hKywNHBkAXM/m/QzUtxEWLBQAJ
>> >
>> > Model 2018: released at May 2, 2018
>> >
>> > https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/xW03vmfmPuA/m/bMPk3UCWAgAJ
>> >
>> > Noted that previous conversation about commit 5ce6c1f3535f happened at
>> > March 2018. So the timeline is roughly:
>> >
>> > Model 2017 -> commit 5ce6c1f3535f -> Model 2018
>> >
>> > And in the email thread of Model 2018, the commit related to model
>> > changes also got mentioned:
>> >
>> > https://github.com/riscv/riscv-isa-manual/commit/b875fe417948635ed68b9644ffdf718cb343a81a
>> >
>> > in that commit, we can see the changes related to sc.aqrl are:
>> >
>> > to have occurred between the LR and a successful SC. The LR/SC
>> > sequence can be given acquire semantics by setting the {\em aq} bit on
>> > -the SC instruction. The LR/SC sequence can be given release semantics
>> > -by setting the {\em rl} bit on the LR instruction. Setting both {\em
>> > - aq} and {\em rl} bits on the LR instruction, and setting the {\em
>> > - aq} bit on the SC instruction makes the LR/SC sequence sequentially
>> > -consistent with respect to other sequentially consistent atomic
>> > -operations.
>> > +the LR instruction. The LR/SC sequence can be given release semantics
>> > +by setting the {\em rl} bit on the SC instruction. Setting the {\em
>> > + aq} bit on the LR instruction, and setting both the {\em aq} and the {\em
>> > + rl} bit on the SC instruction makes the LR/SC sequence sequentially
>> > +consistent, meaning that it cannot be reordered with earlier or
>> > +later memory operations from the same hart.
>> >
>> > note that Model 2018 explicitly says that "ld.aq+sc.aqrl" is ordered
>> > against "earlier or later memory operations from the same hart", and
>> > this statement was not in Model 2017.
>> >
>> > So my understanding of the story is that at some point between March and
>> > May 2018, RISV memory model folks decided to add this rule, which does
>> > look more consistent with other parts of the model and is useful.
>> >
>> > And this is why (and when) "ld.aq+sc.aqrl" can be used as a fully-ordered
>> > barrier ;-)
>> >
>> > Now if my understanding is correct, to move forward, it's better that 1)
>> > this patch gets resend with the above information (better rewording a
>> > bit), and 2) gets an Acked-by from Dan to confirm this is a correct
>> > history ;-)
>>
>> I'm a bit lost as to why digging into RISC-V mailing list history is
>> relevant here...what's relevant is what was ratified in the RVWMO
>> chapter of the RISC-V spec, and whether the code you're proposing
>> is the most optimized code that is correct wrt RVWMO.
>>
>> Is your claim that the code you're proposing to fix was based on a
>> pre-RVWMO RISC-V memory model definition, and you're updating it to
>> be more RVWMO-compliant?
>>
>
> Well, first it's not my code ;-)
>
> The thing is that this patch proposed by Guo Ren kinda fixes/revertes a
> previous commit 5ce6c1f3535f ("riscv/atomic: Strengthen implementations
> with fences"). It looks to me that Guo Ren's patch fits the current
> RISCV memory model and Linux kernel memory model, but the question is
> that commit 5ce6c1f3535f was also a fix, so why do we change things
> back and forth? If I understand correctly, this is also what Palmer and
> Andrea asked for.
That's essentially my confusion. I'm not really a formal memory model
guy so I can easily get lost in these bits, but when I saw it I
remembered having looked at a fix before. I dug that up, saw it was
from someone who likley knows a lot more about formal memory models than
I do, and thus figured I'd ask everyone to see what's up.
IMO if that original fix was made to a pre-ratification version of WMO,
this new version is legal WRT the ratified WMO then the code change is
fine to take on for-next. That said, we should be explicit about why
it's legal and why the reasoning in the previous patch is no loger
connect, just to make sure everyone can follow along.
> My understanding is that commit 5ce6c1f3535f was based on a draft RVWMO
> that was different than the current one. I'd love to record this
> difference in the commit log of Guo Ren's patch, so that later on we
> know why we changed things back and forth. To do so, the confirmation
> from RVWMO authors is helpful.
Agreed. IMO that's always good hygine, but it's extra important when
we're dealing with external specifications in a complex field like
formal models.
> Hope that I make things more clear ;-)
>
> Regards,
> Boqun
>
>> Dan
>>
>> > Regards,
>> > Boqun
>> >
>> >> Andrea
>> >>
>> >>
>> > [...]
Powered by blists - more mailing lists