[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9206b2b-64da-41f0-bc64-f2807129277a@alu.unizg.hr>
Date: Tue, 31 Oct 2023 14:39:19 +0100
From: Mirsad Todorovac <mirsad.todorovac@....hr>
To: Akira Yokosawa <akiyks@...il.com>, mirsad.todorovac@....unizg.hr
Cc: hkallweit1@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125
and 8125B start to reduce spinlocks
On 10/31/2023 9:23 AM, Akira Yokosawa wrote:
> Hello Mirsad,
>
> [most CCs dropped]
>
> I'm responding to your comment quoted below. It caught eyes of me
> who happens to be a reviewer of LKMM and a LaTeX advisor to perfbook.
>
> On Mon, 30 Oct 2023 16:02:28 +0100, Mirsad Todorovac wrote:
>> On 10/30/23 15:02, Heiner Kallweit wrote:
> [...]
>>>
>>> All this manual locking and unlocking makes the code harder
>>> to read and more error-prone. Maybe, as a rule of thumb:
>>> If you can replace a block with more than 10 mac ocp ops,
>>> then fine with me
>> As I worked with another German developer, Mr. Frank Heckenbach from the GNU Pascal project,
>> I know that Germans are pedantic and reliable :-)
>>
>> If this rtl_hw_start_8125_common() is called only once, then maybe every memory bus cycle
>> isn't worth saving, and then maybe the additional complexity isn't worth adding (but it
>> was fun doing, and it works with my NIC).
>>
>> AFAIK, a spin_lock_irqsave()/spin_unlock_irqrestore() isn't a free lunch as you know, and I read
>> from the manuals that on modern CPUs a locked ADD $0, -128(%esp) or something takes about 50
>> clock cycles, in which all cores have to wait.
>
> Do you mean, while one of x86 CPUs is executing "lock; addl $0, -4(%esp)"
> aka smp_mb(), bus locking prevents all the other CPUs in the system
> connected to the bus from doing any memory accesses ???
>
> If it is the case, I believe you are missing the state of the art
> optimization of x86 memory system architecture, where most of atomic
> operations are done using cache locking. Bus locking is used only
> when it is unavoidable.
>
> Hint: A short introduction can be found at stackoverflow.com [1].
> Quote of (then) section 7.1.4 from Intel's SDM vol 3A in the answer
> should suffice.
>
> [1]: https://stackoverflow.com/questions/3339141/x86-lock-question-on-multi-core-cpus
>
> A reachable link to Intel SDM should be somewhere in perfbook's bibliography.
> The relevant section in Vol 3A is now "2.8.5 Controlling the Processor".
>
> Hope this helps,
> Akira
Thanks for the tip. I really need to catch up with my homework and the
documentation.
Yesterday I've lost a friend who worked very hard for her PhD, so I
wonder again about the purpose of meaning. :-/
Thanks,
Mirsad
>> Doing that in storm of 10 lock/unlock pairs amounts to 500 cycles or 125 ns in the best case
>> on a 4 GHz CPU.
>>
>> But I trust that you as the maintainer have the big picture and greater insight in the actual hw.
>
Powered by blists - more mailing lists