linux-kernel - Re: [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125 and 8125B start to reduce spinlocks

Message-ID: <d9206b2b-64da-41f0-bc64-f2807129277a@alu.unizg.hr>
Date:   Tue, 31 Oct 2023 14:39:19 +0100
From:   Mirsad Todorovac <mirsad.todorovac@....hr>
To:     Akira Yokosawa <akiyks@...il.com>, mirsad.todorovac@....unizg.hr
Cc:     hkallweit1@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 6/7] r8169: Coalesce mac ocp write and modify for 8125
 and 8125B start to reduce spinlocks

On 10/31/2023 9:23 AM, Akira Yokosawa wrote:
> Hello Mirsad,
> 
> [most CCs dropped]
> 
> I'm responding to your comment quoted below.  It caught my eye, as I
> happen to be a reviewer of LKMM and a LaTeX advisor to perfbook.
> 
> On Mon, 30 Oct 2023 16:02:28 +0100, Mirsad Todorovac wrote:
>> On 10/30/23 15:02, Heiner Kallweit wrote:
> [...]
>>>
>>> All this manual locking and unlocking makes the code harder
>>> to read and more error-prone. Maybe, as a rule of thumb:
>>> If you can replace a block with more than 10 mac ocp ops,
>>> then fine with me
>> As I worked with another German developer, Mr. Frank Heckenbach from the GNU Pascal project,
>> I know that Germans are pedantic and reliable :-)
>>
>> If this rtl_hw_start_8125_common() is called only once, then maybe every memory bus cycle
>> isn't worth saving, and then maybe the additional complexity isn't worth adding (but it
>> was fun doing, and it works with my NIC).
>>
>> AFAIK, a spin_lock_irqsave()/spin_unlock_irqrestore() isn't a free lunch as you know, and I read
>> from the manuals that on modern CPUs a locked ADD $0, -128(%esp) or something takes about 50
>> clock cycles, in which all cores have to wait.
> 
> Do you mean, while one of x86 CPUs is executing "lock; addl $0, -4(%esp)"
> aka smp_mb(), bus locking prevents all the other CPUs in the system
> connected to the bus from doing any memory accesses ???
> 
> If that is the case, I believe you are missing the state-of-the-art
> optimization of the x86 memory system architecture, where most atomic
> operations are done using cache locking.  Bus locking is used only
> when it is unavoidable.
> 
> Hint: A short introduction can be found at stackoverflow.com [1].
> Quote of (then) section 7.1.4 from Intel's SDM vol 3A in the answer
> should suffice.
> 
> [1]: https://stackoverflow.com/questions/3339141/x86-lock-question-on-multi-core-cpus
> 
> A reachable link to Intel SDM should be somewhere in perfbook's bibliography.
> The relevant section in Vol 3A is now "2.8.5 Controlling the Processor".
> 
> Hope this helps,
> Akira

Thanks for the tip. I really need to catch up with my homework and the 
documentation.

Yesterday I lost a friend who worked very hard for her PhD, so I
wonder again about the purpose of meaning. :-/

Thanks,
Mirsad

>> Doing that in a storm of 10 lock/unlock pairs amounts to 500 cycles, or 125 ns in the best case
>> on a 4 GHz CPU.
>>
>> But I trust that you as the maintainer have the big picture and greater insight in the actual hw.
>
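
The back-of-the-envelope estimate quoted above (10 lock/unlock pairs at
about 50 cycles each on a 4 GHz CPU) can be checked with a short sketch.
The function name and parameters are hypothetical, and the 50-cycle cost
is the assumption from the quoted text, not a measurement:

```python
def lock_overhead_ns(pairs: int, cycles_per_pair: int, cpu_hz: float) -> float:
    """Estimate the time in nanoseconds spent on `pairs` lock/unlock pairs.

    cycles_per_pair is an assumed cost per pair (50 in the quoted estimate);
    real costs depend on the CPU and on cache-line contention.
    """
    total_cycles = pairs * cycles_per_pair
    # Multiply before dividing so that round inputs give a round result.
    return total_cycles * 1e9 / cpu_hz


if __name__ == "__main__":
    # 10 pairs * 50 cycles = 500 cycles; at 4 GHz one cycle is 0.25 ns.
    print(lock_overhead_ns(10, 50, 4e9))  # prints 125.0
```

This only reproduces the arithmetic. Whether other cores actually stall
during those cycles is the separate question raised in the reply about
cache locking versus bus locking.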