[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a8f2ec4d-2616-17dc-320b-b7ac5ca99255@loongson.cn>
Date: Tue, 24 Jun 2025 09:39:13 +0800
From: Tianyang Zhang <zhangtianyang@...ngson.cn>
To: Thomas Gleixner <tglx@...utronix.de>, chenhuacai@...nel.org,
kernel@...0n.name, corbet@....net, alexs@...nel.org, si.yanteng@...ux.dev,
jiaxun.yang@...goat.com, peterz@...radead.org, wangliupu@...ngson.cn,
lvjianmin@...ngson.cn, maobibo@...ngson.cn, siyanteng@...oftware.com.cn,
gaosong@...ngson.cn, yangtiezhu@...ngson.cn
Cc: loongarch@...ts.linux.dev, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/2] irq/irq-loongarch-ir:Add Redirect irqchip support
Hi, Thomas
在 2025/6/24 上午4:05, Thomas Gleixner 写道:
> On Mon, Jun 23 2025 at 17:33, Tianyang Zhang wrote:
>> 在 2025/6/23 下午4:27, Thomas Gleixner 写道:
>>> tail is the software managed part of the ringbuffer which is shared with
>>> the hardware, right?
>>>
>>> So even if the compiler would be allowed to reevalutate tail after the
>>> barrier (it is NOT), then tail would still contain the same value as
>>> before, no?
>>>
>>> The wmb() is required to ensure that the hardware can observe the full
>>> write of the command _before_ it can observe the update to the tail
>>> index.
>>>
>>> Anything else is voodoo.
>>>
>>> Thanks,
>>>
>>> tglx
>> In my previous understanding, tail'value is actually a part of 'full
>> write of the command '.
> Of course. The hardware observes the tail value. If it is not updated
> then the command is not executed. But these are two distinct things:
>
> The invalidate command is written to a location in the command buffer
> which is determined by tail:
>
> inv_addr = (struct irde_inv_cmd *)(rqueue->base + tail * sizeof(struct irde_inv_cmd));
> memcpy(inv_addr, cmd, sizeof(struct irde_inv_cmd));
>
> requeue::base points to an array of invalidate commands. The array
> size is INVALID_QUEUE_SIZE. tail is the position in the array to which
> the software writes the next command. tail is software managed and
> written to a completely different location via write_queue_tail(...):
>
> static phys_addr_t redirect_reg_base = 0x1fe00000;
>
> #define REDIRECT_REG_BASE(reg, node) \
> (UNCACHE_BASE | redirect_reg_base | (u64)(node) << NODE_ADDRSPACE_SHIFT | (reg))
> #define redirect_reg_queue_tail(node) REDIRECT_REG_BASE(LOONGARCH_IOCSR_REDIRECT_CQT, (node))
> #define read_queue_tail(node) (*((u32 *)(redirect_reg_queue_tail(node))))
> #define write_queue_tail(node, val) (*((u32 *)(redirect_reg_queue_tail(node))) = (val))
>
> The hardware maintains the head index. It's the last command index the
> hardware processed. When the hardware observes that head != tail then it
> processes the next entry and after completion it updates head with the
> next index. This repeats until head == tail.
>
>> We must ensure that tail is updated to the correct value first, and
>> then write this value into the register (perhaps by adding wmb in
>> write_queue_tail ).
> No. The local variable 'tail' is purely local to the CPU and the
> invalidation hardware does not even know that it exists.
>
> There are two things which are relevant to the hardware:
>
> 1) command is written to the hardware visible array at index of tail
>
> 2) hardware visible tail memory (register) is updated to tail + 1
>
> The memory barrier is required to prevent that #2 is written to the
> hardware _before_ #1 completed and is fully visible to the hardware.
>
>> In other words, this is originally to prevent the write register
>> instruction from being executed out of order before updating tail.
> No. The barrier is solely for the above #1 vs. #2 ordering.
>
> There is a difference between program flow ordering and memory ordering.
>
> The hardware _CANNOT_ execute the write _before_ the value it writes is
> computed. That's enforced by program flow order.
>
> So it's always guaranteed that the CPU executes
>
> tail + 1
>
> _before_ executing the write to the register because that's a program
> order dependency.
>
> If that would not be guaranteed, then your CPU would have more serious
> problems than this piece of code. It simply would not even survive the
> first five instructions in the boot loader.
>
> But what's not guaranteed is that consecutive writes become visible to
> other parts of the system (other CPUs, the invalidation hardware, ....)
> in the same consecutive order.
>
> To ensure that ordering you need a WMB(). If you would have CPU to CPU
> communication then you would need a RMB() on the reader side to ensure
> that the command is not read before the tail. In your case the hardware
> side takes care of that.
>
>> The above is just my personal understanding
> Which might need some slight adjustments :)
>
> Thanks,
>
> tglx
Hmm... it seems I confused program-flow-ordering and memory-ordering .
my understanding of the previous issue may not have been right.
Your reply has taught me a great deal, truly appreciate it
Thank you very much
Tianyang
Powered by blists - more mailing lists