[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e011b3e9-7c09-4214-8e9c-90e12c38bbaa@quicinc.com>
Date: Fri, 1 Dec 2023 23:34:40 +0800
From: "Aiqun(Maria) Yu" <quic_aiquny@...cinc.com>
To: Linus Walleij <linus.walleij@...aro.org>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
"open list:GPIO SUBSYSTEM" <linux-gpio@...r.kernel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Charles Keepax <ckeepax@...nsource.cirrus.com>,
Chester Lin <clin@...e.com>
Subject: Re: [GIT PULL] Pin control fixes for v6.7
On 12/1/2023 6:06 PM, Aiqun(Maria) Yu wrote:
> On 12/1/2023 4:10 PM, Linus Walleij wrote:
>> Hi Nathan, Nick,
>>
>> (just picking some LLVM compiler people I know of... and trust)
>>
>> Context is this patch:
>> https://lore.kernel.org/linux-gpio/20231115102824.23727-1-quic_aiquny@quicinc.com/
>>
>> On Thu, Nov 30, 2023 at 6:37 AM Aiqun(Maria) Yu
>> <quic_aiquny@...cinc.com> wrote:
>>> On 11/29/2023 11:08 PM, Linus Walleij wrote:
>>>> On Wed, Nov 29, 2023 at 3:56 PM Linus Torvalds
>>>> <torvalds@...ux-foundation.org> wrote:
>>>>> On Wed, 29 Nov 2023 at 04:09, Linus Walleij
>>>>> <linus.walleij@...aro.org> wrote:
>>>>>>
>>>>>> The most interesting patch is the list iterator fix in the core by
>>>>>> Maria
>>>>>> Yu, it took a while for me to realize what was going on there.
>>>>>
>>>>> That commit message still doesn't explain what the problem was.
>>>>>
>>>>> Why is p->state volatile there? It seems to be a serious locking bug
>>>>> if p->state can randomly change there, and the READ_ONCE() looks like
>>>>> a "this hides the problem" rather than an actual real fix.
>>>
>>> This is indeed an interesting issue. Thx for the comment, Linus.
>>> **Let me explain how: "p->state becomes volatile in the list iterator",
>>> and "why READ_ONCE is suggested".
>>>
>>> The current critical code is:
>>> list_for_each_entry(setting, &p->state->settings, node)
>>>
>>> after elaborating the define list_for_each_entry, so above critical code
>>> will be:
>>> for (setting = list_head(&p->state->settings, typeof(*setting),
>>> node); \
>>> &setting->node != (&p->state->settings); \
>>> setting = list_next(setting , node))
>>>
>>> The asm code(refer result from Clang version 10.0) can cleared explain
>>> the step of p->state reload actions:
>>> loop:
>>> ldr x22,[x22] ; x22=list_next(setting , node))
>>> add x9,x8,#0x18 ; x9=&p->state->setting
>>> cmp x22,x9 ; setting,x9
>>> b.eq 0xFFFFFF9B24483530
>>>
>>> ldr w9,[x22,#0x10] ; w9,[setting,#16]
>>> cmp w9,#0x2 ; w9,#2
>>> b.ne 0xFFFFFF9B24483504
>>>
>>> mov x0,x22 ; x0,setting
>>> bl 0xFFFFFF9B24486048 ; pinmux_disable_setting
>>>
>>> ldr x8,[x19,#0x28] ; x19=p, x8=[p->state], *reload p->state*
>>> b loop
>>>
>>> The *reload p->state* inside the loop for checking is not needed and can
>>> cause possible infinite loop. So READ_ONCE is highly suggested even if
>>> p->state is not randomly changed. And then unnecessary "ldr
>>> x8,[x19,#0x28]" can be removed from above loop code.
>>>
>>> **Comments about the locking bug:
>>> currently pinctrl_select_state is an export symbol and don't have
>>> effective reentrance protect design. That's why current infinite loop
>>> issue was observed with customer's multi thread call with
>>> pinctrl_select_state without lock protection. pinctrl_select_state
>>> totally rely on driver module user side to ensure the reentrant state.
>>>
>>> Usually the suggested usage from driver side who are using pinctrl
>>> would be:
>>> LOCK;
>>> pinctrl_select_state();
>>> gpio pulling;
>>> udelay();
>>> check state;
>>> other hardware behaviors;
>>> UNLOCK;
>>>
>>> So the locking bug fix I have told customer side to fix from their own
>>> driver part. Since usually not only a simple pinctrl_select_state call
>>> can finish the hardware state transaction.
>>>
>>> I myself also is fine to have a small per pinctrl lock to only protect
>>> the current pinctrl_select_state->pinctrl_commit_state reentrance
>>> issues. Pls any pinctrl maintainer help to comment to suggest or not and
>>> I can prepare a change as well.
>>
>> Luckily I am the pin control maintainer :D
>> And I also ha my morning coffee and looked over the patch again.
>>
>> So tilting the compiler to generate code that is less prone to race
>> conditions with READ_ONCE() isn't really the solution is it? We need
>> to introduce a proper lock that stops this from happening if it is
>> a problem people are facing.
>>
>> Can you try to make a patch that removes READ_ONCE()
>> and introduce a lock instead?
>>
>> Racing is rarely an issue in pin control for reasons explained
>> in another context here:
>> https://lore.kernel.org/linux-gpio/CACRpkdZ0cnJpYuzU=47-oW-7N_YGMo2vXpKOeXeNi5PhPY7QMA@mail.gmail.com/
>>
>> ...but if people still manage to run into it, we better have a lock
>> there. Just make sure it is not just an issue with outoftree code,
>> but a real problem?
>>
>> The change that changes the code to use the old_state variable
>> should stay in, it makes the code more readable, it's just the
>> READ_ONCE() macro which is dubious.
> Thx for confirm. I am preparing the new change now. :)
change uploaded link here:
https://lore.kernel.org/linux-gpio/20231201152931.31161-1-quic_aiquny@quicinc.com/
>
> READ_ONCE can only avoid the possible infinite loop and not crash the
> whole kernel, while the lock is needed to protect the multi parallel
> call of pinctrl_commit_state api have a consistent atomic hardware
> result as well.
>>
>> Yours,
>> Linus Walleij
>
--
Thx and BRs,
Aiqun(Maria) Yu
Powered by blists - more mailing lists