[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <E722F06F-FB78-4566-B5BD-754C1F6BD641@gmail.com>
Date: Fri, 21 Jul 2023 23:21:51 +0800
From: Alan Huang <mmpgouride@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
rcu@...r.kernel.org, "Paul E. McKenney" <paulmck@...nel.org>,
roman.gushchin@...ux.dev
Subject: Re: Question about the barrier() in hlist_nulls_for_each_entry_rcu()
> 2023年7月21日 22:47,Eric Dumazet <edumazet@...gle.com> 写道:
>
> On Fri, Jul 21, 2023 at 4:31 PM Alan Huang <mmpgouride@...il.com> wrote:
>>
>>
>>> 2023年7月21日 05:11,Eric Dumazet <edumazet@...gle.com> 写道:
>>>
>>> On Thu, Jul 20, 2023 at 10:00 PM Alan Huang <mmpgouride@...il.com> wrote:
>>>>
>>>>
>>>>> 2023年7月21日 03:22,Eric Dumazet <edumazet@...gle.com> 写道:
>>>>>
>>>>> On Thu, Jul 20, 2023 at 8:54 PM Alan Huang <mmpgouride@...il.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I noticed a commit c87a124a5d5e(“net: force a reload of first item in hlist_nulls_for_each_entry_rcu”)
>>>>>> and a related discussion [1].
>>>>>>
>>>>>> After reading the whole discussion, it seems like that ptr->field was cached by gcc even with the deprecated
>>>>>> ACCESS_ONCE(), so my question is:
>>>>>>
>>>>>> Is that a compiler bug? If so, has this bug been fixed today, ten years later?
>>>>>>
>>>>>> What about READ_ONCE(ptr->field)?
>>>>>
>>>>> Make sure sparse is happy.
>>>>
>>>> It caused a problem without barrier(), and the deprecated ACCESS_ONCE() didn’t help:
>>>>
>>>> https://lore.kernel.org/all/519D19DA.50400@yandex-team.ru/
>>>>
>>>> So, my real question is: With READ_ONCE(ptr->field), are there still some unusual cases where gcc
>>>> decides not to reload ptr->field?
>>>
>>> I can not really answer without seeing an actual patch...
>>
>> The content of the potential patch:
>>
>> diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h
>> index 89186c499dd4..bcd39670f359 100644
>> --- a/include/linux/rculist_nulls.h
>> +++ b/include/linux/rculist_nulls.h
>> @@ -158,15 +158,9 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
>> * @pos: the &struct hlist_nulls_node to use as a loop cursor.
>> * @head: the head of the list.
>> * @member: the name of the hlist_nulls_node within the struct.
>> - *
>> - * The barrier() is needed to make sure compiler doesn't cache first element [1],
>> - * as this loop can be restarted [2]
>> - * [1] Documentation/memory-barriers.txt around line 1533
>> - * [2] Documentation/RCU/rculist_nulls.rst around line 146
>> */
>> #define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
>> - for (({barrier();}), \
>> - pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
>> + for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
>> (!is_a_nulls(pos)) && \
>> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
>> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
>> @@ -180,8 +174,7 @@ static inline void hlist_nulls_add_fake(struct hlist_nulls_node *n)
>> * @member: the name of the hlist_nulls_node within the struct.
>> */
>> #define hlist_nulls_for_each_entry_safe(tpos, pos, head, member) \
>> - for (({barrier();}), \
>> - pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
>> + for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
>> (!is_a_nulls(pos)) && \
>> ({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); \
>> pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)); 1; });)
>>
>>
>>>
>>> Why are you asking ? Are you tracking compiler bug fixes ?
>>
>> The barrier() here makes me confused.
>>
>> If we really need that, do we need:
>>
>> READ_ONCE(head->first);
>> barrier();
>> READ_ONCE(head->first);
>>
>
> Nope, the patch you want to revert (while it did fix (by pure luck
> ???) a real bug back in the days) was replacing
>
> ACCESS_ONCE()
>
> by
>
> barrier();
> ACCESS_ONCE();
Yeah.
The commit message and related discussions all indicate
that the compiler cached a value accessed through volatile.
That’s why I asked here.
>
> (There is one ACCESS_ONCE(), not two of them)
>
> BTW,
> barrier();
> followed by an arbitrary number of barrier(); back to back,
> translates to one barrier()
>
> Frankly, I would not change the code, unless someone can explain what
> was the issue.
> (Perhaps there was a missing barrier elsewhere)
Fair enough, although I feel like this is masking the real issue.
(I feel like no one will know the real issue ten years later, when no one knew it ten years ago.)
Anyway, thanks for your time.
Powered by blists - more mailing lists