Message-ID: <92A96585-DAE5-46C5-8D2A-2EED92F75FDF@oracle.com>
Date: Mon, 14 Jun 2021 03:27:00 +0000
From: Haakon Bugge <haakon.bugge@...cle.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: Jason Gunthorpe <jgg@...dia.com>,
Leon Romanovsky <leon@...nel.org>,
Doug Ledford <dledford@...hat.com>,
Kees Cook <keescook@...omium.org>,
Nathan Chancellor <nathan@...nel.org>,
Adit Ranadive <aditr@...are.com>,
Ariel Elior <aelior@...vell.com>,
Christian Benvenuti <benve@...co.com>,
"clang-built-linux@...glegroups.com"
<clang-built-linux@...glegroups.com>,
Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
Devesh Sharma <devesh.sharma@...adcom.com>,
Gal Pressman <galpress@...zon.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
OFED mailing list <linux-rdma@...r.kernel.org>,
Michal Kalderon <mkalderon@...vell.com>,
Mike Marciniszyn <mike.marciniszyn@...nelisnetworks.com>,
Mustafa Ismail <mustafa.ismail@...el.com>,
Naresh Kumar PBS <nareshkumar.pbs@...adcom.com>,
Nelson Escobar <neescoba@...co.com>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Potnuri Bharat Teja <bharat@...lsio.com>,
Selvin Xavier <selvin.xavier@...adcom.com>,
Shiraz Saleem <shiraz.saleem@...el.com>,
VMware PV-Drivers <pv-drivers@...are.com>,
Yishai Hadas <yishaih@...dia.com>,
Zhu Yanjun <zyjzyj2000@...il.com>
Subject: Re: [PATCH rdma-next v1 10/15] RDMA/cm: Use an attribute_group on the
ib_port_attribute intead of kobj's
> On 11 Jun 2021, at 10:16, Greg KH <gregkh@...uxfoundation.org> wrote:
>
> On Fri, Jun 11, 2021 at 07:25:46AM +0000, Haakon Bugge wrote:
>>
>>
>>> On 7 Jun 2021, at 14:50, Jason Gunthorpe <jgg@...dia.com> wrote:
>>>
>>> On Mon, Jun 07, 2021 at 02:39:45PM +0200, Greg KH wrote:
>>>> On Mon, Jun 07, 2021 at 09:14:11AM -0300, Jason Gunthorpe wrote:
>>>>> On Mon, Jun 07, 2021 at 12:25:03PM +0200, Greg KH wrote:
>>>>>> On Mon, Jun 07, 2021 at 11:17:35AM +0300, Leon Romanovsky wrote:
>>>>>>> From: Jason Gunthorpe <jgg@...dia.com>
>>>>>>>
>>>>>>> This code is trying to attach a list of counters grouped into 4 groups to
>>>>>>> the ib_port sysfs. Instead of creating a bunch of kobjects simply express
>>>>>>> everything naturally as an ib_port_attribute and add a single
>>>>>>> attribute_groups list.
>>>>>>>
>>>>>>> Remove all the naked kobject manipulations.
>>>>>>
>>>>>> Much nicer.
>>>>>>
>>>>>> But why do you need your counters to be atomic in the first place? What
>>>>>> are they counting that requires this?
>>>>>
>>>>> The write side of the counter is being updated from concurrent kernel
>>>>> threads without locking, so this is an atomic because the write side
>>>>> needs atomic_add().
>>>>
>>>> So the atomic write forces a lock :(
>>>
>>> Of course, but a single atomic is cheaper than the double atomic in a
>>> full spinlock.
>>>
>>>>> Making them a naked u64 will cause significant corruption on the write
>>>>> side, and packet counters that are not accurate after quiescence are
>>>>> not very useful things.
>>>>
>>>> How "accurate" do these have to be?
>>>
>>> They have to be accurate. They are networking packet counters. What is
>>> the point of burning CPU cycles keeping track of inaccurate data?
>>
>> Consider a CPU with a 32-bit wide datapath to memory, which reads and writes the most significant 4-byte word first:
>
> What CPU is that?
Hypothetical32 :-)
>>   Memory                 CPU1                   CPU2
>>   MSW   LSW              MSW   LSW              MSW   LSW
>>   0x0   0xffffffff
>>   0x0   0xffffffff       0x0                                       cpu1 has read the MSW
>>   0x0   0xffffffff       0x0   0xffffffff                          cpu1 has read the LSW
>>   0x0   0xffffffff       0x1   0x0                                 cpu1 has incremented its register
>>   0x1   0xffffffff       0x1   0x0                                 cpu1 has written the MSW
>>   0x1   0xffffffff       0x1   0x0              0x1                cpu2 has read the MSW
>>   0x1   0xffffffff       0x1   0x0              0x1   0xffffffff   cpu2 has read the LSW
>>   0x1   0x0              0x1   0x0              0x2   0x0          cpu1 has written the LSW; cpu2 has incremented
>>   0x2   0x0              0x1   0x0              0x2   0x0          cpu2 has written the MSW
>>   0x2   0x0              0x1   0x0              0x2   0x0          cpu2 has written the LSW
>>
>>
>> I would say that 0x200000000 vs. 0x100000001 is more than inaccurate!
>
> True, then maybe these should just be 32bit counters :)
How long can we then run without wrapping? Our UEK is security-updated by means of ksplice, and since the introduction of the Spectre/Meltdown CPU fixes in 2018 we have been able to apply security fixes to running kernels without rebooting them.

I see no harm in using an atomic 64-bit add. Yes, it serializes the pipeline and locks the cache line in the first-level cache for as long as the RMW takes, but compared to surrounding an ordinary add with lock/unlock, an atomic increment is strongly preferred in my opinion.

An ordinary add without locking leads to the torn-write issue above on systems with a 32-bit wide memory datapath.

Using 32-bit counters raises the issue of wrapping on systems that run for years and have a high rate of IB connection establishment and re-establishment.
Thxs, Håkon
> thanks,
>
> greg k-h