[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <908be351-b4f8-4c25-9171-4f033e11ffc4@linux.alibaba.com>
Date: Wed, 8 Jan 2025 12:57:00 +0800
From: Guangguan Wang <guangguan.wang@...ux.alibaba.com>
To: Halil Pasic <pasic@...ux.ibm.com>, Paolo Abeni <pabeni@...hat.com>
Cc: wenjia@...ux.ibm.com, jaka@...ux.ibm.com, alibuda@...ux.alibaba.com,
tonylu@...ux.alibaba.com, guwen@...ux.alibaba.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, horms@...nel.org,
linux-rdma@...r.kernel.org, linux-s390@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Alexandra Winter <wintera@...ux.ibm.com>
Subject: Re: [PATCH net] net/smc: use the correct ndev to find pnetid by
pnetid table
On 2025/1/8 03:32, Halil Pasic wrote:
> On Tue, 7 Jan 2025 09:44:30 +0100
> Paolo Abeni <pabeni@...hat.com> wrote:
>
>> On 12/27/24 5:04 AM, Guangguan Wang wrote:
>
> @Guangguan Wang: please use my linux.ibm.com address
> in the future.
Get it.
>
>>> The command 'smc_pnet -a -I <ethx> <pnetid>' will add <pnetid>
>>> to the pnetid table and will attach the <pnetid> to net device
>>> whose name is <ethx>. But When do SMCR by <ethx>, in function
>>> smc_pnet_find_roce_by_pnetid, it will use <ethx>'s base ndev's
>>> pnetid to match rdma device, not <ethx>'s pnetid. The asymmetric
>>> use of the pnetid seems weird. Sometimes it is difficult to know
>>> the hierarchy of net device what may make it difficult to configure
>>> the pnetid and to use the pnetid. Looking into the history of
>>> commit, it was the commit 890a2cb4a966 ("net/smc: rework pnet table")
>>> that changes the ndev from the <ethx> to the <ethx>'s base ndev
>>> when finding pnetid by pnetid table. It seems a mistake.
>>>
>>> This patch changes the ndev back to the <ethx> when finding pnetid
>>> by pnetid table.
>>>
>>> Fixes: 890a2cb4a966 ("net/smc: rework pnet table")
>>> Signed-off-by: Guangguan Wang <guangguan.wang@...ux.alibaba.com>
>>
>> If I read correctly, this will break existing applications using the
>> lookup schema introduced by the blamed commit - which is not very
>> recent.
>
> Hi Paolo,
>
> sorry for chiming in late. Wenjia is on vacation and Jan is out sick!
> After some reading and thinking I could not figure out how 890a2cb4a966
> ("net/smc: rework pnet table") is broken.
Before commit 890a2cb4a966:
smc_pnet_find_roce_resource
smc_pnet_find_roce_by_pnetid(ndev, ...) /* lookup via hardware-defined pnetid */
smc_pnetid_by_dev_port(base_ndev, ...)
smc_pnet_find_roce_by_table(ndev, ...) /* lookup via SMC PNET table */
{
...
list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) {
if (ndev == pnetelem->ndev) { /* notice here, it was ndev to matching pnetid element in pnet table */
...
}
After commit 890a2cb4a966:
smc_pnet_find_roce_resource
smc_pnet_find_roce_by_pnetid
{
...
base_ndev = pnet_find_base_ndev(ndev); /* rename the variable name to base_ndev for better understanding */
smc_pnetid_by_dev_port(base_ndev, ...)
smc_pnet_find_ndev_pnetid_by_table(base_ndev, ...)
{
...
list_for_each_entry(pnetelem, &smc_pnettable.pnetlist, list) {
if (base_ndev == pnetelem->ndev) { /* notice here, it is base_ndev to matching pnetid element in pnet table */
...
}
}
The commit 890a2cb4a966 has changed ndev to base_ndev when matching pnetid element in pnet table.
But in the function smc_pnet_add_eth, the pnetid is attached to the ndev itself, not the base_ndev.
smc_pnet_add_eth(...)
{
...
ndev = dev_get_by_name(net, eth_name);
...
if (new_netdev) {
if (ndev) {
new_pe->ndev = ndev;
netdev_tracker_alloc(ndev, &new_pe->dev_tracker,
GFP_ATOMIC);
}
list_add_tail(&new_pe->list, &pnettable->pnetlist);
mutex_unlock(&pnettable->lock);
} else {
...
}
>
> Admittedly I'm not really a net guy,and I'm mostly guessing what that
> lower and upper device stuff is, so please bear with me. All that said, I
> do think that going to the lowest netdev in the hierarchy is a sane
> thing to do here. I assume that lower and upper devices are applicable
> to stuff like bonding.
>
> PNETID stands for "Physical Network Identifier" and the idea is that iff
> two ports are connected to the same physical network then they should
> have the same PNETID. And on s390 PNETID can come and often is comming
> "from the hardware". Now for something like a bond of two OSA
> interfaces, I would expect the two legs of the bond to probably have a
> "HW PNETID", but the netdev representing the bond itself won't have one
> unless the Linux admin defines a software PNETID, which is work, and
> can't have a HW PNETID because it is a software construct within Linux.
> Breaking for example an active-backup bond setup where the legs have
> HW PNETIDs and the admin did not bother to specify a PNETID for the bond
> is not acceptable.
>
> Let me also note that if ndev is a leaf (i.e. there is no lower device to
> it) then ndev == base_ndev, and the whole discussion does not matter for
> that case.
>
> Again I have to emphasize that my domain knowledge is very limited, but
> I really don't feel comfortable going forward with this without Jan or
> Wenjia weighing in on the matter.
>
> Paolo thanks for bringing this up!
>
>>
>> Perhaps for a net patch would be better to support both lookup schemas
>> i.e.
>>
>> (smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) ||
>> smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid))
>>
>> ?
>>
>
> Hm, I guess the idea here is that if ndev has a PNETID then it should
> take precedence, but if not we should try to obtain the PNETID of its
> "base_ndev". I'm not sure this would make things better compared to the
> original idea of caring about the leaf. Which makes me question my
> understanding of the problem statement from the commit message.
>
> BTW to implement the logic proposed by you Paolo, as understood by me,
> we would have to use "&&" instead of "||". The whole expression is
> supposed
> to evaluate to false if a pnetid is found and to true if no pnet_id is
> found. smc_pnet_find_ndev_pnetid_by_table(ndev) returns false if a pnetid
> is found. I.e. if not found we would just short circuit to true and not
> call smc_pnet_find_ndev_pnetid_by_table(base_ndev), which is not what I
> believe you wanted to propose.
Yes, it should be
(smc_pnet_find_ndev_pnetid_by_table(ndev, ndev_pnetid) &&
smc_pnet_find_ndev_pnetid_by_table(base_ndev, ndev_pnetid))
if for the consideration of application's compatible usage of smc_pnet.
Thanks,
Guangguan Wang
>
> To sum it up, please let us wait until Wenjia or Jan chime in. Copying
> Alexandra as well: she is more of a net person than I am, and maybe she
> has a more informed opinion.
>
> Regards,
> Halil
> [...]
>
Powered by blists - more mailing lists