Message-ID: <2be6038ad8cb43559495e6f84e97b8a6@AUSX13MPS306.AMER.DELL.COM>
Date: Tue, 30 Oct 2018 18:23:28 +0000
From: <Justin.Lee1@...l.com>
To: <sam@...dozajonas.com>, <netdev@...r.kernel.org>
CC: <davem@...emloft.net>, <linux-kernel@...r.kernel.org>,
<openbmc@...ts.ozlabs.org>
Subject: RE: [PATCH net-next v2 5/6] net/ncsi: Reset channel state in
ncsi_start_dev()
> On Fri, 2018-10-26 at 17:25 +0000, Justin.Lee1@...l.com wrote:
> > Hi Samuel,
> >
> > I noticed a few issues and commented below.
> >
> > Thanks,
> > Justin
> >
> >
> > > /* Resources */
> > > +int ncsi_reset_dev(struct ncsi_dev *nd);
> > > void ncsi_start_channel_monitor(struct ncsi_channel *nc);
> > > void ncsi_stop_channel_monitor(struct ncsi_channel *nc);
> > > struct ncsi_channel *ncsi_find_channel(struct ncsi_package *np,
> > > diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> > > index 014321ad31d3..9bad03e3fa5e 100644
> > > --- a/net/ncsi/ncsi-manage.c
> > > +++ b/net/ncsi/ncsi-manage.c
> > > @@ -550,8 +550,10 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp)
> > >  		spin_lock_irqsave(&nc->lock, flags);
> > >  		nc->state = NCSI_CHANNEL_INACTIVE;
> > >  		spin_unlock_irqrestore(&nc->lock, flags);
> > > -		ncsi_process_next_channel(ndp);
> > > -
> > > +		if (ndp->flags & NCSI_DEV_RESET)
> > > +			ncsi_reset_dev(nd);
> > > +		else
> > > +			ncsi_process_next_channel(ndp);
> > >  		break;
> > >  	default:
> > >  		netdev_warn(nd->dev, "Wrong NCSI state 0x%x in suspend\n",
> > > @@ -1554,7 +1556,7 @@ int ncsi_start_dev(struct ncsi_dev *nd)
> > >  		return 0;
> > >  	}
> > >
> > > -	return ncsi_choose_active_channel(nd);
> > > +	return ncsi_reset_dev(nd);
> >
> > If no channel is available because of the whitelist, ncsi_start_dev() will return a failure
> > status and the network interface may fail to come up as well. A user may want to disable all
> > channels and leave the interface up to check the LOM status.
> >
>
> I'm not sure that that is a bug, or at least not in the scope of this
> series. If the whitelist is set such that no channels are valid then
> there's nothing for NCSI to do. If we want to do something like always
> monitor all channels then that would be best to do in another patch.
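
For reference, here is a simplified user-space sketch of the two behaviours
being discussed. It is not the kernel code: the sketch_* names, the fixed
array sizes, and the use of -ENODEV as the failure value are assumptions made
for illustration, and the real ncsi_choose_active_channel() considers more
than the whitelist masks (see the "skip" lines in the log below).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_PKG 2	/* hypothetical: two packages */
#define SKETCH_MAX_CH  2	/* hypothetical: two channels per package */

/* Hypothetical whitelist: one bit per package, one bit per channel. */
struct sketch_whitelist {
	uint32_t pkg_mask;
	uint32_t ch_mask[SKETCH_MAX_PKG];
};

/*
 * Pick the first channel that passes both whitelists.
 * Returns 0 and fills *pkg / *ch on success, -ENODEV if nothing passes.
 */
static int sketch_choose_channel(const struct sketch_whitelist *wl,
				 int *pkg, int *ch)
{
	int p, c;

	assert(wl && pkg && ch);

	for (p = 0; p < SKETCH_MAX_PKG; p++) {
		if (!(wl->pkg_mask & (1u << p)))
			continue;	/* package not whitelisted */
		for (c = 0; c < SKETCH_MAX_CH; c++) {
			if (wl->ch_mask[p] & (1u << c)) {
				*pkg = p;
				*ch = c;
				return 0;
			}
		}
	}
	return -ENODEV;
}

/* Behaviour in the patch as I read it: no valid channel fails the start. */
static int sketch_start_strict(const struct sketch_whitelist *wl)
{
	int pkg, ch;

	return sketch_choose_channel(wl, &pkg, &ch);
}

/* Behaviour I am asking about: an empty whitelist leaves the interface up. */
static int sketch_start_tolerant(const struct sketch_whitelist *wl)
{
	int pkg, ch;
	int ret = sketch_choose_channel(wl, &pkg, &ch);

	return ret == -ENODEV ? 0 : ret;
}
```

With pkg_mask 0x1 and all channel masks 0x0, sketch_start_strict() fails while
sketch_start_tolerant() succeeds with nothing configured. Whether the tolerant
variant belongs in this series is a separate question.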
>
> > > }
> > > EXPORT_SYMBOL_GPL(ncsi_start_dev);
> >
> > Also, if I send set_package_mask and set_channel_mask commands back to back in a program,
> > the state machine doesn't work well. If I use command line and wait for it to complete for
> > each step, then it is fine.
>
> Yeah that's not great; probably hitting some corner cases in the NCSI
> locking. I'll look into the multi-channel related stuff but I have a
> feeling that if you tried this with the existing set/clear commands you
> would probably hit something similar, especially on your dual core
> platform. If so this is probably something to fix separately.
>
This may be caused by the following code in ncsi_reset_dev(). The state might be
overwritten while the previous operation is still in progress, which interrupts it.

	spin_lock_irqsave(&ndp->lock, flags);
	ndp->flags |= NCSI_DEV_RESET;
	ndp->active_channel = active;
	ndp->active_package = active->package;
	spin_unlock_irqrestore(&ndp->lock, flags);

	nd->state = ncsi_dev_state_suspend;
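
To illustrate the concern, here is a minimal single-threaded user-space sketch.
It is not the kernel code and not a proposed fix: every sketch_* name, the
busy/reset_pending flags, and the idea of deferring the reset are assumptions
made for illustration only. In the kernel any such check would also have to be
done under the appropriate lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the device states. */
enum sketch_state {
	SKETCH_STATE_FUNCTIONAL,
	SKETCH_STATE_SUSPEND,
	SKETCH_STATE_CONFIG,
};

struct sketch_dev {
	enum sketch_state state;
	bool busy;		/* a suspend/config sequence is in flight */
	bool reset_pending;	/* a reset was requested while busy */
};

/* Mirrors the snippet above: the state is overwritten unconditionally. */
static void sketch_reset_unconditional(struct sketch_dev *d)
{
	assert(d);
	d->state = SKETCH_STATE_SUSPEND;
	d->busy = true;
}

/*
 * Alternative: if an operation is in flight, only record the request.
 * Returns true if the reset started immediately, false if it was deferred.
 */
static bool sketch_reset_deferred(struct sketch_dev *d)
{
	assert(d);
	if (d->busy) {
		d->reset_pending = true;
		return false;
	}
	d->state = SKETCH_STATE_SUSPEND;
	d->busy = true;
	return true;
}

/* Called when the in-flight operation completes. */
static void sketch_op_done(struct sketch_dev *d)
{
	assert(d);
	d->busy = false;
	d->state = SKETCH_STATE_FUNCTIONAL;
	if (d->reset_pending) {
		d->reset_pending = false;
		sketch_reset_deferred(d);	/* now safe to start the reset */
	}
}
```

With a device that is busy in SKETCH_STATE_CONFIG, sketch_reset_unconditional()
replaces the state immediately, which is the overwrite I suspect. In contrast,
sketch_reset_deferred() leaves the state alone until sketch_op_done() runs.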
> >
> > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-package enabled on ifindex 2, mask 0x00000001
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 0
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0400
> > npcm7xx-emc f0825000.eth eth2: NCSI: pkg 0 ch 0 set as preferred channel
> > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000003
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 1
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 1 state 0400
> > npcm7xx-emc f0825000.eth eth2: NCSI: Package 1 set to all channels disabled
> > npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000000
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 0
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass pkg whitelist
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 0
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 1
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - next pkg
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 1
> > npcm7xx-emc f0825000.eth eth2: NCSI: No channel found to configure!
> > npcm7xx-emc f0825000.eth eth2: NCSI interface down
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > npcm7xx-emc f0825000.eth eth2: Wrong NCSI state 0x100 in workqueue
> >
> > All masks are set correctly, but you can see that the PS column is not right and the
> > channel is not configured correctly.
> >
> > /sys/kernel/debug/ncsi_protocol# cat ncsi_device_status
> > IFIDX IFNAME NAME   PID CID RX TX MP MC WP WC PC PS LS RU CR NQ HA
> > ==================================================================
> > 2     eth2   ncsi0  000 000 1  1  1  1  1  1  1  0  1  1  1  0  1
> > 2     eth2   ncsi1  000 001 1  0  1  1  1  1  0  0  1  1  1  0  1
> > 2     eth2   ncsi2  001 000 0  0  1  1  0  0  0  0  1  1  1  0  1
> > 2     eth2   ncsi3  001 001 0  0  1  1  0  0  0  0  1  1  1  0  1
> > ==================================================================
> > MP: Multi-mode Package WP: Whitelist Package
> > MC: Multi-mode Channel WC: Whitelist Channel
> > PC: Primary Channel
> > PS: Poll Status
> > LS: Link Status
> > RU: Running
> > CR: Carrier OK
> > NQ: Queue Stopped
> > HA: Hardware Arbitration
> >
> > The PS column is taken from (int)nc->monitor.enabled.