lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 9 Nov 2018 18:07:49 +0000
From:   <Justin.Lee1@...l.com>
To:     <sam@...dozajonas.com>, <netdev@...r.kernel.org>
CC:     <davem@...emloft.net>, <linux-kernel@...r.kernel.org>,
        <openbmc@...ts.ozlabs.org>
Subject: RE: [PATCH net-next v3 6/6] net/ncsi: Configure multi-package,
 multi-channel modes with failover

Hi Samuel,

The extra patch fixed most issues but I see another corner case.

Thanks,
Justin


> On Thu, 2018-11-08 at 22:48 +0000, Justin.Lee1@...l.com wrote:
> > Hi Samuel,
> > 
> > For multi-package and multi-channel case, channel seems to be select correctly. Expect that,
> > I still see the timing issue for back-to-back netlink command. Due to that, channel might be
> > set to invisible state. Please refer to ncsi0 and ncsi2 below. The channel state is set to 3.
> > 
> > cat /sys/kernel/debug/ncsi_protocol/ncsi_device_status
> > IFIDX IFNAME NAME   PID CID RX TX MP MC WP WC PC CS PS LS RU CR NQ HA
> > =====================================================================
> >   2   eth2   ncsi0  000 000 1  1  1  1  1  0  0  3  0  1  1  1  0  1
> >   2   eth2   ncsi1  000 001 0  0  1  1  1  0  0  1  0  1  1  1  0  1
> >   2   eth2   ncsi2  001 000 1  0  1  1  1  1  0  3  0  1  1  1  0  1
> >   2   eth2   ncsi3  001 001 1  0  1  1  1  1  0  2  1  1  1  1  0  1
> > =====================================================================
> > MP: Multi-mode Package  WP: Whitelist Package
> > MC: Multi-mode Channel  WC: Whitelist Channel
> > PC: Primary Channel     CS: Channel State IA/A/IV 1/2/3
> > PS: Poll Status         LS: Link Status
> > RU: Running             CR: Carrier OK
> > NQ: Queue Stopped       HA: Hardware Arbitration
> > 
> > The timing issue is not only happening in application. If I use using the following way
> > to send the request, I can see the issue as well. 
> > 
> > ncsi_netlink -l 2 -a 0x01 -m; ncsi_netlink -l 2 -p 0 -b 0x03 -m; ncsi_netlink -l 2 -p 1 -b 0x00 -m;
> > ncsi_netlink -l 2 -a 0x03 -m; ncsi_netlink -l 2 -p 0 -b 0x00 -m; ncsi_netlink -l 2 -p 1 -b 0x03 -m;
> 
> This actually recreates for me as well; I see now what you mean about
> channels getting stuck in the invisible state. I believe I've narrowed
> down the issue. I've pasted an additional patch below if you are able to
> test on your machine.
> 
> > 
> > 
> > Also, there is one issue below for non-multi-package/non-multi-channel case.
> > 
> > Thanks,
> > Justin
> > 
> > 
> > > @@ -1008,32 +1164,49 @@ static int ncsi_choose_active_channel(struct ncsi_dev_priv *ndp)
> > >  
> > >  			ncm = &nc->modes[NCSI_MODE_LINK];
> > >  			if (ncm->data[2] & 0x1) {
> > > -				spin_unlock_irqrestore(&nc->lock, flags);
> > >  				found = nc;
> > > -				goto out;
> > > +				with_link = true;
> > >  			}
> > >  
> > > -			spin_unlock_irqrestore(&nc->lock, flags);
> > > +			/* If multi_channel is enabled configure all valid
> > > +			 * channels whether or not they currently have link
> > > +			 * so they will have AENs enabled.
> > > +			 */
> > > +			if (with_link || np->multi_channel) {
> > > +				spin_lock_irqsave(&ndp->lock, flags);
> > > +				list_add_tail_rcu(&nc->link,
> > > +						  &ndp->channel_queue);
> > > +				spin_unlock_irqrestore(&ndp->lock, flags);
> > > +
> > > +				netdev_dbg(ndp->ndev.dev,
> > > +					   "NCSI: Channel %u added to queue (link %s)\n",
> > > +					   nc->id,
> > > +					   ncm->data[2] & 0x1 ? "up" : "down");
> > > +			}
> > > +
> > > +			spin_unlock_irqrestore(&nc->lock, cflags);
> > > +
> > > +			if (with_link && !np->multi_channel)
> > > +				break;
> > 
> > The line needs to change to "goto found". If not, all channels with link will be added
> > even if the multi-channel is not enabled for that package. The ncsi1 below is enabled.
> > There is no netlink command sent to enable multi-package or multi-channel.
> > 
> > IFIDX IFNAME NAME   PID CID RX TX MP MC WP WC PC CS PS LS RU CR NQ HA
> > =====================================================================
> >   2   eth2   ncsi0  000 000 1  1  0  0  1  1  0  2  1  1  1  1  0  1
> >   2   eth2   ncsi1  000 001 1  0  0  0  1  1  0  2  1  1  1  1  0  1
> >   2   eth2   ncsi2  001 000 0  0  0  0  1  1  0  1  0  1  1  1  0  1
> >   2   eth2   ncsi3  001 001 0  0  0  0  1  1  0  1  0  1  1  1  0  1
> > =====================================================================
> > MP: Multi-mode Package  WP: Whitelist Package
> > MC: Multi-mode Channel  WC: Whitelist Channel
> > PC: Primary Channel     CS: Channel State IA/A/IV 1/2/3
> > PS: Poll Status         LS: Link Status
> > RU: Running             CR: Carrier OK
> > NQ: Queue Stopped       HA: Hardware Arbitration
> > 
> > >  		}
> > > +		if (with_link && !ndp->multi_package)
> > > +			break;
> > >  	}
> > 
> > found:
> 
> This *may* be part of the above issue, I don't see this in normal
> operation. The combination of (with_link && !np->multi_channel) and
> (with_link && !ndp->multi_package) should prevent additional channels
> being added without the need for 'goto found'. Please let me know if you
> still see it with the extra patch.
> 

This one is fixed by your extra patch but I see another corner case. Basically,
the issue is due to the link status is not updating during the process of
choosing the active channel. During the process, maybe we should find a 
chance to send Get Link Status command to all whitelist channels.

Here is my step.
1. All four channels plug Ethernet cable.
2. BMC starts without any multi-package/multi-channel configuration.
3. The ncsi0 channel is selected by NC-SI driver.
4. Remove cable from ncsi0.
5. The ncsi1 channel is selected by NC-SI driver.
6. Remove cable from ncsi1.
7. The ncsi2 channel is selected by NC-SI driver.
8. Insert cable for ncsi0. Link status will not be updated.
9. Remove cable from ncsi2.
10. The ncsi3 channel is selected by NC-SI driver.
11. Remove cable from ncsi3.
12. The selected channel is ncsi3 without link.

cat /sys/kernel/debug/ncsi_protocol/ncsi_device_
IFIDX IFNAME NAME   PID CID RX TX MP MC WP WC PC CS PS LS RU CR NQ HA
=====================================================================
  2   eth2   ncsi0  000 000 0  0  0  0  1  1  0  1  0  0  1  1  0  1
  2   eth2   ncsi1  000 001 0  0  0  0  1  1  0  1  0  0  1  1  0  1
  2   eth2   ncsi2  001 000 0  0  0  0  1  1  0  1  0  0  1  1  0  1
  2   eth2   ncsi3  001 001 1  1  0  0  1  1  0  2  1  0  1  1  0  1
=====================================================================
MP: Multi-mode Package  WP: Whitelist Package
MC: Multi-mode Channel  WC: Whitelist Channel
PC: Primary Channel     CS: Channel State IA/A/IV 1/2/3
PS: Poll Status         LS: Link Status
RU: Running             CR: Carrier OK
NQ: Queue Stopped       HA: Hardware Arbitration

> > 
> > After applying this change, I notice that if there is no link available to BMC when BMC
> > starts, NC-SI can't properly configure channel once I plug in the Ethernet cable. 
> > 
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_aen_handler_lsc() - pkg 0 ch 0 state up
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_aen_handler_lsc() - had_link 0, has_link 1, chained 0
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 0
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_process_next_channel()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_process_next_channel() - pkg 0 ch 0 INVISIBLE
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_process_next_channel() - suspending pkg 0 ch 0
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0400 select
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0403 dc
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0404 deselect
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0405 done
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_rsp_handler_dp() - pkg 0 ch 0 INACTIVE
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_rsp_handler_dp() - pkg 0 ch 1 INACTIVE
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0406 deselect
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 INACTIVE
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_process_next_channel()
> > npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_process_next_channel() - No more channels to process
> > npcm7xx-emc f0825000.eth eth2: NCSI interface down
> 
> Good find, there was a corner case in the LSC AEN handler changes that
> led to this, I've fixed this in the patch as well. Thanks for testing!

This one is fixed.

> 
> 
> From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
> From: Samuel Mendoza-Jonas <sam@...dozajonas.com>
> Date: Fri, 9 Nov 2018 13:11:03 +1100
> Subject: [PATCH] net/ncsi: Reset state fixes, single-channel LSC
> 
> ---
>  net/ncsi/ncsi-aen.c    |  8 +++++---
>  net/ncsi/ncsi-manage.c | 19 +++++++++++++++----
>  2 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ncsi/ncsi-aen.c b/net/ncsi/ncsi-aen.c
> index 39c2e9eea2ba..034cb1dc5566 100644
> --- a/net/ncsi/ncsi-aen.c
> +++ b/net/ncsi/ncsi-aen.c
> @@ -93,14 +93,16 @@ static int ncsi_aen_handler_lsc(struct ncsi_dev_priv *ndp,
>  	if ((had_link == has_link) || chained)
>  		return 0;
>  
> -	if (!ndp->multi_package && !nc->package->multi_channel) {
> -		if (had_link)
> -			ndp->flags |= NCSI_DEV_RESHUFFLE;
> +	if (!ndp->multi_package && !nc->package->multi_channel && had_link) {
> +		ndp->flags |= NCSI_DEV_RESHUFFLE;
>  		ncsi_stop_channel_monitor(nc);
>  		spin_lock_irqsave(&ndp->lock, flags);
>  		list_add_tail_rcu(&nc->link, &ndp->channel_queue);
>  		spin_unlock_irqrestore(&ndp->lock, flags);
>  		return ncsi_process_next_channel(ndp);
> +	} else {
> +		/* Configured channel came up */
> +		return 0;
>  	}
>  
>  	if (had_link) {
> diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> index fa3c2144f5ba..92e59f07f9a7 100644
> --- a/net/ncsi/ncsi-manage.c
> +++ b/net/ncsi/ncsi-manage.c
> @@ -1063,17 +1063,17 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
>  	case ncsi_dev_state_config_done:
>  		netdev_dbg(ndp->ndev.dev, "NCSI: channel %u config done\n",
>  			   nc->id);
> +		spin_lock_irqsave(&nc->lock, flags);
> +		nc->state = NCSI_CHANNEL_ACTIVE;
> +
>  		if (ndp->flags & NCSI_DEV_RESET) {
>  			/* A reset event happened during config, start it now */
> -			spin_lock_irqsave(&nc->lock, flags);
>  			nc->reconfigure_needed = false;
>  			spin_unlock_irqrestore(&nc->lock, flags);
> -			nd->state = ncsi_dev_state_functional;
>  			ncsi_reset_dev(nd);
>  			break;
>  		}
>  
> -		spin_lock_irqsave(&nc->lock, flags);
>  		if (nc->reconfigure_needed) {
>  			/* This channel's configuration has been updated
>  			 * part-way during the config state - start the
> @@ -1092,7 +1092,6 @@ static void ncsi_configure_channel(struct ncsi_dev_priv *ndp)
>  			break;
>  		}
>  
> -		nc->state = NCSI_CHANNEL_ACTIVE;
>  		if (nc->modes[NCSI_MODE_LINK].data[2] & 0x1) {
>  			hot_nc = nc;
>  		} else {
> @@ -1803,6 +1802,18 @@ int ncsi_reset_dev(struct ncsi_dev *nd)
>  			spin_unlock_irqrestore(&ndp->lock, flags);
>  			return 0;
>  		}
> +	} else {
> +		switch (nd->state) {
> +		case ncsi_dev_state_suspend_done:
> +		case ncsi_dev_state_config_done:
> +		case ncsi_dev_state_functional:
> +			/* Ok */
> +			break;
> +		default:
> +			/* Current reset operation happening */
> +			spin_unlock_irqrestore(&ndp->lock, flags);
> +			return 0;
> +		}
>  	}
>  
>  	if (!list_empty(&ndp->channel_queue)) {
> -- 
> 2.19.1



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ