Message-ID: <20150125024300.GA6028@Darwish.PC>
Date:	Sun, 25 Jan 2015 04:43:00 +0200
From:	"Ahmed S. Darwish" <darwish.07@...il.com>
To:	Andri Yngvason <andri.yngvason@...el.com>
Cc:	Wolfgang Grandegger <wg@...ndegger.com>,
	Olivier Sobrie <olivier@...rie.be>,
	Oliver Hartkopp <socketcan@...tkopp.net>,
	Marc Kleine-Budde <mkl@...gutronix.de>,
	Linux-CAN <linux-can@...r.kernel.org>,
	netdev <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 2/5] can: kvaser_usb: Consolidate and unify state
 change handling

On Fri, Jan 23, 2015 at 10:32:13AM +0000, Andri Yngvason wrote:
> Quoting Ahmed S. Darwish (2015-01-23 06:07:34)
> > On Wed, Jan 21, 2015 at 05:13:45PM +0100, Wolfgang Grandegger wrote:
> > > On Wed, 21 Jan 2015 10:36:47 -0500, "Ahmed S. Darwish"
> > > <darwish.07@...il.com> wrote:
> > > > On Wed, Jan 21, 2015 at 03:00:15PM +0000, Andri Yngvason wrote:
> > > >> Quoting Ahmed S. Darwish (2015-01-21 14:43:23)
> > > >> > Hi!
> > > > 
> > > > ...
> > > > 
> > > >> > <-- Unplug the cable -->
> > > >> > 
> > > >> >  (000.009106)  can0  20000080   [8]  00 00 00 00 00 00 08 00  
> > > >> >  ERRORFRAME
> > > >> >         bus-error
> > > >> >         error-counter-tx-rx{{8}{0}}
> > > >> >  (000.001872)  can0  20000080   [8]  00 00 00 00 00 00 10 00  
> > > 
> > > For bus errors I would also expect some more information in the
> > > data[2..3] fields. But these are always zero.
> > > 
> > 
> > M16C error factors made it possible to report things like
> > CAN_ERR_PROT_FORM/STUFF/BIT0/BIT1/TX in data[2], and
> > CAN_ERR_PROT_LOC_ACK/CRC_DEL in data[3].
> > 
> > Unfortunately such error factors are only reported by the Leaf, and
> > not by the USBCan-II, due to the wire format change in the error event:
> > 
> >         struct leaf_msg_error_event {
> >                 u8 tid;
> >                 u8 flags;
> >                 __le16 time[3];
> >                 u8 channel;
> >                 u8 padding;
> >                 u8 tx_errors_count;
> >                 u8 rx_errors_count;
> >                 u8 status;
> >                 u8 error_factor;
> >         } __packed;
> > 
> >         struct usbcan_msg_error_event {
> >                 u8 tid;
> >                 u8 padding;
> >                 u8 tx_errors_count_ch0;
> >                 u8 rx_errors_count_ch0;
> >                 u8 tx_errors_count_ch1;
> >                 u8 rx_errors_count_ch1;
> >                 u8 status_ch0;
> >                 u8 status_ch1;
> >                 __le16 time;
> >         } __packed;
> > 
> > I speculate that the wire format was changed due to controller
> > bugs in the USBCan-II, which are briefly mentioned in their
> > data sheets here:
> > 
> >         http://www.kvaser.com/canlib-webhelp/page_hardware_specific_can_controllers.html
> > 
> > So it seems there's really no way of filling in such bus error
> > info given the very limited amount of data exported :-(
> >
> We experienced similar problems with FlexCAN.

Hmm, I'll have a look there then...

My initial instinct, though, is that the FlexCAN driver has access
to the raw CAN controller registers, something I don't have here.
But maybe there's some black magic I'm missing :-)
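For reference, a rough userspace sketch of what the Leaf-only path could look like: turning an error_factor byte into the data[2]/data[3] fields of a SocketCAN bus-error frame, with the counters in data[6]/data[7] as in the candump output quoted above. The CAN_ERR_* values are copied from <linux/can/error.h>; the EF_* bit positions, struct err_frame and fill_bus_error() are hypothetical stand-ins, not the driver's definitions. Calling it with error_factor = 0, which is all the USBCan-II event can offer, gives exactly the bare 20000080 frames seen in the dump.

```c
#include <stdint.h>

/* Values as defined in <linux/can/error.h>, repeated so this builds standalone */
#define CAN_ERR_FLAG             0x20000000U
#define CAN_ERR_PROT             0x00000008U
#define CAN_ERR_BUSERROR         0x00000080U
#define CAN_ERR_PROT_FORM        0x02
#define CAN_ERR_PROT_STUFF       0x04
#define CAN_ERR_PROT_BIT0        0x08
#define CAN_ERR_PROT_BIT1        0x10
#define CAN_ERR_PROT_TX          0x80
#define CAN_ERR_PROT_LOC_CRC_DEL 0x18
#define CAN_ERR_PROT_LOC_ACK     0x19

/* Hypothetical bit layout of the Leaf error_factor byte (assumption) */
#define EF_ACKE  (1u << 0) /* acknowledge error */
#define EF_CRCE  (1u << 1) /* CRC error */
#define EF_FORME (1u << 2) /* form error */
#define EF_STFE  (1u << 3) /* stuff error */
#define EF_BITE0 (1u << 4) /* could not send a dominant bit */
#define EF_BITE1 (1u << 5) /* could not send a recessive bit */
#define EF_TRE   (1u << 7) /* error happened while transmitting */

/* Simplified stand-in for struct can_frame: only the fields used here */
struct err_frame {
	uint32_t can_id;
	uint8_t data[8];
};

static void fill_bus_error(struct err_frame *cf, uint8_t error_factor,
			   uint8_t txerr, uint8_t rxerr)
{
	/* Start from a zeroed bus-error frame */
	*cf = (struct err_frame){ .can_id = CAN_ERR_FLAG | CAN_ERR_BUSERROR };

	/* Protocol details only exist when the firmware reports them */
	if (error_factor) {
		cf->can_id |= CAN_ERR_PROT;
		/* data[3] holds a single location; CRC overrides ACK if both are set */
		if (error_factor & EF_ACKE)
			cf->data[3] = CAN_ERR_PROT_LOC_ACK;
		if (error_factor & EF_CRCE)
			cf->data[3] = CAN_ERR_PROT_LOC_CRC_DEL;
		if (error_factor & EF_FORME)
			cf->data[2] |= CAN_ERR_PROT_FORM;
		if (error_factor & EF_STFE)
			cf->data[2] |= CAN_ERR_PROT_STUFF;
		if (error_factor & EF_BITE0)
			cf->data[2] |= CAN_ERR_PROT_BIT0;
		if (error_factor & EF_BITE1)
			cf->data[2] |= CAN_ERR_PROT_BIT1;
		if (error_factor & EF_TRE)
			cf->data[2] |= CAN_ERR_PROT_TX;
	}

	/* Error counters always go to data[6] (TX) and data[7] (RX) */
	cf->data[6] = txerr;
	cf->data[7] = rxerr;
}
```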

[...]

> > 
> > I've dumped _every_ message I receive from the firmware while
> > disconnecting the CAN bus, waiting a while, and connecting it again.
> > I really received _nothing_ from the firmware when the CAN bus was
> > reconnected and the data packets were flowing again. Not even a
> > single CHIP_STATE_EVENT, even after waiting for a long time.
> > 
> > So it's basically:
> > ...
> > ERR EVENT, txerr=128, rxerr=0
> > ERR EVENT, txerr=128, rxerr=0
> > ERR EVENT, txerr=128, rxerr=0
> > ...
> > 
> > then complete silence, except the data frames. I've even tried with
> > different versions of the firmware, but the same behaviour persisted.
> > 
> > > > So, what can the driver do given the above?
> > > 
> > > Little if the notification does not come.
> > > 
> > 
> > We can poll the state by sending CMD_GET_CHIP_STATE to the firmware,
> > and it will hopefully reply with a CHIP_STATE_EVENT response
> > containing the new txerr and rxerr values that we can use for
> > reverse state transitions.
> >
> > But do we _really_ want to go down that path? I feel that it will
> > open a can of worms w.r.t. concurrent access to both the netdev
> > and USB stacks from a single driver.
> >
> Honestly, I don't know.
> >
> > A possible solution can be setting up a kernel thread that queries
> > for a CHIP_STATE_EVENT every second?
> > 
> Have you considered polling in kvaser_usb_tx_acknowledge? You could do something
> like:
> if(unlikely(dev->can.state != CAN_STATE_ERROR_ACTIVE))
> {
>     request_state();
> }
> 
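Andri's suggestion, modeled in userspace: instead of a timer or a thread, the state request piggybacks on TX acknowledges and only fires while the channel is not error-active. struct chan_ctx, request_state() and on_tx_acknowledge() are hypothetical names; in the driver the check would sit in kvaser_usb_tx_acknowledge() and the request would go out as CMD_GET_CHIP_STATE.

```c
/* Userspace stand-in for the kernel's branch-prediction hint (GCC/Clang builtin) */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Same order as enum can_state in <linux/can/netlink.h> */
enum can_state {
	CAN_STATE_ERROR_ACTIVE = 0,
	CAN_STATE_ERROR_WARNING,
	CAN_STATE_ERROR_PASSIVE,
	CAN_STATE_BUS_OFF,
	CAN_STATE_STOPPED,
	CAN_STATE_SLEEPING,
};

/* Hypothetical per-channel context, reduced to what this sketch needs */
struct chan_ctx {
	enum can_state state;
	unsigned int state_requests; /* chip state requests sent so far */
};

/* Hypothetical stand-in for sending CMD_GET_CHIP_STATE to the firmware */
static void request_state(struct chan_ctx *ctx)
{
	ctx->state_requests++;
}

/* Hook run for every TX acknowledge received from the firmware */
static void on_tx_acknowledge(struct chan_ctx *ctx)
{
	/* Only ask for fresh counters while the channel is degraded */
	if (unlikely(ctx->state != CAN_STATE_ERROR_ACTIVE))
		request_state(ctx);
}
```

A side benefit, assuming the firmware only acknowledges frames that actually made it onto the bus: TX acks resume exactly when the cable is re-plugged, so recovery itself triggers the poll and there is no extra thread to start and stop.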

OK, I have four important updates on this issue:

a) My initial testing was done on a high-speed channel at a bitrate
   of 50K. After setting the bus to a more reasonable bitrate of 500K
   or 1M, I was _consistently_ able to receive CHIP_STATE_EVENTs
   when plugging the CAN connector back in after an unplug.

b) The error counters on this device do not get reset when the cable
   is plugged back in after an unplug. I've set up a kernel thread [2]
   that requests the chip state every second, and the error counters
   stay the same all the time [1].

c) There's a single case where the error counters do indeed get
   reset, and it happens only when introducing some noise on the
   bus after the re-plug. In that case, new error events get raised
   with the error counters starting from 0/1 again.

d) I've discovered a bug that prevents the CAN state from
   returning to ERROR_ACTIVE when the error counters decrease.
   But independent of that bug, the verbose debugging messages
   clearly show that the error counters only decrease in the
   case mentioned in `c)' above.
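The exact bug in d) isn't spelled out here, but one way to keep the reverse transition working is to recompute the state from the latest txerr/rxerr on every event instead of only ever escalating it. A minimal sketch of that classification; state_from_counters() is a hypothetical name, 128 (error-passive) and 256 (bus-off, TX only) are the CAN protocol limits, and the warning limit of 96 is what many controllers use, assumed here for this device.

```c
/* Same order as enum can_state in <linux/can/netlink.h> */
enum can_state {
	CAN_STATE_ERROR_ACTIVE = 0,
	CAN_STATE_ERROR_WARNING,
	CAN_STATE_ERROR_PASSIVE,
	CAN_STATE_BUS_OFF,
	CAN_STATE_STOPPED,
	CAN_STATE_SLEEPING,
};

/*
 * Derive the state from the current counters alone, so a drop in the
 * counters maps straight back to a lower state (no "escalate only" logic).
 */
static enum can_state state_from_counters(unsigned int txerr,
					  unsigned int rxerr)
{
	unsigned int worst = txerr > rxerr ? txerr : rxerr;

	if (txerr >= 256)
		return CAN_STATE_BUS_OFF;
	if (worst >= 128)
		return CAN_STATE_ERROR_PASSIVE;
	if (worst >= 96)	/* assumed warning limit */
		return CAN_STATE_ERROR_WARNING;
	return CAN_STATE_ERROR_ACTIVE;
}
```

With the counters from [1], txerr=200 classifies as error-passive, and an event with the counters back at 0/1 (case c) maps back to error-active. Since the wire format carries the counters as u8, bus-off can't be seen through them and would have to come from the status byte. And given b), this only helps once the firmware actually reports lower counters.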

So from [1] and [2], it's now clear that the device does not reset
these counters in the re-plug case. I'll check FlexCAN as advised,
but unfortunately I don't really think there's much I can do about
this.

[1]

[  877.207082] CAN_ERROR_: channel=0, txerr=88, rxerr=0
[  877.207090] CAN_ERROR_: channel=0, txerr=136, rxerr=0
[  877.207094] CAN_ERROR_: channel=0, txerr=144, rxerr=0
[  877.207098] CAN_ERROR_: channel=0, txerr=152, rxerr=0
[  877.207100] CAN_ERROR_: channel=0, txerr=160, rxerr=0
[  877.207102] CAN_ERROR_: channel=0, txerr=168, rxerr=0
[  877.208075] CAN_ERROR_: channel=0, txerr=200, rxerr=0 

(( The above error event, stuck at txerr=200, keeps flooding
   the bus until the CAN cable is re-plugged ))

[  878.225116] CHIP_STATE: channel=0, txerr=200, rxerr=0
[  878.225143] CHIP_STATE: channel=1, txerr=0, rxerr=0
[  879.265167] CHIP_STATE: channel=0, txerr=200, rxerr=0
[  879.267152] CHIP_STATE: channel=1, txerr=0, rxerr=0

(( The same counters get repeated every second ))
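The CHIP_STATE lines above report each channel separately. On USBCan-II, the error event layout quoted earlier packs both channels' counters into a single message, so whatever consumes it has to split them per channel, and with no error_factor byte the counters and status are all there is to work with. A userspace sketch of that split; struct chan_errors and usbcan_channel_errors() are hypothetical names, and the struct only mirrors the quoted layout.

```c
#include <stdint.h>

/*
 * Userspace mirror of struct usbcan_msg_error_event quoted above. With the
 * eight byte fields first, time lands on a 2-byte boundary, so common ABIs
 * add no padding even without __packed.
 */
struct usbcan_msg_error_event {
	uint8_t tid;
	uint8_t padding;
	uint8_t tx_errors_count_ch0;
	uint8_t rx_errors_count_ch0;
	uint8_t tx_errors_count_ch1;
	uint8_t rx_errors_count_ch1;
	uint8_t status_ch0;
	uint8_t status_ch1;
	uint16_t time; /* __le16 on the wire; byte order is not handled here */
};

/* Hypothetical per-channel view of the event */
struct chan_errors {
	uint8_t txerr;
	uint8_t rxerr;
	uint8_t status;
};

/* Pick one channel's counters out of the shared event; -1 if unknown */
static int usbcan_channel_errors(const struct usbcan_msg_error_event *ev,
				 unsigned int channel, struct chan_errors *out)
{
	switch (channel) {
	case 0:
		out->txerr = ev->tx_errors_count_ch0;
		out->rxerr = ev->rx_errors_count_ch0;
		out->status = ev->status_ch0;
		return 0;
	case 1:
		out->txerr = ev->tx_errors_count_ch1;
		out->rxerr = ev->rx_errors_count_ch1;
		out->status = ev->status_ch1;
		return 0;
	default:
		return -1;
	}
}
```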

[2] State was polled using:

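#include <linux/delay.h>	/* ssleep() */
#include <linux/kthread.h>	/* kthread_should_stop() */

/* Ask the firmware for a CHIP_STATE_EVENT once per second until stopped */
static int kvaser_usb_poll_chip_state(void *vpriv)
{
	struct kvaser_usb_net_priv *priv = vpriv;

	while (!kthread_should_stop()) {
		kvaser_usb_simple_msg_async(priv, CMD_GET_CHIP_STATE);
		ssleep(1);
	}

	return 0;
}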

> I don't think that anything beyond that would be worth pursuing.
> 

I agree, but given the new input, it seems that our problem
extends to the error counters themselves not decreasing on
re-plug. So even polling will not solve the issue: we'll get
the same txerr/rxerr values again and again :-(

> Best regards,
> Andri

Regards,
Darwish
