linux-kernel - Re: [PATCH 4/8] can: Driver for the SJA1000 CAN controller

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <499E794F.4000508@grandegger.com>
Date:	Fri, 20 Feb 2009 10:35:11 +0100
From:	Wolfgang Grandegger <wg@...ndegger.com>
To:	Jonathan Corbet <corbet@....net>
CC:	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Oliver Hartkopp <oliver.hartkopp@...kswagen.de>
Subject: Re: [PATCH 4/8] can: Driver for the SJA1000 CAN controller

Hi Jonathan,

Jonathan Corbet wrote:
> I won't be able to look at all of these...

OK, I will check the others for similar issues.

>> diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
>> index d609895..78a412b 100644
>> --- a/drivers/net/can/Kconfig
>> +++ b/drivers/net/can/Kconfig
>> @@ -35,6 +35,17 @@ config CAN_CALC_BITTIMING
>>  	  files "tq", "prop_seg", "phase_seg1", "phase_seg2" and "sjw".
>>  	  If unsure, say Y.
>>  
>> +config CAN_SJA1000
>> +	depends on CAN_DEV
>> +	tristate "Philips SJA1000"
>> +	---help---
>> +	  The SJA1000 is one of the top CAN controllers out there. As it
>> +	  has a multiplexed interface it fits directly to 8051
>> +	  microcontrollers or into the PC I/O port space. The SJA1000
>> +	  is a full CAN controller, with shadow registers for RX and TX.
>> +	  It can send and receive any kinds of CAN frames (SFF/EFF/RTR)
>> +	  with a single (simple) filter setup.
> 
> This sounds more like advertising text.  But what people need to know is
> whether they should enable it or not.

Yes,

  "Enables support for the SJA1000 CAN controller from Philips or NXP"

should be sufficient.

> [...]
> 
>> diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
>> new file mode 100644
>> index 0000000..6fe516d
>> --- /dev/null
>> +++ b/drivers/net/can/sja1000/sja1000.c
>> @@ -0,0 +1,681 @@
> 
> [...]
> 
>> +static void sja1000_start(struct net_device *dev)
>> +{
>> +	struct sja1000_priv *priv = netdev_priv(dev);
>> +
>> +	/* leave reset mode */
>> +	if (priv->can.state != CAN_STATE_STOPPED)
>> +		set_reset_mode(dev);
>> +
>> +	/* Clear error counters and error code capture */
>> +	priv->write_reg(dev, REG_TXERR, 0x0);
>> +	priv->write_reg(dev, REG_RXERR, 0x0);
>> +	priv->read_reg(dev, REG_ECC);
>> +
>> +	/* leave reset mode */
>> +	set_normal_mode(dev);
>> +}
> 
> It's about here that I begin to wonder about locking again.  What is
> preventing concurrent access to the device?

The device is usually stopped when this function is called but I will
check for corner cases due to the restart feature.

> [...]
> 
>> +static int sja1000_get_state(struct net_device *dev, enum can_state *state)
>> +{
>> +	struct sja1000_priv *priv = netdev_priv(dev);
>> +	u8 status;
>> +
>> +	/* FIXME: inspecting the status register to get the current state
>> +	 * is not really necessary, because state changes are handled by
>> +	 * in the ISR and the variable priv->can.state gets updated. The
>> +	 * CAN devicde interface needs fixing!
>> +	 */
>> +
>> +	spin_lock_irq(&priv->can.irq_lock);
> 
> Interesting, here we do have a lock.  What is it protecting?  *state??  It
> can't be the device registers, since they are accessed without locks in
> many other places.

This lock is indeed required to protect priv->can.irq_lock not be
changed by the ISR. But it should be a lock private for the SJA1000.

> 
>> +	if (priv->can.state == CAN_STATE_STOPPED) {
>> +		*state =  CAN_STATE_STOPPED;
>> +	} else {
>> +		status = priv->read_reg(dev, REG_SR);
>> +		if (status & SR_BS)
>> +			*state = CAN_STATE_BUS_OFF;
>> +		else if (status & SR_ES) {
>> +			if (priv->read_reg(dev, REG_TXERR) > 127 ||
>> +			    priv->read_reg(dev, REG_RXERR) > 127)
>> +				*state = CAN_STATE_BUS_PASSIVE;
>> +			else
>> +				*state = CAN_STATE_BUS_WARNING;
>> +		} else
>> +			*state = CAN_STATE_ACTIVE;
>> +	}
>> +	/* Check state */
>> +	if (*state != priv->can.state)
>> +		dev_err(ND2D(dev),
>> +			"Oops, state mismatch: hard %d != soft %d\n",
>> +			*state, priv->can.state);
>> +	spin_unlock_irq(&priv->can.irq_lock);
>> +	return 0;
>> +}
> 
> [...]
> 
>> +/*
>> + * transmit a CAN message
>> + * message layout in the sk_buff should be like this:
>> + * xx xx xx xx	 ff	 ll   00 11 22 33 44 55 66 77
>> + * [  can-id ] [flags] [len] [can data (up to 8 bytes]
>> + */
>> +static int sja1000_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> +{
>> +	struct sja1000_priv *priv = netdev_priv(dev);
>> +	struct net_device_stats *stats = &dev->stats;
>> +	struct can_frame *cf = (struct can_frame *)skb->data;
>> +	uint8_t fi;
>> +	uint8_t dlc;
>> +	canid_t id;
>> +	uint8_t dreg;
>> +	int i;
>> +
>> +	netif_stop_queue(dev);
>> +
>> +	fi = dlc = cf->can_dlc;
>> +	id = cf->can_id;
>> +
>> +	if (id & CAN_RTR_FLAG)
>> +		fi |= FI_RTR;
>> +
>> +	if (id & CAN_EFF_FLAG) {
>> +		fi |= FI_FF;
>> +		dreg = EFF_BUF;
>> +		priv->write_reg(dev, REG_FI, fi);
>> +		priv->write_reg(dev, REG_ID1, (id & 0x1fe00000) >> (5 + 16));
>> +		priv->write_reg(dev, REG_ID2, (id & 0x001fe000) >> (5 + 8));
>> +		priv->write_reg(dev, REG_ID3, (id & 0x00001fe0) >> 5);
>> +		priv->write_reg(dev, REG_ID4, (id & 0x0000001f) << 3);
>> +	} else {
>> +		dreg = SFF_BUF;
>> +		priv->write_reg(dev, REG_FI, fi);
>> +		priv->write_reg(dev, REG_ID1, (id & 0x000007f8) >> 3);
>> +		priv->write_reg(dev, REG_ID2, (id & 0x00000007) << 5);
>> +	}
>> +
>> +	for (i = 0; i < dlc; i++)
>> +		priv->write_reg(dev, dreg++, cf->data[i]);
>> +
>> +	stats->tx_bytes += dlc;
>> +	dev->trans_start = jiffies;
>> +
>> +	can_put_echo_skb(skb, dev, 0);
> 
> Hmm...looking back at can_put_echo_skb(), I see that it expects dev->priv
> to point to a struct can_priv.  Here, though, we see it pointing to a
> struct sja1000_prive instead.  I begin to suspect dangerous trickery going
> on behind our backs...

I see it coming...

>> +
>> +	priv->write_reg(dev, REG_CMR, CMD_TR);
>> +
>> +	return 0;
>> +}
> 
> [...]
> 
>> +irqreturn_t sja1000_interrupt(int irq, void *dev_id)
>> +{
>> +	struct net_device *dev = (struct net_device *)dev_id;
>> +	struct sja1000_priv *priv = netdev_priv(dev);
>> +	struct net_device_stats *stats = &dev->stats;
>> +	uint8_t isrc, status;
>> +	int n = 0;
>> +
>> +	/* Shared interrupts and IRQ off? */
>> +	if (priv->read_reg(dev, REG_IER) == IRQ_OFF)
>> +		return IRQ_NONE;
>> +
>> +	if (priv->pre_irq)
>> +		priv->pre_irq(dev);
>> +
>> +	while ((isrc = priv->read_reg(dev, REG_IR)) && (n < SJA1000_MAX_IRQ)) {
>> +		n++;
>> +		status = priv->read_reg(dev, REG_SR);
>> +
>> +		if (isrc & IRQ_WUI) {
>> +			/* wake-up interrupt */
>> +			priv->can.can_stats.wakeup++;
>> +		}
>> +		if (isrc & IRQ_TI) {
>> +			/* transmission complete interrupt */
>> +			stats->tx_packets++;
>> +			can_get_echo_skb(dev, 0);
>> +			netif_wake_queue(dev);
>> +		}
>> +		if (isrc & IRQ_RI) {
>> +			/* receive interrupt */
>> +			while (status & SR_RBS) {
>> +				sja1000_rx(dev);
>> +				status = priv->read_reg(dev, REG_SR);
>> +			}
>> +		}
>> +		if (isrc & (IRQ_DOI | IRQ_EI | IRQ_BEI | IRQ_EPI | IRQ_ALI)) {
>> +			/* error interrupt */
>> +			if (sja1000_err(dev, isrc, status))
>> +				break;
>> +		}
>> +	}
>> +
>> +	if (priv->post_irq)
>> +		priv->post_irq(dev);
>> +
>> +	if (n >= SJA1000_MAX_IRQ)
>> +		dev_dbg(ND2D(dev), "%d messages handled in ISR", n);
>> +
>> +	return (n) ? IRQ_HANDLED : IRQ_NONE;
>> +}
>> +EXPORT_SYMBOL_GPL(sja1000_interrupt);
> 
> You used spin_lock_irq(&irq_lock) above, but the interrupt handler doesn't
> take that lock?  So (above) you could acquire the lock while the interrupt
> handler is running?  I hate to keep asking this question, but...what does
> that lock protect?

That's wrong, indeed.

> [...]
> 
>> +static int sja1000_close(struct net_device *dev)
>> +{
>> +	struct sja1000_priv *priv = netdev_priv(dev);
>> +
>> +	set_reset_mode(dev);
>> +	netif_stop_queue(dev);
>> +	priv->open_time = 0;
>> +	can_close_cleanup(dev);
> 
> What happens if your device interrupts right here?  Maybe you should
> disconnect the handler earlier?

It will not interrupt here because set_reset_mode(dev) already disabled
the interrupts.

>> +	if (!(priv->flags & SJA1000_CUSTOM_IRQ_HANDLER))
>> +		free_irq(dev->irq, (void *)dev);
>> +
>> +	return 0;
>> +}
> 
> [...]
> 
>> +int register_sja1000dev(struct net_device *dev)
>> +{
>> +	struct sja1000_priv *priv = netdev_priv(dev);
>> +	int err;
>> +
>> +	if (!sja1000_probe_chip(dev))
>> +		return -ENODEV;
>> +
>> +	dev->flags |= IFF_ECHO;	/* we support local echo */
>> +
>> +	dev->netdev_ops = &sja1000_netdev_ops;
>> +
>> +	priv->can.bittiming_const = &sja1000_bittiming_const;
>> +	priv->can.do_set_bittiming = sja1000_set_bittiming;
>> +	priv->can.do_get_state = sja1000_get_state;
>> +	priv->can.do_set_mode = sja1000_set_mode;
>> +	priv->dev = dev;
>> +
>> +	err = register_candev(dev);
> 
> Here we've registered our device with the CAN and networking core...
> 
>> +	if (err) {
>> +		printk(KERN_INFO
>> +		       "%s: registering netdev failed\n", DRV_NAME);
>> +		free_netdev(dev);
>> +		return err;
>> +	}
>> +
>> +	set_reset_mode(dev);
>> +	chipset_init(dev);
> 
> ...but only here have we gotten it ready to operate.  If the higher levels
> decide to do something with your device in the mean time, will the right
> thing happen?

Right, these two lines must be moved before register_candev().

> 
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(register_sja1000dev);
> 
> [...]
> 
>> diff --git a/drivers/net/can/sja1000/sja1000.h b/drivers/net/can/sja1000/sja1000.h
>> new file mode 100644
>> index 0000000..60d4cd6
>> --- /dev/null
>> +++ b/drivers/net/can/sja1000/sja1000.h
> 
> [...]
> 
>> +/*
>> + * SJA1000 private data structure
>> + */
>> +struct sja1000_priv {
>> +	struct can_priv can;	/* must be the first member! */
> 
> AHA!  I knew it!
> 
> This kind of pointer trickery is fragile and dangerous, please don't do
> it.  Much better would be something like:
> 
> 	dev->priv = &dev_specific_priv->can;
> 
> Then the higher layers know they have a proper struct can_priv pointer.
> Then you can use container_of() at this level to get the outer structure
> pointer.  Much more robust and in line with normal kernel coding idiom.

Our approach allows a more elegant usage and is still used in the kernel
but I agree, it's more error-prone.

I will come up with a revised patch a.s.a.p.

Thanks.

Wolfgang.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/