Message-ID: <55914A4C.9030503@atmel.com>
Date:	Mon, 29 Jun 2015 15:38:20 +0200
From:	Nicolas Ferre <nicolas.ferre@...el.com>
To:	Cyrille Pitchen <cyrille.pitchen@...el.com>,
	<ludovic.desroches@...el.com>, <dan.j.williams@...el.com>,
	<vinod.koul@...el.com>, <dmaengine@...r.kernel.org>,
	<torfl6749@...il.com>, <jiri.prchal@...ignal.cz>
CC:	<linux-arm-kernel@...ts.infradead.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH linux-next v2 1/1] dmaengine: at_hdmac: fix residue computation

On 18/06/2015 13:25, Cyrille Pitchen wrote:
> As claimed by the programmer datasheet and confirmed by the IP designer,
> the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
> Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
> transfers.
> 
> Both the SRC_WIDTH and BTSIZE bitfields can be extracted from the CTRLAx
> register to compute the DMA residue. So the 'tx_width' field is useless
> and can be removed from the struct at_desc.
> 
> Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
> correctly initialized according to the SRC_WIDTH but 'tx_width' was always
> set to reg_width, which was incorrect for MEM_TO_DEV transfers. This led
> to a wrong DMA residue when 'tx_width' != SRC_WIDTH.
> 
> Also the 'tx_width' field was mostly set only in the first and last
> descriptors. Depending on the kind of DMA transfer, this field remained
> uninitialized for intermediate descriptors. The accurate DMA residue was
> computed only when the currently processed descriptor was the first or the
> last of the chain. This algorithm was a little bit odd. An accurate DMA
> residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
> in the CTRLAx register.
> 
> Finally, the test to check whether the currently processed descriptor is
> the last of the chain was wrong: for cyclic transfers, last_desc->lli.dscr
> is NOT equal to zero, since set_desc_eol() is never called, but is
> logically equal to first_desc->txd.phys. This bug has a side effect on the
> drivers/tty/serial/atmel_serial.c driver, which uses a cyclic DMA transfer
> to receive data. Since the DMA residue was wrong each time the DMA
> transfer reached the second (and last) period of the transfer, no more
> data were received by the USART driver until the cyclic DMA transfer
> looped back to the first period.
> 
> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@...el.com>

Yes:
Acked-by: Nicolas Ferre <nicolas.ferre@...el.com>

It turns out that some of the information I gave to Torsten for his
previous patch was wrong. Thank you Cyrille for having dug into this
issue and proposed a fix.
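
For anyone following the arithmetic, here is a minimal standalone sketch
of the residue computation described above. The register value and byte
counts are hypothetical and the 0xffff mask merely stands in for
ATC_BTSIZE_MAX; the point is only that BTSIZE counts SRC_WIDTH-sized
transfers, so it has to be shifted to get a byte count:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical CTRLA snapshot: SRC_WIDTH = 2 (word, i.e. 4-byte
	 * transfers) in bits 25:24, BTSIZE = 100 transfers completed. */
	uint32_t ctrla = (2u << 24) | 100u;
	uint32_t btsize = ctrla & 0xffffu;        /* stands in for ATC_BTSIZE_MAX */
	uint32_t src_width = (ctrla >> 24) & 0x3; /* cf. ATC_REG_TO_SRC_WIDTH() */
	int current_len = 1000; /* bytes left before reading CTRLA */

	/* BTSIZE is a number of source-width transfers, so convert it to
	 * bytes by shifting by SRC_WIDTH before subtracting. */
	printf("residue = %d bytes\n", current_len - (int)(btsize << src_width));
	/* prints: residue = 600 bytes, i.e. 1000 - 100 * 4 */
	return 0;
}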

Bye,

> ---
>  drivers/dma/at_hdmac.c      | 132 +++++++++++++++++++++++++++++---------------
>  drivers/dma/at_hdmac_regs.h |   3 +-
>  2 files changed, 88 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
> index 59892126..d3629b7 100644
> --- a/drivers/dma/at_hdmac.c
> +++ b/drivers/dma/at_hdmac.c
> @@ -48,6 +48,8 @@
>  	BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |\
>  	BIT(DMA_SLAVE_BUSWIDTH_4_BYTES))
>  
> +#define ATC_MAX_DSCR_TRIALS	10
> +
>  /*
>   * Initial number of descriptors to allocate for each channel. This could
>   * be increased during dma usage.
> @@ -285,28 +287,19 @@ static struct at_desc *atc_get_desc_by_cookie(struct at_dma_chan *atchan,
>   *
>   * @current_len: the number of bytes left before reading CTRLA
>   * @ctrla: the value of CTRLA
> - * @desc: the descriptor containing the transfer width
>   */
> -static inline int atc_calc_bytes_left(int current_len, u32 ctrla,
> -					struct at_desc *desc)
> +static inline int atc_calc_bytes_left(int current_len, u32 ctrla)
>  {
> -	return current_len - ((ctrla & ATC_BTSIZE_MAX) << desc->tx_width);
> -}
> +	u32 btsize = (ctrla & ATC_BTSIZE_MAX);
> +	u32 src_width = ATC_REG_TO_SRC_WIDTH(ctrla);
>  
> -/**
> - * atc_calc_bytes_left_from_reg - calculates the number of bytes left according
> - * to the current value of CTRLA.
> - *
> - * @current_len: the number of bytes left before reading CTRLA
> - * @atchan: the channel to read CTRLA for
> - * @desc: the descriptor containing the transfer width
> - */
> -static inline int atc_calc_bytes_left_from_reg(int current_len,
> -			struct at_dma_chan *atchan, struct at_desc *desc)
> -{
> -	u32 ctrla = channel_readl(atchan, CTRLA);
> -
> -	return atc_calc_bytes_left(current_len, ctrla, desc);
> +	/*
> +	 * According to the datasheet, when reading the Control A Register
> +	 * (ctrla), the Buffer Transfer Size (btsize) bitfield refers to the
> +	 * number of transfers completed on the Source Interface.
> +	 * So btsize is always a number of source width transfers.
> +	 */
> +	return current_len - (btsize << src_width);
>  }
>  
>  /**
> @@ -320,7 +313,7 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
>  	struct at_desc *desc_first = atc_first_active(atchan);
>  	struct at_desc *desc;
>  	int ret;
> -	u32 ctrla, dscr;
> +	u32 ctrla, dscr, trials;
>  
>  	/*
>  	 * If the cookie doesn't match to the currently running transfer then
> @@ -346,15 +339,82 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
>  		 * the channel's DSCR register and compare it against the value
>  		 * of the hardware linked list structure of each child
>  		 * descriptor.
> +		 *
> +		 * The CTRLA register provides us with the amount of data
> +		 * already read from the source for the current child
> +		 * descriptor. So we can compute a more accurate residue by also
> +		 * removing the number of bytes corresponding to this amount of
> +		 * data.
> +		 *
> +		 * However, the DSCR and CTRLA registers cannot both be read
> +		 * atomically. Hence a race condition may occur: the first
> +		 * register read may refer to one child descriptor whereas the
> +		 * second read may refer to a later child descriptor in the
> +		 * list because of the DMA transfer progression in between the
> +		 * two reads.
> +		 *
> +		 * One solution could have been to pause the DMA transfer, read
> +		 * the DSCR and CTRLA then resume the DMA transfer. Nonetheless,
> +		 * this approach presents some drawbacks:
> +		 * - If the DMA transfer is paused, RX overruns or TX underruns
> +		 *   are more likely to occur depending on the system latency.
> +		 *   Taking the USART driver as an example, it uses a cyclic DMA
> +		 *   transfer to read data from the Receive Holding Register
> +		 *   (RHR) to avoid RX overruns since the RHR is not protected
> +		 *   by any FIFO on most Atmel SoCs. So pausing the DMA transfer
> +		 *   to compute the residue would break the USART driver design.
> +		 * - The atc_pause() function masks interrupts but we'd rather
> +		 *   avoid doing so for system latency reasons.
> +		 *
> +		 * So we use another solution instead: the DSCR is read a
> +		 * first time, then the CTRLA, and finally the DSCR is read
> +		 * a second time. If the two consecutive DSCR reads return the
> +		 * same value, then we assume they both refer to the very same
> +		 * child descriptor, and so does the CTRLA value read in
> +		 * between. For cyclic transfers, the assumption is that a full
> +		 * loop is "not so fast".
> +		 * If the two DSCR values differ, we read the CTRLA then the
> +		 * DSCR again, until two consecutive DSCR reads are equal or
> +		 * the maximum number of trials is reached.
> +		 * This algorithm is very unlikely not to find a stable value
> +		 * for DSCR.
>  		 */
>  
> -		ctrla = channel_readl(atchan, CTRLA);
> -		rmb(); /* ensure CTRLA is read before DSCR */
>  		dscr = channel_readl(atchan, DSCR);
> +		rmb(); /* ensure DSCR is read before CTRLA */
> +		ctrla = channel_readl(atchan, CTRLA);
> +		for (trials = 0; trials < ATC_MAX_DSCR_TRIALS; ++trials) {
> +			u32 new_dscr;
> +
> +			rmb(); /* ensure DSCR is read after CTRLA */
> +			new_dscr = channel_readl(atchan, DSCR);
> +
> +			/*
> +			 * If the DSCR register value has not changed inside the
> +			 * DMA controller since the previous read, we assume
> +			 * that both the dscr and ctrla values refer to the
> +			 * very same descriptor.
> +			 */
> +			if (likely(new_dscr == dscr))
> +				break;
> +
> +			/*
> +			 * DSCR has changed inside the DMA controller, so the
> +			 * previously read value of CTRLA may refer to an already
> +			 * processed descriptor hence could be outdated.
> +			 * We need to update ctrla to match the current
> +			 * descriptor.
> +			 */
> +			dscr = new_dscr;
> +			rmb(); /* ensure DSCR is read before CTRLA */
> +			ctrla = channel_readl(atchan, CTRLA);
> +		}
> +		if (unlikely(trials >= ATC_MAX_DSCR_TRIALS))
> +			return -ETIMEDOUT;
>  
>  		/* for the first descriptor we can be more accurate */
>  		if (desc_first->lli.dscr == dscr)
> -			return atc_calc_bytes_left(ret, ctrla, desc_first);
> +			return atc_calc_bytes_left(ret, ctrla);
>  
>  		ret -= desc_first->len;
>  		list_for_each_entry(desc, &desc_first->tx_list, desc_node) {
> @@ -365,16 +425,14 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
>  		}
>  
>  		/*
> -		 * For the last descriptor in the chain we can calculate
> +		 * For the current descriptor in the chain we can calculate
>  		 * the remaining bytes using the channel's register.
> -		 * Note that the transfer width of the first and last
> -		 * descriptor may differ.
>  		 */
> -		if (!desc->lli.dscr)
> -			ret = atc_calc_bytes_left_from_reg(ret, atchan, desc);
> +		ret = atc_calc_bytes_left(ret, ctrla);
>  	} else {
>  		/* single transfer */
> -		ret = atc_calc_bytes_left_from_reg(ret, atchan, desc_first);
> +		ctrla = channel_readl(atchan, CTRLA);
> +		ret = atc_calc_bytes_left(ret, ctrla);
>  	}
>  
>  	return ret;
> @@ -726,7 +784,6 @@ atc_prep_dma_interleaved(struct dma_chan *chan,
>  
>  	desc->txd.cookie = -EBUSY;
>  	desc->total_len = desc->len = len;
> -	desc->tx_width = dwidth;
>  
>  	/* set end-of-link to the last link descriptor of list*/
>  	set_desc_eol(desc);
> @@ -804,10 +861,6 @@ atc_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
>  	first->txd.cookie = -EBUSY;
>  	first->total_len = len;
>  
> -	/* set transfer width for the calculation of the residue */
> -	first->tx_width = src_width;
> -	prev->tx_width = src_width;
> -
>  	/* set end-of-link to the last link descriptor of list*/
>  	set_desc_eol(desc);
>  
> @@ -956,10 +1009,6 @@ atc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
>  	first->txd.cookie = -EBUSY;
>  	first->total_len = total_len;
>  
> -	/* set transfer width for the calculation of the residue */
> -	first->tx_width = reg_width;
> -	prev->tx_width = reg_width;
> -
>  	/* first link descriptor of list is responsible of flags */
>  	first->txd.flags = flags; /* client is in control of this ack */
>  
> @@ -1077,12 +1126,6 @@ atc_prep_dma_sg(struct dma_chan *chan,
>  		desc->txd.cookie = 0;
>  		desc->len = len;
>  
> -		/*
> -		 * Although we only need the transfer width for the first and
> -		 * the last descriptor, its easier to set it to all descriptors.
> -		 */
> -		desc->tx_width = src_width;
> -
>  		atc_desc_chain(&first, &prev, desc);
>  
>  		/* update the lengths and addresses for the next loop cycle */
> @@ -1256,7 +1299,6 @@ atc_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
>  	/* First descriptor of the chain embedds additional information */
>  	first->txd.cookie = -EBUSY;
>  	first->total_len = buf_len;
> -	first->tx_width = reg_width;
>  
>  	return &first->txd;
>  
> diff --git a/drivers/dma/at_hdmac_regs.h b/drivers/dma/at_hdmac_regs.h
> index bc8d5eb..7f5a082 100644
> --- a/drivers/dma/at_hdmac_regs.h
> +++ b/drivers/dma/at_hdmac_regs.h
> @@ -112,6 +112,7 @@
>  #define		ATC_SRC_WIDTH_BYTE	(0x0 << 24)
>  #define		ATC_SRC_WIDTH_HALFWORD	(0x1 << 24)
>  #define		ATC_SRC_WIDTH_WORD	(0x2 << 24)
> +#define		ATC_REG_TO_SRC_WIDTH(r)	(((r) >> 24) & 0x3)
>  #define	ATC_DST_WIDTH_MASK	(0x3 << 28)	/* Destination Single Transfer Size */
>  #define		ATC_DST_WIDTH(x)	((x) << 28)
>  #define		ATC_DST_WIDTH_BYTE	(0x0 << 28)
> @@ -182,7 +183,6 @@ struct at_lli {
>   * @txd: support for the async_tx api
>   * @desc_node: node on the channed descriptors list
>   * @len: descriptor byte count
> - * @tx_width: transfer width
>   * @total_len: total transaction byte count
>   */
>  struct at_desc {
> @@ -194,7 +194,6 @@ struct at_desc {
>  	struct dma_async_tx_descriptor	txd;
>  	struct list_head		desc_node;
>  	size_t				len;
> -	u32				tx_width;
>  	size_t				total_len;
>  
>  	/* Interleaved data */
> 


-- 
Nicolas Ferre