linux-kernel - Re: [PATCH V8 3/5] i2c: tegra: Add DMA Support

Message-ID: <20190201035249.5b1cdfe2@dimatab>
Date:   Fri, 1 Feb 2019 03:52:49 +0300
From:   Dmitry Osipenko <digetx@...il.com>
To:     Thierry Reding <thierry.reding@...il.com>
Cc:     Sowjanya Komatineni <skomatineni@...dia.com>, jonathanh@...dia.com,
        mkarthik@...dia.com, smohammed@...dia.com, talho@...dia.com,
        linux-tegra@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-i2c@...r.kernel.org
Subject: Re: [PATCH V8 3/5] i2c: tegra: Add DMA Support

On Thu, 31 Jan 2019 13:44:23 +0100
Thierry Reding <thierry.reding@...il.com> wrote:

> On Wed, Jan 30, 2019 at 10:16:25PM -0800, Sowjanya Komatineni wrote:
> > This patch adds DMA support for Tegra I2C.
> > 
> > Tegra I2C TX and RX FIFO depth is 8 words. PIO mode is used for
> > transfer size of the max FIFO depth and DMA mode is used for
> > transfer size higher than max FIFO depth to save CPU overhead.
> > 
> > PIO mode needs full intervention of CPU to fill or empty FIFO's
> > and also need to service multiple data requests interrupt for the
> > same transaction. This adds delay between data bytes of the same
> > transfer when CPU is fully loaded and some slave devices has
> > internal timeout for no bus activity and stops transaction to
> > avoid bus hang. DMA mode is helpful in such cases.
> > 
> > DMA mode is also helpful for Large transfers during downloading or
> > uploading FW over I2C to some external devices.
> > 
> > Signed-off-by: Sowjanya Komatineni <skomatineni@...dia.com>
> > ---
> >  [V8] : Moved back dma init to i2c probe, removed
> > ALL_PACKETS_XFER_COMPLETE interrupt and using PACKETS_XFER_COMPLETE
> > interrupt only and some other fixes
> > 	Updated Kconfig for APB_DMA dependency
> >  [V7] : Same as V6
> >  [V6] : Updated for proper buffer allocation/freeing, channel
> > release. Updated to use exact xfer size for syncing dma buffer.
> >  [V5] : Same as V4
> >  [V4] : Updated to allocate DMA buffer only when DMA mode.
> > 	Updated to fall back to PIO mode when DMA channel request or
> > 	buffer allocation fails.
> >  [V3] : Updated without additional buffer allocation.
> >  [V2] : Updated based on V1 review feedback along with code cleanup
> > for proper implementation of DMA.
> > 
> >  drivers/i2c/busses/Kconfig     |   2 +-
> >  drivers/i2c/busses/i2c-tegra.c | 362
> > ++++++++++++++++++++++++++++++++++++++--- 2 files changed, 339
> > insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
> > index f2c681971201..046aeb92a467 100644
> > --- a/drivers/i2c/busses/Kconfig
> > +++ b/drivers/i2c/busses/Kconfig
> > @@ -1016,7 +1016,7 @@ config I2C_SYNQUACER
> >  
> >  config I2C_TEGRA
> >  	tristate "NVIDIA Tegra internal I2C controller"
> > -	depends on ARCH_TEGRA
> > +	depends on (ARCH_TEGRA && TEGRA20_APB_DMA)  
> 
> Like I said in my reply in the v7 subthread, I don't think we want
> this. The dependency that we have is on the DMA engine API, not the
> APB DMA driver.
> 
> Technically there could be a runtime problem if the APB DMA driver is
> disabled and we list a "dmas" property. If I understand correctly, the
> DMA engine API would always return -EPROBE_DEFER in that case. That's
> somewhat annoying, but I think that's fine because it points at an
> integration issue. It lets you know that the driver is relying on a
> resource that is not showing up, which usually means that either the
> provider's driver is not enabled or the provider is failing to probe.
> 
> >  	help
> >  	  If you say yes to this option, support will be included
> > for the I2C controller embedded in NVIDIA Tegra SOCs
> > diff --git a/drivers/i2c/busses/i2c-tegra.c
> > b/drivers/i2c/busses/i2c-tegra.c index c4892a47a483..025d63972e50
> > 100644 --- a/drivers/i2c/busses/i2c-tegra.c
> > +++ b/drivers/i2c/busses/i2c-tegra.c
> > @@ -8,6 +8,9 @@
> >  
> >  #include <linux/clk.h>
> >  #include <linux/delay.h>
> > +#include <linux/dmaengine.h>
> > +#include <linux/dmapool.h>
> > +#include <linux/dma-mapping.h>
> >  #include <linux/err.h>
> >  #include <linux/i2c.h>
> >  #include <linux/init.h>
> > @@ -44,6 +47,8 @@
> >  #define I2C_FIFO_CONTROL_RX_FLUSH		BIT(0)
> >  #define I2C_FIFO_CONTROL_TX_TRIG_SHIFT		5
> >  #define I2C_FIFO_CONTROL_RX_TRIG_SHIFT		2
> > +#define I2C_FIFO_CONTROL_TX_TRIG(x)		(((x) - 1) << 5)
> > +#define I2C_FIFO_CONTROL_RX_TRIG(x)		(((x) - 1) << 2)
> >  #define I2C_FIFO_STATUS				0x060
> >  #define I2C_FIFO_STATUS_TX_MASK			0xF0
> >  #define I2C_FIFO_STATUS_TX_SHIFT		4
> > @@ -125,6 +130,19 @@
> >  #define I2C_MST_FIFO_STATUS_TX_MASK		0xff0000
> >  #define I2C_MST_FIFO_STATUS_TX_SHIFT		16
> >  
> > +/* Packet header size in bytes */
> > +#define I2C_PACKET_HEADER_SIZE			12
> > +
> > +#define DATA_DMA_DIR_TX				(1 << 0)
> > +#define DATA_DMA_DIR_RX				(1 << 1)
> > +
> > +/*
> > + * Upto I2C_PIO_MODE_MAX_LEN bytes, controller will use PIO mode,
> > + * above this, controller will use DMA to fill FIFO.
> > + * MAX PIO len is 20 bytes excluding packet header.
> > + */
> > +#define I2C_PIO_MODE_MAX_LEN			32
> > +
> >  /*
> >   * msg_end_type: The bus control which need to be send at end of
> > transfer.
> >   * @MSG_END_STOP: Send stop pulse at end of transfer.
> > @@ -188,6 +206,7 @@ struct tegra_i2c_hw_feature {
> >   * @fast_clk: clock reference for fast clock of I2C controller
> >   * @rst: reset control for the I2C controller
> >   * @base: ioremapped registers cookie
> > + * @base_phys: Physical base address of the I2C controller
> >   * @cont_id: I2C controller ID, used for packet header
> >   * @irq: IRQ number of transfer complete interrupt
> >   * @irq_disabled: used to track whether or not the interrupt is
> > enabled @@ -201,6 +220,14 @@ struct tegra_i2c_hw_feature {
> >   * @clk_divisor_non_hs_mode: clock divider for non-high-speed modes
> >   * @is_multimaster_mode: track if I2C controller is in
> > multi-master mode
> >   * @xfer_lock: lock to serialize transfer submission and processing
> > + * @has_dma: indicates if DMA can be utilized based on dma DT
> > bindings  
> 
> I don't think we need this. We can just rely on the DMA engine API to
> tell us if the "dmas" property isn't there.
> 
> > + * @tx_dma_chan: DMA transmit channel
> > + * @rx_dma_chan: DMA receive channel
> > + * @dma_phys: handle to DMA resources
> > + * @dma_buf: pointer to allocated DMA buffer
> > + * @dma_buf_size: DMA buffer size
> > + * @is_curr_dma_xfer: indicates active DMA transfer
> > + * @dma_complete: DMA completion notifier
> >   */
> >  struct tegra_i2c_dev {
> >  	struct device *dev;
> > @@ -210,6 +237,7 @@ struct tegra_i2c_dev {
> >  	struct clk *fast_clk;
> >  	struct reset_control *rst;
> >  	void __iomem *base;
> > +	phys_addr_t base_phys;
> >  	int cont_id;
> >  	int irq;
> >  	bool irq_disabled;
> > @@ -223,6 +251,14 @@ struct tegra_i2c_dev {
> >  	u16 clk_divisor_non_hs_mode;
> >  	bool is_multimaster_mode;
> >  	spinlock_t xfer_lock;
> > +	bool has_dma;
> > +	struct dma_chan *tx_dma_chan;
> > +	struct dma_chan *rx_dma_chan;
> > +	dma_addr_t dma_phys;
> > +	u32 *dma_buf;
> > +	unsigned int dma_buf_size;
> > +	bool is_curr_dma_xfer;
> > +	struct completion dma_complete;
> >  };
> >  
> >  static void dvc_writel(struct tegra_i2c_dev *i2c_dev, u32 val,
> > @@ -291,6 +327,85 @@ static void tegra_i2c_unmask_irq(struct
> > tegra_i2c_dev *i2c_dev, u32 mask) i2c_writel(i2c_dev, int_mask,
> > I2C_INT_MASK); }
> >  
> > +static void tegra_i2c_dma_complete(void *args)
> > +{
> > +	struct tegra_i2c_dev *i2c_dev = args;
> > +
> > +	complete(&i2c_dev->dma_complete);
> > +}
> > +
> > +static int tegra_i2c_dma_submit(struct tegra_i2c_dev *i2c_dev,
> > size_t len) +{
> > +	struct dma_async_tx_descriptor *dma_desc;
> > +	enum dma_transfer_direction dir;
> > +	struct dma_chan *chan;
> > +
> > +	dev_dbg(i2c_dev->dev, "starting DMA for length: %zu\n",
> > len);
> > +	reinit_completion(&i2c_dev->dma_complete);
> > +	dir = i2c_dev->msg_read ? DMA_DEV_TO_MEM : DMA_MEM_TO_DEV;
> > +	chan = i2c_dev->msg_read ? i2c_dev->rx_dma_chan :
> > i2c_dev->tx_dma_chan;
> > +	dma_desc = dmaengine_prep_slave_single(chan,
> > i2c_dev->dma_phys,
> > +					       len, dir,
> > DMA_PREP_INTERRUPT |
> > +					       DMA_CTRL_ACK);
> > +	if (!dma_desc) {
> > +		dev_err(i2c_dev->dev, "failed to get DMA
> > descriptor\n");
> > +		return -EIO;
> > +	}
> > +
> > +	dma_desc->callback = tegra_i2c_dma_complete;
> > +	dma_desc->callback_param = i2c_dev;
> > +	dmaengine_submit(dma_desc);
> > +	dma_async_issue_pending(chan);
> > +	return 0;
> > +}
> > +
> > +static int tegra_i2c_init_dma_param(struct tegra_i2c_dev *i2c_dev)
> > +{
> > +	struct dma_chan *dma_chan;
> > +	u32 *dma_buf;
> > +	dma_addr_t dma_phys;
> > +
> > +	if (!i2c_dev->has_dma)
> > +		return -EINVAL;
> > +
> > +	if (!i2c_dev->rx_dma_chan) {
> > +		dma_chan =
> > dma_request_slave_channel_reason(i2c_dev->dev, "rx");
> > +		if (IS_ERR(dma_chan))
> > +			return PTR_ERR(dma_chan);  
> 
> I think we want to fallback to PIO here if dma_chan is -ENODEV.
> 
> > +
> > +		i2c_dev->rx_dma_chan = dma_chan;
> > +	}
> > +
> > +	if (!i2c_dev->tx_dma_chan) {
> > +		dma_chan =
> > dma_request_slave_channel_reason(i2c_dev->dev, "tx");
> > +		if (IS_ERR(dma_chan))
> > +			return PTR_ERR(dma_chan);  
> 
> Same here. We could use rx_dma_chan == NULL as a condition to detect
> that instead of the extra has_dma.
> 
> > +		i2c_dev->tx_dma_chan = dma_chan;
> > +	}  
> 
> Although, I'm not exactly sure I understand what you're trying to
> achieve here. Shouldn't we move the channel request parts into probe
> and remove them from here? Otherwise it seems like we could get into
> a state where we keep trying to get the slave channels everytime we
> set up a DMA transfer, even if we already failed to do so during
> probe.
> 
> > +
> > +	if (!i2c_dev->dma_buf && i2c_dev->msg_buf_remaining) {
> > +		dma_buf = dma_alloc_coherent(i2c_dev->dev,
> > +					     i2c_dev->dma_buf_size,
> > +					     &dma_phys,
> > GFP_KERNEL); +
> > +		if (!dma_buf) {
> > +			dev_err(i2c_dev->dev,
> > +				"failed to allocate the DMA
> > buffer\n");
> > +			dma_release_channel(i2c_dev->tx_dma_chan);
> > +			dma_release_channel(i2c_dev->rx_dma_chan);
> > +			i2c_dev->tx_dma_chan = NULL;
> > +			i2c_dev->rx_dma_chan = NULL;
> > +			return -ENOMEM;
> > +		}
> > +
> > +		i2c_dev->dma_buf = dma_buf;
> > +		i2c_dev->dma_phys = dma_phys;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int tegra_i2c_flush_fifos(struct tegra_i2c_dev *i2c_dev)
> >  {
> >  	unsigned long timeout = jiffies + HZ;
> > @@ -656,25 +771,38 @@ static irqreturn_t tegra_i2c_isr(int irq,
> > void *dev_id) if (i2c_dev->hw->supports_bus_clear && (status &
> > I2C_INT_BUS_CLR_DONE)) goto err;
> >  
> > -	if (i2c_dev->msg_read && (status &
> > I2C_INT_RX_FIFO_DATA_REQ)) {
> > -		if (i2c_dev->msg_buf_remaining)
> > -			tegra_i2c_empty_rx_fifo(i2c_dev);
> > -		else
> > -			BUG();
> > -	}
> > +	if (!i2c_dev->is_curr_dma_xfer) {
> > +		if (i2c_dev->msg_read && (status &
> > I2C_INT_RX_FIFO_DATA_REQ)) {
> > +			if (i2c_dev->msg_buf_remaining)
> > +				tegra_i2c_empty_rx_fifo(i2c_dev);
> > +			else
> > +				BUG();
> > +		}
> >  
> > -	if (!i2c_dev->msg_read && (status &
> > I2C_INT_TX_FIFO_DATA_REQ)) {
> > -		if (i2c_dev->msg_buf_remaining)
> > -			tegra_i2c_fill_tx_fifo(i2c_dev);
> > -		else
> > -			tegra_i2c_mask_irq(i2c_dev,
> > I2C_INT_TX_FIFO_DATA_REQ);
> > +		if (!i2c_dev->msg_read &&
> > +		   (status & I2C_INT_TX_FIFO_DATA_REQ)) {
> > +			if (i2c_dev->msg_buf_remaining)
> > +				tegra_i2c_fill_tx_fifo(i2c_dev);
> > +			else
> > +				tegra_i2c_mask_irq(i2c_dev,
> > +
> > I2C_INT_TX_FIFO_DATA_REQ);
> > +		}
> >  	}
> >  
> >  	i2c_writel(i2c_dev, status, I2C_INT_STATUS);
> >  	if (i2c_dev->is_dvc)
> >  		dvc_writel(i2c_dev, DVC_STATUS_I2C_DONE_INTR,
> > DVC_STATUS); 
> > +	/*
> > +	 * During message read XFER_COMPLETE interrupt is
> > triggered prior to
> > +	 * DMA completion and during message write XFER_COMPLETE
> > interrupt is
> > +	 * triggered after DMA completion.
> > +	 * PACKETS_XFER_COMPLETE indicates completion of all bytes
> > of transfer.
> > +	 * so forcing msg_buf_remaining to 0 in DMA mode.
> > +	 */
> >  	if (status & I2C_INT_PACKET_XFER_COMPLETE) {
> > +		if (i2c_dev->is_curr_dma_xfer)
> > +			i2c_dev->msg_buf_remaining = 0;
> >  		BUG_ON(i2c_dev->msg_buf_remaining);
> >  		complete(&i2c_dev->msg_complete);
> >  	}
> > @@ -690,12 +818,69 @@ static irqreturn_t tegra_i2c_isr(int irq,
> > void *dev_id) if (i2c_dev->is_dvc)
> >  		dvc_writel(i2c_dev, DVC_STATUS_I2C_DONE_INTR,
> > DVC_STATUS); 
> > +	if (i2c_dev->is_curr_dma_xfer) {
> > +		if (i2c_dev->msg_read)
> > +
> > dmaengine_terminate_all(i2c_dev->rx_dma_chan);
> > +		else
> > +
> > dmaengine_terminate_all(i2c_dev->tx_dma_chan); +
> > +		complete(&i2c_dev->dma_complete);
> > +	}
> > +
> >  	complete(&i2c_dev->msg_complete);
> >  done:
> >  	spin_unlock(&i2c_dev->xfer_lock);
> >  	return IRQ_HANDLED;
> >  }
> >  
> > +static void tegra_i2c_config_fifo_trig(struct tegra_i2c_dev
> > *i2c_dev,
> > +				       size_t len, int direction)
> > +{
> > +	u32 val, reg;
> > +	u8 dma_burst = 0;
> > +	struct dma_slave_config dma_sconfig;
> > +	struct dma_chan *chan;
> > +
> > +	if (i2c_dev->hw->has_mst_fifo)
> > +		reg = I2C_MST_FIFO_CONTROL;
> > +	else
> > +		reg = I2C_FIFO_CONTROL;
> > +	val = i2c_readl(i2c_dev, reg);
> > +
> > +	if (len & 0xF)
> > +		dma_burst = 1;
> > +	else if (len & 0x10)
> > +		dma_burst = 4;
> > +	else
> > +		dma_burst = 8;
> > +
> > +	if (direction == DATA_DMA_DIR_TX) {
> > +		if (i2c_dev->hw->has_mst_fifo)
> > +			val |=
> > I2C_MST_FIFO_CONTROL_TX_TRIG(dma_burst);
> > +		else
> > +			val |= I2C_FIFO_CONTROL_TX_TRIG(dma_burst);
> > +	} else {
> > +		if (i2c_dev->hw->has_mst_fifo)
> > +			val |=
> > I2C_MST_FIFO_CONTROL_RX_TRIG(dma_burst);
> > +		else
> > +			val |= I2C_FIFO_CONTROL_RX_TRIG(dma_burst);
> > +	}
> > +	i2c_writel(i2c_dev, val, reg);
> > +
> > +	if (direction == DATA_DMA_DIR_TX) {
> > +		dma_sconfig.dst_addr = i2c_dev->base_phys +
> > I2C_TX_FIFO;
> > +		dma_sconfig.dst_addr_width =
> > DMA_SLAVE_BUSWIDTH_4_BYTES;
> > +		dma_sconfig.dst_maxburst = dma_burst;
> > +	} else {
> > +		dma_sconfig.src_addr = i2c_dev->base_phys +
> > I2C_RX_FIFO;
> > +		dma_sconfig.src_addr_width =
> > DMA_SLAVE_BUSWIDTH_4_BYTES;
> > +		dma_sconfig.src_maxburst = dma_burst;
> > +	}
> > +
> > +	chan = i2c_dev->msg_read ? i2c_dev->rx_dma_chan :
> > i2c_dev->tx_dma_chan;
> > +	dmaengine_slave_config(chan, &dma_sconfig);
> > +}
> > +
> >  static int tegra_i2c_issue_bus_clear(struct tegra_i2c_dev *i2c_dev)
> >  {
> >  	int err;
> > @@ -740,6 +925,11 @@ static int tegra_i2c_xfer_msg(struct
> > tegra_i2c_dev *i2c_dev, u32 int_mask;
> >  	unsigned long time_left;
> >  	unsigned long flags;
> > +	size_t xfer_size;
> > +	u32 *buffer = 0;  
> 
> Usually this should be = NULL for pointers.
> 
> > +	int err = 0;
> > +	bool dma = false;
> > +	struct dma_chan *chan;
> >  
> >  	tegra_i2c_flush_fifos(i2c_dev);
> >  
> > @@ -749,19 +939,69 @@ static int tegra_i2c_xfer_msg(struct
> > tegra_i2c_dev *i2c_dev, i2c_dev->msg_read = (msg->flags & I2C_M_RD);
> >  	reinit_completion(&i2c_dev->msg_complete);
> >  
> > +	if (i2c_dev->msg_read)
> > +		xfer_size = msg->len;
> > +	else
> > +		xfer_size = msg->len + I2C_PACKET_HEADER_SIZE;
> > +
> > +	xfer_size = ALIGN(xfer_size, BYTES_PER_FIFO_WORD);
> > +	dma = (xfer_size > I2C_PIO_MODE_MAX_LEN);
> > +	if (dma) {
> > +		err = tegra_i2c_init_dma_param(i2c_dev);
> > +		if (err < 0) {
> > +			dev_dbg(i2c_dev->dev, "switching to PIO
> > transfer\n");
> > +			dma = false;
> > +		}  
> 
> If we successfully got DMA channels at probe time, doesn't this turn
> into an error condition that is worth reporting? It seems to me like
> the only reason it could fail is if we fail the allocation, but then
> again, why don't we move the DMA buffer allocation into probe()? We
> already use a fixed size for that allocation, so there's no reason it
> couldn't be allocated at probe time.
> 
> Seems like maybe you just overlooked that as you were moving around
> the code pieces.
> 
> > +	}
> > +
> > +	i2c_dev->is_curr_dma_xfer = dma;
> >  	spin_lock_irqsave(&i2c_dev->xfer_lock, flags);
> >  
> >  	int_mask = I2C_INT_NO_ACK | I2C_INT_ARBITRATION_LOST;
> >  	tegra_i2c_unmask_irq(i2c_dev, int_mask);
> >  
> > +	if (dma) {
> > +		if (i2c_dev->msg_read) {
> > +			chan = i2c_dev->rx_dma_chan;
> > +			tegra_i2c_config_fifo_trig(i2c_dev,
> > xfer_size,
> > +
> > DATA_DMA_DIR_RX);
> > +			dma_sync_single_for_device(i2c_dev->dev,
> > +
> > i2c_dev->dma_phys,
> > +						   xfer_size,
> > +
> > DMA_FROM_DEVICE);  
> 
> Do we really need this? We're not actually passing the device any
> data, so no caches to flush here. If we're cautious about flushing
> caches when we do write to the buffer (and I think we do that
> properly already), then there should be no need to do it here again.
> 

IIUC, the DMA API has a concept of buffer hand-off: call
dma_sync_single_for_device() before issuing a hardware job that touches
the buffer, and dma_sync_single_for_cpu() after the hardware has finished
executing it. As a result, the CPU caches get flushed or invalidated as
appropriate.

dma_sync_single_for_device(DMA_FROM_DEVICE) invalidates the buffer in the
CPU cache, probably to avoid the CPU evicting data from the cache to
DRAM while the hardware writes to the buffer. Hence this hunk is correct.
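
The hand-off order described above can be modeled in plain user-space C.
This is only a toy sketch, not kernel code: the toy_* names are
hypothetical and invented for illustration, and the helpers merely track
who owns the buffer without doing any cache maintenance.

```c
#include <assert.h>

/* Toy model: who is currently allowed to touch the streaming DMA buffer. */
enum toy_owner { TOY_OWNER_CPU, TOY_OWNER_DEVICE };

struct toy_buf {
	enum toy_owner owner;
	unsigned int violations;	/* accesses made by the non-owner */
};

/* Stands in for dma_sync_single_for_device(): CPU -> device hand-off. */
static void toy_sync_for_device(struct toy_buf *b)
{
	b->owner = TOY_OWNER_DEVICE;
}

/* Stands in for dma_sync_single_for_cpu(): device -> CPU hand-off. */
static void toy_sync_for_cpu(struct toy_buf *b)
{
	b->owner = TOY_OWNER_CPU;
}

/* A hardware job (DMA) touching the buffer. */
static void toy_hw_job(struct toy_buf *b)
{
	if (b->owner != TOY_OWNER_DEVICE)
		b->violations++;
}

/* The CPU reading or writing the buffer. */
static void toy_cpu_access(struct toy_buf *b)
{
	if (b->owner != TOY_OWNER_CPU)
		b->violations++;
}

/* RX in the order described above; returns the number of violations. */
static unsigned int toy_rx_sequence(void)
{
	struct toy_buf b = { TOY_OWNER_CPU, 0 };

	toy_sync_for_device(&b);	/* before issuing the hardware job */
	toy_hw_job(&b);			/* DMA writes the received data */
	toy_sync_for_cpu(&b);		/* after the hardware is done */
	toy_cpu_access(&b);		/* CPU copies the data out */
	return b.violations;
}

/* The same RX job with the initial hand-off skipped. */
static unsigned int toy_rx_sequence_no_handoff(void)
{
	struct toy_buf b = { TOY_OWNER_CPU, 0 };

	toy_hw_job(&b);			/* device touches a CPU-owned buffer */
	toy_sync_for_cpu(&b);
	toy_cpu_access(&b);
	return b.violations;
}
```

In this model the correctly ordered RX sequence records no violations,
while skipping the initial hand-off records one.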

> > +			err = tegra_i2c_dma_submit(i2c_dev,
> > xfer_size);
> > +			if (err < 0) {
> > +				dev_err(i2c_dev->dev,
> > +					"starting RX DMA failed,
> > err %d\n",
> > +					err);
> > +				goto unlock;
> > +			}
> > +		} else {
> > +			chan = i2c_dev->tx_dma_chan;
> > +			tegra_i2c_config_fifo_trig(i2c_dev,
> > xfer_size,
> > +
> > DATA_DMA_DIR_TX);
> > +			dma_sync_single_for_cpu(i2c_dev->dev,
> > +						i2c_dev->dma_phys,
> > +						xfer_size,
> > +						DMA_TO_DEVICE);  
> 
> This, on the other hand seems correct because we need to invalidate
> the caches for this buffer to make sure the data that we put there
> doesn't get overwritten.

As I stated before in a comment to v6, this particular use of
dma_sync_single_for_cpu() is incorrect because the CPU should take
ownership of the buffer only after the hardware job completes. In practice,
though, dma_sync_single_for_cpu(DMA_TO_DEVICE) is a no-op because the CPU
doesn't need to flush or invalidate anything to take ownership of the
buffer if the hardware only read from it.
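
To summarize which cache operation each call implies, here is a small
user-space sketch. It assumes a non-cache-coherent system such as 32-bit
ARM; the real behaviour is architecture-specific, and the toy_* enums and
the function are hypothetical names that only mirror the kernel's
direction constants for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins for DMA_TO_DEVICE / DMA_FROM_DEVICE. */
enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE };

/* Which sync call is being made. */
enum toy_sync { TOY_SYNC_FOR_DEVICE, TOY_SYNC_FOR_CPU };

/* Cache maintenance implied by the call (assumed, non-coherent case). */
enum toy_cache_op { TOY_CACHE_NONE, TOY_CACHE_CLEAN, TOY_CACHE_INVALIDATE };

static enum toy_cache_op toy_cache_op_for(enum toy_sync sync, enum toy_dir dir)
{
	if (sync == TOY_SYNC_FOR_DEVICE) {
		/* Device will read: write dirty lines back to DRAM first. */
		if (dir == TOY_TO_DEVICE)
			return TOY_CACHE_CLEAN;
		/* Device will write: drop lines so none are evicted over its data. */
		return TOY_CACHE_INVALIDATE;
	}

	/* Sync for CPU: only needed if the device wrote to the buffer. */
	if (dir == TOY_FROM_DEVICE)
		return TOY_CACHE_INVALIDATE;

	/* Device only read the buffer: nothing to do. */
	return TOY_CACHE_NONE;
}
```

The last case is the one in question here: a sync for the CPU with
DMA_TO_DEVICE maps to no cache operation at all.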

> 
> > +			buffer = i2c_dev->dma_buf;
> > +		}
> > +	}
> > +
> >  	packet_header = (0 << PACKET_HEADER0_HEADER_SIZE_SHIFT) |
> >  			PACKET_HEADER0_PROTOCOL_I2C |
> >  			(i2c_dev->cont_id <<
> > PACKET_HEADER0_CONT_ID_SHIFT) | (1 <<
> > PACKET_HEADER0_PACKET_ID_SHIFT);
> > -	i2c_writel(i2c_dev, packet_header, I2C_TX_FIFO);
> > +	if (dma && !i2c_dev->msg_read)
> > +		*buffer++ = packet_header;
> > +	else
> > +		i2c_writel(i2c_dev, packet_header, I2C_TX_FIFO);
> >  
> >  	packet_header = msg->len - 1;
> > -	i2c_writel(i2c_dev, packet_header, I2C_TX_FIFO);
> > +	if (dma && !i2c_dev->msg_read)
> > +		*buffer++ = packet_header;
> > +	else
> > +		i2c_writel(i2c_dev, packet_header, I2C_TX_FIFO);
> >  
> >  	packet_header = I2C_HEADER_IE_ENABLE;
> >  	if (end_state == MSG_END_CONTINUE)
> > @@ -778,30 +1018,79 @@ static int tegra_i2c_xfer_msg(struct
> > tegra_i2c_dev *i2c_dev, packet_header |= I2C_HEADER_CONT_ON_NAK;
> >  	if (msg->flags & I2C_M_RD)
> >  		packet_header |= I2C_HEADER_READ;
> > -	i2c_writel(i2c_dev, packet_header, I2C_TX_FIFO);
> > -
> > -	if (!(msg->flags & I2C_M_RD))
> > -		tegra_i2c_fill_tx_fifo(i2c_dev);
> > -
> > +	if (dma && !i2c_dev->msg_read)
> > +		*buffer++ = packet_header;
> > +	else
> > +		i2c_writel(i2c_dev, packet_header, I2C_TX_FIFO);
> > +
> > +	if (!i2c_dev->msg_read) {
> > +		if (dma) {
> > +			memcpy(buffer, msg->buf, msg->len);
> > +			dma_sync_single_for_device(i2c_dev->dev,
> > +
> > i2c_dev->dma_phys,
> > +						   xfer_size,
> > +
> > DMA_TO_DEVICE);  
> 
> Again, here we properly flush the caches to make sure the data that
> we've written to the DMA buffer is visible to the DMA engine.
> 

+1 this is correct

> > +			err = tegra_i2c_dma_submit(i2c_dev,
> > xfer_size);
> > +			if (err < 0) {
> > +				dev_err(i2c_dev->dev,
> > +					"starting TX DMA failed,
> > err %d\n",
> > +					err);
> > +				goto unlock;
> > +			}
> > +		} else {
> > +			tegra_i2c_fill_tx_fifo(i2c_dev);
> > +		}
> > +	}
> >  	if (i2c_dev->hw->has_per_pkt_xfer_complete_irq)
> >  		int_mask |= I2C_INT_PACKET_XFER_COMPLETE;
> > -	if (msg->flags & I2C_M_RD)
> > -		int_mask |= I2C_INT_RX_FIFO_DATA_REQ;
> > -	else if (i2c_dev->msg_buf_remaining)
> > -		int_mask |= I2C_INT_TX_FIFO_DATA_REQ;
> > +	if (!dma) {
> > +		if (msg->flags & I2C_M_RD)
> > +			int_mask |= I2C_INT_RX_FIFO_DATA_REQ;
> > +		else if (i2c_dev->msg_buf_remaining)
> > +			int_mask |= I2C_INT_TX_FIFO_DATA_REQ;
> > +	}
> >  
> >  	tegra_i2c_unmask_irq(i2c_dev, int_mask);
> > -	spin_unlock_irqrestore(&i2c_dev->xfer_lock, flags);
> >  	dev_dbg(i2c_dev->dev, "unmasked irq: %02x\n",
> >  		i2c_readl(i2c_dev, I2C_INT_MASK));
> >  
> > +unlock:
> > +	spin_unlock_irqrestore(&i2c_dev->xfer_lock, flags);
> > +
> > +	if (dma) {
> > +		if (err)
> > +			return err;
> > +
> > +		time_left = wait_for_completion_timeout(
> > +
> > &i2c_dev->dma_complete,
> > +						TEGRA_I2C_TIMEOUT);
> > +
> > +		if (time_left == 0) {
> > +			dev_err(i2c_dev->dev, "DMA transfer
> > timeout\n");
> > +			dmaengine_terminate_all(chan);
> > +			tegra_i2c_init(i2c_dev);
> > +			return -ETIMEDOUT;
> > +		}
> > +
> > +		if (i2c_dev->msg_read) {
> > +			if (likely(i2c_dev->msg_err ==
> > I2C_ERR_NONE)) {
> > +
> > dma_sync_single_for_cpu(i2c_dev->dev,
> > +
> > i2c_dev->dma_phys,
> > +							xfer_size,
> > +
> > DMA_FROM_DEVICE);  
> 
> Here we invalidate the caches to make sure we don't get stale data
> that may be in the caches for data that we're copying out of the DMA
> buffer. I think that's about all the cache maintenance that we
> really need.

Correct.

And technically there should be a dma_sync_single_for_cpu(DMA_TO_DEVICE)
here for TX as well. But again, it's a no-op.