linux-kernel - Re: [PATCH] i2c: designware: use enable on resume instead initialization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 23 Jun 2015 18:45:10 +0200
From:	christian.ruppert@...tech.com
To:	christian.ruppert@...tech.com
Cc:	Christian Ruppert <christian.ruppert@...lis.com>,
	Fabio Mello <fabio.mello@...el.com>,
	Jarkko Nikula <jarkko.nikula@...ux.intel.com>,
	linux-i2c@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>,
	Lucas De Marchi <lucas.demarchi@...el.com>,
	Lucas De Marchi <lucas.de.marchi@...il.com>,
	Mika Westerberg <mika.westerberg@...ux.intel.com>,
	Wolfram Sang <wsa@...-dreams.de>
Subject: Re: [PATCH] i2c: designware: use enable on resume instead initialization

Hello,

Christian Ruppert/ALi_GVA/ALi wrote on 10.06.2015 17:05:16:
> Mika Westerberg <mika.westerberg@...ux.intel.com> wrote on 10.06.
> 2015 09:07:22:
> > On Tue, Jun 09, 2015 at 03:29:01PM -0300, Lucas De Marchi wrote:
> > > Hi Mika,
> > > 
> > > On Tue, Jun 9, 2015 at 5:51 AM, Mika Westerberg
> > > <mika.westerberg@...ux.intel.com> wrote:
> > > > On Mon, Jun 08, 2015 at 02:50:28PM -0300, 
> lucas.de.marchi@...il.com wrote:
> > > >> From: Fabio Mello <fabio.mello@...el.com>
> > > >>
> > > >> According to documentation and tests, initialization is not
> > > >> necessary on module resume, since the controller keeps its state
> > > >> between disable/enable. Change the target address is also 
allowed.
> > > >>
> > > >> So, this patch replaces the initialization on module resume with 
a
> > > >> simple enable, and removes the (non required anymore) enables and
> > > >> disables.
> > > >>
> > > >> Signed-off-by: Fabio Mello <fabio.mello@...el.com>
> > > >> Signed-off-by: Lucas De Marchi <lucas.demarchi@...el.com>
> > > >> ---
> > > >>
> > > >> These pictures explain a little more the consequence of letting 
the
> > > >> enable+disable in the code:
> > > >>
> > > >>       http://pub.politreco.com/paste/TEK0011-before.jpg
> > > >>       http://pub.politreco.com/paste/TEK0007-after.jpg
> > > >>
> > > >> The yellow line is a GPIO toggle in userspace to mark when we 
> > start and finish
> > > >> the i2c transactions.  The blue line is the SCL in that i2c 
> > bus. Take a look on
> > > >> the huge pauses we have between any 2 transactions.  These 
> > pauses are removed
> > > >> with this patch and we are able to read our sensor's values in 
> > 950usec rather
> > > >> than 5.24msec we had before.  We are testing this using a 
> > Minnowboard Max that
> > > >> has a designware i2c controller.
> > > >
> > > > Did you test this on any other platform than Intel Baytrail?
> > > 
> > > No. The only soc we have here with this controller is the Baytrail.
> > 
> > My concern is that this patch might break some non-Intel platform. It
> > would be nice if someone (Christian?) could try this out.
> 
> Ouch, this one brings back painful memories. Take a look at patch 
> 38d7fadef4973bb94e36897fcb6bb6a12fdd10c9 (https://git.kernel.org/
> cgit/linux/kernel/git/torvalds/linux.git/commit/?
> id=38d7fadef4973bb94e36897fcb6bb6a12fdd10c9) for the context.
> 
> In brief:
> - Before patch 38d7fade, the driver disabled the hardware after 
> successful transfers. I do not know the reason for this and I cannot
> judge whether this is necessary or not. I thus decided not to modify
> this behaviour in patch 38d7fade.
> 
> - After patch 38d7fade, the driver disabled the dw controller after 
> all transfers, in particular in the case of unsuccessful transfers. 
> This modification was necessary because of a race condition 
> triggered by an i2c slave device which interrupted transfers in the 
> middle. In this case, the dw controller (at least our version) seems
> to send spurious interrupts later if it is not disabled. The 
> interrupt handler is not designed to be called on already aborted 
> transfers, however, which leads to undesirable behaviour if the 
> interrupt occurs at the wrong moment (system hangs in irq loop).
> 
> I will try to dig out the test setup we used to validate the patch 
> at the time but given the fact that this was two years ago this 
> might take a little while. In the meantime, do you have any means to
> stress test the case of unexpected events on the bus (client aborts 
> transfer, timeout etc.)? An alternative might be to only disable the
> controller in the case of errors and leave it active after 
> successful transfers. We should understand why the controller was 
> disabled after successful transfers in the first place, however. 
> Maybe some quirk with older versions of the hardware? Mika, do you 
> have any memories about this?

As promised I tried to dig out the test setup we used to validate patch 
38d7fade at the time but without success. (I half expected that after such 
a long time...)

So I said to myself, let's give the patch a try nevertheless and see if it 
works in our system at least in the nominal case (no i2c bus errors).

The result is not very encouraging: Out of five (identical) designware i2c 
controllers we have on my test SOC, the first one initialises properly but 
the second one gets stuck in the famous irq loop right away when the 
module is enabled in i2c_dw_init. The system never gets around to try 
initialising the remaining three controllers due to the irq loop. I didn't 
have the time to investigate the details yet but I suspect this is 
triggered by some nastily behaved device on the bus. For example, some of 
our external devices are notorious for keeping i2c  lines tied to zero 
before being properly powered on/reset/initialised by their respective 
drivers (which in turn depend on the i2c master to be initialised first, 
obviously). I'll grab an oscilloscope and dump the waves to confirm this 
suspicion on the occasion.
However, similar situations might occur in multi-master setups (which we 
don't have). I suspect the driver/hardware to also end up in the irq loop 
after loosing arbitration with this patch. Has anybody the means to test 
this in a multi-master system?

Greetings,
  Christian


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/