Message-ID: <3a305e74-2235-47ab-8564-0c594f24dc0a@os.amperecomputing.com>
Date:   Mon, 25 Sep 2023 12:39:37 -0700
From:   Jan Bottorff <janb@...amperecomputing.com>
To:     Serge Semin <fancer.lancer@...il.com>
Cc:     Yann Sionneau <ysionneau@...rayinc.com>,
        Wolfram Sang <wsa@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Yann Sionneau <yann@...nneau.net>,
        Will Deacon <will@...nel.org>,
        Jarkko Nikula <jarkko.nikula@...ux.intel.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        Jan Dabros <jsd@...ihalf.com>,
        Andi Shyti <andi.shyti@...nel.org>,
        Philipp Zabel <p.zabel@...gutronix.de>,
        linux-i2c@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] i2c: designware: Fix corrupted memory seen in the ISR

On 9/25/2023 5:54 AM, Serge Semin wrote:
> On Wed, Sep 20, 2023 at 12:14:17PM -0700, Jan Bottorff wrote:
>> On 9/20/2023 6:27 AM, Yann Sionneau wrote:
>>> Hi,
>>>
>>> On 20/09/2023 11:08, Wolfram Sang wrote:
>>>>> same thread." [1] Thus I'd suggest the next fix for the problem:
>>>>>
>>>>> --- a/drivers/i2c/busses/i2c-designware-common.c
>>>>> +++ b/drivers/i2c/busses/i2c-designware-common.c
>>>>> @@ -72,7 +72,10 @@ static int dw_reg_write(void *context,
>>>>> unsigned int reg, unsigned int val)
>>>>>    {
>>>>>        struct dw_i2c_dev *dev = context;
>>>>> -    writel_relaxed(val, dev->base + reg);
>>>>> +    if (reg == DW_IC_INTR_MASK)
>>>>> +        writel(val, dev->base + reg);
>>>>> +    else
>>>>> +        writel_relaxed(val, dev->base + reg);
>>>>>        return 0;
>>>>>    }
>>>>>
>>>>> (and similar changes for dw_reg_write_swab() and dw_reg_write_word().)
>>>>>
>>>>> What do you think?
>>>> To me, this looks reasonable and much more what I would have expected as
>>>> a result (from a high level point of view). Let's hope it works. I am
>>>> optimistic, though...
>>>>
>>> It works if we make sure all the other register accesses to the
>>> designware i2c IP can't generate IRQ.
>>>
>>> Meaning that all register accesses that can trigger an IRQ are enclosed
>>> in between a call to i2c_dw_disable_int() and a call to
>>> regmap_write(dev->map, DW_IC_INTR_MASK, DW_IC_INTR_MASTER_MASK); or
>>> equivalent.
>>>
>>> It seems to be the case, I'm not sure what's the best way to make sure
>>> it will stay that way.
>>>
>>> Moreover, maybe writes to IC_ENABLE register should also use the
>>> non-relaxed writel() version?
>>>
>>> Since one could do something like:
>>>
>>> [ IP is currently disabled ]
>>>
>>> 1/ enable interrupts in DW_IC_INTR_MASK
>>>
>>> 2/ update some variable in dev-> structure in DDR
>>>
>>> 3/ enable the device by writing to IC_ENABLE, thus triggering for
>>> instance the TX_FIFO_EMPTY irq.
>>>
>>
>> It does seem like there are a variety of register write combinations that
>> could immediately cause an interrupt, so would need a barrier.
> 
> My suggestion was based on your fix. If it won't work or if it won't
> completely solve the problem, then perhaps one of the next option
> shall do it:
> 1. Add the non-relaxed IO call for the IC_ENABLE CSR too.
> 2. Completely convert the IO accessors to using the non-relaxed
> methods especially seeing Wolfram already noted: "Again, I am all with
> Catalin here. Safety first, optimizations a la *_relaxed should be
> opt-in."
> https://lore.kernel.org/linux-i2c/ZQm2Ydt%2F0jRW4crK@shikoro/
> 3. Find all the places where the memory writes need to be fully
> visible after a subsequent IO-write causing an IRQ raise and just
> place dma_wmb() there (though just wmb() would look a bit more
> relevant).
> 
> IMO in the worst case solution 2. must be enough at least in the
> master mode seeing the ISR uses the completion variable to indicate
> the cmd execution completion, which also implies the complete memory
> barrier. Moreover i2c bus isn't that performant for us to be that much
> concerned about the optimizations like the pipeline stalls in between
> the MMIO accesses.
> 

I stress tested the proposed fix on our processor for a few days. The
fix makes writes to DW_IC_INTR_MASK in dw_reg_write use writel instead
of writel_relaxed, and with it the problem we were seeing is gone. On
our system, the problem occurred when many ssif (ipmi over i2c)
transfers were done. The stress test ran "ipmitool sdr elist" in a
loop. Without the change, the driver reports multiple errors per day in
the kernel log.

I'm fine with a patch that contains just that one change. Applying the
non-relaxed write to dw_reg_write_swab and dw_reg_write_word was also
suggested for completeness.
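
To make the shape of that change concrete, here is a minimal userspace
sketch in C. It is not the driver code. The names mock_regs,
mock_writel, mock_writel_relaxed and sketch_reg_write are hypothetical,
the register offsets are used only for illustration, and writel is
loosely modeled as a release fence followed by the store, which only
approximates what the kernel accessors do on a given architecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Register offsets, used here only for illustration. */
#define DW_IC_INTR_MASK 0x30u
#define DW_IC_ENABLE    0x6cu

#define MOCK_REG_WORDS  64u

/* Hypothetical stand-in for the device's MMIO window. */
static volatile uint32_t mock_regs[MOCK_REG_WORDS];

/* Counters so a caller can see which path a write took. */
static unsigned int ordered_writes;
static unsigned int relaxed_writes;

/* Model of writel_relaxed(): a plain store with no barrier. */
static void mock_writel_relaxed(uint32_t val, unsigned int reg)
{
	assert(reg / 4 < MOCK_REG_WORDS);
	mock_regs[reg / 4] = val;
	relaxed_writes++;
}

/* Model of writel(): order earlier memory writes before the store. */
static void mock_writel(uint32_t val, unsigned int reg)
{
	assert(reg / 4 < MOCK_REG_WORDS);
	atomic_thread_fence(memory_order_release);
	mock_regs[reg / 4] = val;
	ordered_writes++;
}

/* Same dispatch as the proposed dw_reg_write() change. */
static int sketch_reg_write(unsigned int reg, unsigned int val)
{
	if (reg == DW_IC_INTR_MASK)
		mock_writel(val, reg);
	else
		mock_writel_relaxed(val, reg);
	return 0;
}
```

In this model, sketch_reg_write(DW_IC_INTR_MASK, val) takes the ordered
path, while a write to any other offset, DW_IC_ENABLE included, still
takes the relaxed one.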

Does anybody have concerns about other cases that may not get fixed by
this change? We did discuss hypothetical cases, like IC_ENABLE, that
could have the same issue.
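
As a rough illustration of option 1 above, the check could be widened
from one register to a small set. This is only a hypothetical sketch:
the helper name sketch_reg_needs_ordering is made up, the offsets are
used only for illustration, and which registers really belong in the
set is exactly the open question.

```c
#include <stdbool.h>

/* Register offsets, used here only for illustration. */
#define DW_IC_INTR_MASK 0x30u
#define DW_IC_ENABLE    0x6cu

/*
 * Hypothetical helper: returns true for registers whose write may
 * raise an IRQ right away, so earlier memory writes must be visible
 * first. The set below is an assumption, not a verified list.
 */
static bool sketch_reg_needs_ordering(unsigned int reg)
{
	switch (reg) {
	case DW_IC_INTR_MASK:
	case DW_IC_ENABLE:
		return true;
	default:
		return false;
	}
}
```

A write accessor could then pick writel when the helper returns true
and writel_relaxed otherwise.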

So my next question: is the change to dw_reg_write something that I 
should write and submit, or should someone else submit something more 
general, like option 2 above? I don't own the i2c driver, I'm just 
trying to fix one issue on one processor with minimal risk of breaking 
something. I don't have the broader view of what's optimal for the whole 
DesignWare i2c driver. I also don't have any way to test changes on 
other models of processors.



