linux-kernel - RE: Re: [PATCH] cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer

Open Source and information security mailing list archives

Message-ID: <149101d4bfb9$fdc5a330$f950e990$@gmail.com>
Date:   Fri, 8 Feb 2019 23:23:59 +0900
From:   "Tokunori Ikegami" <ikegami.t@...il.com>
To:     "'Sobon, Przemyslaw'" <psobon@...zon.com>,
        "'Boris Brezillon'" <boris.brezillon@...labora.com>
Cc:     <keescook@...omium.org>, <marek.vasut@...il.com>, <richard@....at>,
        <linux-kernel@...r.kernel.org>, <joakim.tjernlund@...inera.com>,
        <linux-mtd@...ts.infradead.org>, <computersforpeace@...il.com>,
        <dwmw2@...radead.org>, "'Liu Jian'" <liujian56@...wei.com>,
        <ikegami_to@...oo.co.jp>
Subject: RE: Re: [PATCH] cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer

Hi Przemek-san,

Thank you so much for your explanation.

> I have seen a case myself where a value was written and the chip changed
> state to "ready", but when I read the value back it was incorrect.

I am also aware of similar issues with both buffer write and word write.
The write error behavior could be reproduced in both cases.
  Note: The word write issue can still be reproduced now.

Those were resolved by using chip_good() instead of chip_ready() to check the state.

> This can happen as a result of an intermittent issue with the flash. It is
> hard to hit this scenario when testing on a limited number of devices,
> but with a large enough population you can see it.

If possible, I would also like to know the details of the issue and its cause.

> Another situation
> is when a flash chip reaches its maximum number of writes. For
> example, a chip may be designed for 100k writes to a page. Once you
> reach that number of writes, invalid data can be written to
> flash while the chip itself reports that everything was good and
> switches to the "ready" state.

Yes, I see.

Regards,
Ikegami

> -----Original Message-----
> From: linux-mtd [mailto:linux-mtd-bounces@...ts.infradead.org] On Behalf Of Sobon, Przemyslaw
> Sent: Friday, February 8, 2019 8:51 AM
> To: ikegami_to@...oo.co.jp; Boris Brezillon
> Cc: keescook@...omium.org; marek.vasut@...il.com;
> ikegami@...ied-telesis.co.jp; richard@....at;
> linux-kernel@...r.kernel.org; joakim.tjernlund@...inera.com;
> linux-mtd@...ts.infradead.org; computersforpeace@...il.com;
> dwmw2@...radead.org; Liu Jian
> Subject: RE: Re: [PATCH] cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer
> 
> Hi Ikegami,
> 
> I have seen a case myself where a value was written and the chip changed
> state to "ready", but when I read the value back it was incorrect.
> This can happen as a result of an intermittent issue with the flash. It is
> hard to hit this scenario when testing on a limited number of devices,
> but with a large enough population you can see it. Another situation
> is when a flash chip reaches its maximum number of writes. For
> example, a chip may be designed for 100k writes to a page. Once you
> reach that number of writes, invalid data can be written to
> flash while the chip itself reports that everything was good and
> switches to the "ready" state.
> 
> I hope this explanation is clear. Please let me know if you have any questions.
> 
> Regards,
> Przemek
> 
> > -----Original Message-----
> > From: ikegami_to@...oo.co.jp <ikegami_to@...oo.co.jp>
> > Sent: Thursday, February 7, 2019 3:00 PM
> >
> > Hi Przemek-san,
> >
> > Could you please explain in detail the case where the value is written
> > incorrectly?
> > I think the value should always be written correctly unless there is a bug.
> >
> > Regards,
> > Ikegami
> >
> > --- boris.brezillon@...labora.com wrote --- :
> > > Hi Sobon,
> > >
> > > On Tue, 5 Feb 2019 22:28:44 +0000
> > > "Sobon, Przemyslaw" <psobon@...zon.com> wrote:
> > >
> > > > > From: Boris Brezillon <bbrezillon@...nel.org>
> > > > > Sent: Sunday, February 3, 2019 12:35 AM
> > > > > > +Przemyslaw
> > > > > >
> > > > > > On Fri, 1 Feb 2019 07:30:39 +0800 Liu Jian
> > > > > > <liujian56@...wei.com> wrote:
> > > > > >
> > > > > > > In do_write_buffer(), within the for loop, there is a
> > > > > > > case where chip_ready() returns 1 while chip_good() returns 0,
> > > > > > > so the loop never breaks.
> > > > > > > To fix this, checking chip_good() alone is enough, and the loop
> > > > > > > should time out if the chip stays bad for a while.
> > > > > >
> > > > > > It looks like Przemyslaw reported and fixed the same problem [1].
> > > > > >
> > > > > > >
> > > > > > > Fixes: dfeae1073583(mtd: cfi_cmdset_0002: Change write buffer
> > > > > > > to check correct value)
> > > > > >
> > > > > > Can you put the Fixes tag on a single line? The format is
> > > > > >
> > > > > > Fixes: <hash> ("message")
> > > > > >
> > > > > > > Signed-off-by: Yi Huaijie <yihuaijie@...wei.com>
> > > > > > > Signed-off-by: Liu Jian <liujian56@...wei.com>
> > > > > >
> > > > > > [1] http://patchwork.ozlabs.org/patch/1025566/
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/mtd/chips/cfi_cmdset_0002.c | 6 +++---
> > > > > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > b/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > index 72428b6..818e94b 100644
> > > > > > > --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> > > > > > > @@ -1876,14 +1876,14 @@ static int __xipram do_write_buffer(struct map_info *map, struct flchip *chip,
> > > > > > >              continue;
> > > > > > >          }
> > > > > > >
> > > > > > > -        if (time_after(jiffies, timeo) && !chip_ready(map, adr))
> > > > > > > -            break;
> > > > > > > -
> > > > > > >          if (chip_good(map, adr, datum)) {
> > > > > > >              xip_enable(map, chip, adr);
> > > > > > >              goto op_done;
> > > > > > >          }
> > > > > > >
> > > > > > > +        if (time_after(jiffies, timeo))
> > > > > > > +            break;
> > > > > > > +
> > > > > > >          /* Latency issues. Drop the lock, wait a while and retry */
> > > > > > >          UDELAY(map, chip, adr, 1);
> > > > > > >      }
> > > > > >
> > > > >
> > > > > BTW, the patch itself looks good to me. Ikegami, can you confirm it does the right thing?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Boris
> > > > >
> > > >
> > > > One comment on this patch. If a value is written incorrectly but the
> > > > write finishes quickly, we will stay in the loop until the timeout even
> > > > though nothing is going to change. For example, if a value was written
> > > > incorrectly after 1 us and the loop timeout is set to 1 ms, the function
> > > > will only return after 1 ms, so this solution is not optimized for
> > > > performance. I considered the same approach when working on this change
> > > > and decided to do it a different way.
> > >
> > > It seems you're right if we assume that checking for the GOOD state does
> > > not require a delay after the READY check. But if that's not the case
> > > and an extra delay is actually required, you might end up reporting a BAD
> > > status where the 'check only for GOOD state until we time out' approach
> > > would have seen it turn GOOD at some point.
> > >
> > > TBH, I don't know how CFI flashes work, so I'll let you guys sort this
> > > out.
> > >
> > > Regards,
> > >
> > > Boris
> > >
> > > ______________________________________________________
> > > Linux MTD discussion mailing list
> > > http://lists.infradead.org/mailman/listinfo/linux-mtd/
> > >
> >
> >
