linux-kernel - Re: [PATCH 1/2] mtd: spi-nor: winbond: Add support for w25q01jv

Message-ID: <mafs08qrc5g44.fsf@kernel.org>
Date: Wed, 15 Jan 2025 14:03:23 +0000
From: Pratyush Yadav <pratyush@...nel.org>
To: Miquel Raynal <miquel.raynal@...tlin.com>
Cc: Pratyush Yadav <pratyush@...nel.org>,  Tudor Ambarus
 <tudor.ambarus@...aro.org>,  Michael Walle <mwalle@...nel.org>,  Richard
 Weinberger <richard@....at>,  Vignesh Raghavendra <vigneshr@...com>,
  Thomas Petazzoni <thomas.petazzoni@...tlin.com>,  Steam Lin
 <STLin2@...bond.com>,  linux-mtd@...ts.infradead.org,
  linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mtd: spi-nor: winbond: Add support for w25q01jv

On Tue, Jan 14 2025, Miquel Raynal wrote:

> Hello Pratyush,
>
>>> Winbond chips (maybe this is a shared capability?) accept another
>>> command, "Write Enable for Volatile Status Register (50h)", which
>>> specifically changes the status register bits to use the volatile method.
>>>
>>> Hence, if the only situation we want to solve is the status register
>>> access, then we may just enable this command (this is the third solution
>>> I tried to explain in the commit log), but if we think there are other
>>> racy situations, this approach is not complete and we must fall back to
>>> one of the approaches listed above.
>>
>> I am not quite sure how you fix the write-enable-being-racy bug with
>> your patch. If you look at the code, spi_nor_write_enable() only calls
>> the write enable command (06h), and does not call
>> spi_nor_wait_till_ready() after that. After the write enable, it
>> immediately executes the program or erase operation. So you never
>> actually wait for all dies to be ready after a write enable.
>
> I will double check but my understanding is that the *status register*
> write is racy, not the spi_nor_write_enable().

Okay, I am confused because you said earlier that:

> The bug that has been experienced followed this sequence:
> - send the write enable command (non-volatile)
> - wait for the ready/busy bit, ie. wait for the WEL bit to be set
>   because it is non-volatile write
> - active die is ready, (but idle die is not!)
> - enter 4-byte address mode, only the die that is ready processes the
>   command.

That reads to me as saying that setting the WEL bit is itself racy. My
understanding is that one die is ready to take writes and the other is
not. When you then write the SR to enable 4B mode, the write only takes
effect on the die that has WEL set; the other die ignores it and stays
in 3B mode. Do I understand this correctly? If so, the fix needs a wait
after the write enable, before the write SR operation is issued.
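
To make the sequence I have in mind concrete, here is a toy model in
plain C. It is only a sketch of my reading of the bug, not kernel code:
struct die, struct flash, every flash_*() helper and the 1-tick/3-tick
latencies are made up for illustration, and the real hardware may
behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_DIES 2

/* Hypothetical model of one die in a stacked package. */
struct die {
	int busy_ticks;	/* ticks left until the die is ready again */
	bool wel;	/* write enable latch */
	bool addr_4b;	/* set once the die accepted 4-byte address mode */
};

struct flash {
	struct die dies[NUM_DIES];
	size_t active;	/* the die whose status a plain status read reports */
};

static void flash_init(struct flash *f)
{
	for (size_t i = 0; i < NUM_DIES; i++)
		f->dies[i] = (struct die){ 0 };
	f->active = 0;
}

/* Advance time by one tick on every die. */
static void flash_tick(struct flash *f)
{
	for (size_t i = 0; i < NUM_DIES; i++)
		if (f->dies[i].busy_ticks > 0)
			f->dies[i].busy_ticks--;
}

/* Broadcast write enable. Assumption: the idle die stays busy longer. */
static void flash_write_enable(struct flash *f)
{
	for (size_t i = 0; i < NUM_DIES; i++) {
		f->dies[i].wel = true;
		f->dies[i].busy_ticks = (i == f->active) ? 1 : 3;
	}
}

/* Broadcast "enter 4-byte mode"; a busy die or one without WEL ignores it. */
static void flash_enter_4b(struct flash *f)
{
	for (size_t i = 0; i < NUM_DIES; i++) {
		struct die *d = &f->dies[i];

		if (d->busy_ticks == 0 && d->wel) {
			d->addr_4b = true;
			d->wel = false;
		}
	}
}

/* Racy variant: only look at the active die. */
static void flash_wait_active_ready(struct flash *f)
{
	assert(f->active < NUM_DIES);
	while (f->dies[f->active].busy_ticks > 0)
		flash_tick(f);
}

/* Safe variant: keep ticking until no die is busy. */
static void flash_wait_all_ready(struct flash *f)
{
	for (size_t i = 0; i < NUM_DIES; i++)
		while (f->dies[i].busy_ticks > 0)
			flash_tick(f);
}

/* Returns true if every die ended up in 4-byte address mode. */
static bool enter_4b_sequence(bool wait_for_all_dies)
{
	struct flash f;

	flash_init(&f);
	flash_write_enable(&f);
	if (wait_for_all_dies)
		flash_wait_all_ready(&f);
	else
		flash_wait_active_ready(&f);
	flash_enter_4b(&f);

	for (size_t i = 0; i < NUM_DIES; i++)
		if (!f.dies[i].addr_4b)
			return false;
	return true;
}
```

In this model enter_4b_sequence(false) leaves the idle die in 3B mode,
while enter_4b_sequence(true) gets both dies into 4B mode. The only
difference is where the wait looks, which is the point I am trying to
confirm.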

>
>> You can see an example in spi_nor_write(). It does:
>>
>>     spi_nor_write_enable() -> spi_nor_write_data() ->
>>     spi_nor_wait_till_ready()
>
> What is racy is: act on all dies then check the status of a single die.

Your patch fixes all such operations, except write enable IIUC. For
operations such as write SR (or any other register) or chip erase, we
would call spi_nor_wait_till_ready(), and your patch would make sure all
dies are ready.
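
For reference, this is roughly the shape of "wait until all dies are
ready" that I am assuming, written as standalone C rather than the
actual spi-nor code. read_die_sr_fn, all_dies_ready(),
wait_till_all_dies_ready() and the fake device are hypothetical names
for this sketch; the only hardware fact it relies on is that bit 0 of
the status register is the busy (WIP) bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SR_WIP 0x01u	/* status register bit 0: write in progress */

/* Hypothetical callback: return the status register of one die. */
typedef uint8_t (*read_die_sr_fn)(void *ctx, size_t die);

/* True only if no die reports WIP. */
static bool all_dies_ready(read_die_sr_fn read_sr, void *ctx, size_t ndies)
{
	assert(read_sr != NULL);
	for (size_t die = 0; die < ndies; die++)
		if (read_sr(ctx, die) & SR_WIP)
			return false;
	return true;
}

/*
 * Poll until every die is ready or max_polls attempts are used up.
 * Returns 0 on success, -1 on timeout.
 */
static int wait_till_all_dies_ready(read_die_sr_fn read_sr, void *ctx,
				    size_t ndies, unsigned int max_polls)
{
	while (max_polls--) {
		if (all_dies_ready(read_sr, ctx, ndies))
			return 0;
	}
	return -1;
}

/* Fake device: die i reports busy for remaining[i] more status reads. */
struct fake_flash {
	unsigned int remaining[2];
};

static uint8_t fake_read_sr(void *ctx, size_t die)
{
	struct fake_flash *f = ctx;

	if (f->remaining[die] > 0) {
		f->remaining[die]--;
		return SR_WIP;
	}
	return 0;
}
```

With a fake device whose second die stays busy for a few reads, the
wait only returns 0 once that die has cleared WIP too. A real
implementation would poll with a time-based timeout instead of a poll
count.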

But if write enable itself is racy, then we would need to add a wait
after the write enable, which your patch does not do. I am not sure
right now whether that is an actual problem or whether I just misread
your message. If write enable itself isn't racy, then the v3 series
should be good to go.

>
>> Do you have a consistent reproducer for the race? If so, does the patch
>> actually somehow make the race go away? If so, I would be curious to
>> know why.
>
> Not with Linux, it is a problem that has been (consistently) observed
> using an RTOS. It's been analysed so we know what the issue is and we
> want to make sure this cannot happen using Linux.
>
> Thanks,
> Miquèl

-- 
Regards,
Pratyush Yadav