Message-ID: <2fbd21f3-2da0-5a10-23c4-fcfecfe48311@suse.com>
Date: Thu, 27 Jul 2023 10:46:05 +0200
From: Oliver Neukum <oneukum@...e.com>
To: liulongfang <liulongfang@...wei.com>,
Oliver Neukum <oneukum@...e.com>,
Greg KH <gregkh@...uxfoundation.org>
Cc: linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] USB:bugfix a controller halt error
On 27.07.23 09:00, liulongfang wrote:
> On 2023/7/26 19:16, Oliver Neukum wrote:
>> 1. temporary - that is you have detected memory corruption but the RAM cell is not broken
>> 2. unrecoverable - that is we have lost data
>> 3. locateable - that is you know it hit the buffer of this operation and only it
>>
>> Am I correct so far?
>>
> You are right about the testing process.
> But this problem can exist in the real environment, just the probability of
> occurrence is very low.
Understood. Bit flips are random.
But this leaves two open questions.
1. How is the error reported?
2. How are we supposed to handle it?
Firstly, if we already know that there is an ECC failure
on the host we can use a specific error code and can check
for that.
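As a rough sketch of what checking for a specific error code could look like: every name and value below is hypothetical, nothing is taken from the patch or from the USB core, and the action chosen for the ECC case is only a placeholder, since the correct remedy is exactly what is still open.

```c
#include <stdio.h>

/* Hypothetical status codes, for illustration only. They are not
 * existing USB core or errno values. */
enum xfer_status {
	XFER_OK = 0,
	XFER_ERR_GENERIC = -1,
	XFER_ERR_HOST_ECC = -2,	/* assumed: host saw an ECC failure in this buffer */
};

/* Hypothetical actions a completion handler could take. */
enum xfer_action {
	ACTION_DELIVER,			/* buffer is good, pass it on */
	ACTION_DISCARD_AND_REPORT,	/* buffer content is invalid, do not use it */
	ACTION_FAIL,			/* any other error, generic failure path */
};

/* Map a completion status to an action. The point is only that a
 * dedicated code lets the ECC case be told apart from other errors. */
enum xfer_action handle_completion(enum xfer_status status)
{
	switch (status) {
	case XFER_OK:
		return ACTION_DELIVER;
	case XFER_ERR_HOST_ECC:
		/* Placeholder remedy; the right one is the open question. */
		return ACTION_DISCARD_AND_REPORT;
	default:
		return ACTION_FAIL;
	}
}
```

With a dedicated code the ECC case never falls into the generic failure path, so whatever remedy is eventually agreed on would have one obvious place to live.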
Secondly, does this mean that the affected memory location
must not be touched until the machine is power cycled
or does it simply mean that the buffer is invalid?
> Our test tool only simulates external interference destroying this part
> of the data in the buffer in ECC memory. Even without this testing tool,
> this problem may also occur on real business hardware devices.
Understood. But what is the correct remedy if the problem strikes
for real?
Regards
Oliver