Message-ID: <21ba3465-9e24-4b5a-a239-4a3ed5bf2309@arm.com>
Date: Thu, 13 Feb 2025 11:13:17 +0000
From: Christian Loehle <christian.loehle@....com>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Ulf Hansson <ulf.hansson@...aro.org>, kernel@...gutronix.de,
 linux-kernel@...r.kernel.org, linux-mmc@...r.kernel.org,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Mark Brown <broonie@...nel.org>, "Rafael J. Wysocki" <rafael@...nel.org>
Subject: Re: [PATCH v1 1/1] mmc: core: Handle undervoltage events and register
 regulator notifiers

On 2/13/25 08:57, Oleksij Rempel wrote:
> On Wed, Feb 12, 2025 at 11:47:08PM +0000, Christian Loehle wrote:
>> On 2/12/25 13:24, Oleksij Rempel wrote:
>>> Extend the MMC core to handle undervoltage events by implementing
>>> infrastructure to notify the MMC bus about voltage drops.
>>>
>>> Background & Decision at LPC24:
>>>
>>> This solution was proposed and refined during LPC24 in the talk
>>> "Graceful Under Pressure: Prioritizing Shutdown to Protect Your Data in
>>> Embedded Systems" which aimed to address how Linux should handle power
>>> fluctuations in embedded devices to prevent data corruption or storage
>>> damage.
>>>
>>> At the time, multiple possible solutions were considered:
>>>
>>> 1. Triggering a system-wide suspend or shutdown when undervoltage is
>>>    detected, with device-specific prioritization to ensure critical
>>>    components shut down first.
>>>    - This approach was disliked by Greg Kroah-Hartman, as it introduced
>>>      complexity and was not suitable for all use cases.
>>>
>>> 2. Notifying relevant devices through the regulator framework to allow
>>>    graceful per-device handling.
>>>    - This approach was agreed upon as the most acceptable by participants
>>>      in the discussion, including Greg Kroah-Hartman, Mark Brown,
>>>      and Rafael J. Wysocki.
>>>    - This patch implements that decision by integrating undervoltage
>>>      handling into the MMC subsystem.
>>>
>>> This patch was tested on an i.MX8MP-based system with an SDHCI controller.
>>
>> Any details here? How long does it take from undervoltage to
>> poweroff notification?
> 
> On this system, with the current implementation, it takes 4.5 milliseconds
> from voltage drop detection to mmc_poweroff_notify.
> 
>> Roughly how long of a heads up would that yield in realistic
>> undervoltage scenarios?
> 
> It depends on the board implementation and the attached power supply.
> In my case, the testing system provides about 100ms of capacity on board.
> The power supply provides an additional 1-2 seconds.
> 
> If the power is cut between the power supply and the board, we will have
> at most 100ms.

Thanks, that's not too bad then.
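
To make the numbers above concrete, here is a minimal sketch of the time
budget, assuming the figures quoted in this thread (about 100 ms of
on-board capacity, 4.5 ms from detection to mmc_poweroff_notify). The
function name is hypothetical and this is plain user-space C, not kernel
code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: time left for the card to finish its poweroff
 * handling once the notification has been issued. Returns 0 if the
 * detection-to-notify latency already exceeds the hold-up time.
 */
uint32_t uv_margin_us(uint32_t holdup_us, uint32_t notify_latency_us)
{
	if (notify_latency_us >= holdup_us)
		return 0;
	return holdup_us - notify_latency_us;
}
```

With 100 ms of hold-up and 4.5 ms of latency, this leaves 95.5 ms for the
card to complete whatever the notification triggers.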

> 
>>> +static int _mmc_handle_undervoltage(struct mmc_host *host)
>>> +{
>>> +	return mmc_shutdown(host);
>>> +}
>>> +
>>
>> The poweroff notification part I understand, because it polls for busy
>> (i.e. hopefully until the card thinks it's done committing to flash).
>> Poweroff isn't always available though, the other paths of
>> _mmc_suspend() are:
>>
>> 	else if (mmc_can_sleep(host->card))
>> 		err = mmc_sleep(host);
>> 	else if (!mmc_host_is_spi(host))
>> 		err = mmc_deselect_cards(host);
>>
>> 	if (!err) {
>> 		mmc_power_off(host);
>>
>> So we may also just deselect, which AFAIR succeeds as a FSM (i.e.
>> doesn't mean anything was committed to flash) and then we just
>> poweroff.
>> Is that what we want in an undervoltage scenario?
> 
> Yes. In an undervoltage scenario, our primary priority is to protect the
> hardware from damage. Data integrity is secondary in this case. The most
> critical action is to immediately stop writing to the card.  

Protect the hardware from damage by avoiding a power failure during a
host write? An active host-write command doesn't sound like the actual
cause; the card writing metadata to flash internally seems more likely,
and the deselect->poweroff path does nothing to ensure that this isn't
happening when power is removed.

While I still think this should be handled at procurement instead of in
the kernel (especially if you don't even care about data integrity?), I'd
still be interested in how we would ensure we aren't doing more harm than
good here.
Is there any info from vendors on how they implement any of these? IME
only poweroff notify would do anything at all to help, and if that isn't
available we should leave the voltage up for the card and idle.
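
The policy suggested above could be sketched roughly as follows. This is a
user-space model with hypothetical names (uv_action, uv_card_caps,
uv_pick_action), not the kernel's API and not what the patch does; it only
illustrates the decision being argued for.

```c
#include <stdbool.h>

/* Hypothetical model of the undervoltage decision, not kernel code. */
enum uv_action {
	UV_POWEROFF_NOTIFY,	/* notify, poll busy, then cut power */
	UV_IDLE_KEEP_POWER,	/* stop issuing requests, leave voltage up */
};

/* Assumed capability flags, standing in for checks on host->card. */
struct uv_card_caps {
	bool can_poweroff_notify;
	bool can_sleep;
};

enum uv_action uv_pick_action(const struct uv_card_caps *caps)
{
	if (caps->can_poweroff_notify)
		return UV_POWEROFF_NOTIFY;

	/*
	 * Sleep and deselect are not assumed to mean anything was
	 * committed to flash, so powering off afterwards could interrupt
	 * internal writes. Keep the supply up and go idle instead.
	 */
	return UV_IDLE_KEEP_POWER;
}
```

Note that can_sleep is deliberately ignored here: under this policy only
poweroff notification justifies actively removing power.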

FWIW poweroff notify should probably be
EXT_CSD_POWER_OFF_SHORT
instead of the current EXT_CSD_POWER_OFF_LONG for your intended use case.
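
A minimal sketch of that choice, again as a user-space model. The two
EXT_CSD_* values mirror the definitions in include/linux/mmc/mmc.h; the
struct and function names are hypothetical. The timeout selection follows
what mmc_poweroff_notify() appears to do (generic CMD6 time for SHORT, the
card's long poweroff time for LONG), which should be checked against the
tree being patched.

```c
#include <stdbool.h>

/* Values as defined in include/linux/mmc/mmc.h. */
#define EXT_CSD_POWER_OFF_SHORT	2
#define EXT_CSD_POWER_OFF_LONG	3

/* Hypothetical stand-in for the relevant ext_csd timing fields (ms). */
struct uv_ext_csd {
	unsigned int generic_cmd6_time;
	unsigned int power_off_longtime;
};

/* Pick the notification type: SHORT when power is about to disappear. */
unsigned int uv_notify_type(bool undervoltage)
{
	return undervoltage ? EXT_CSD_POWER_OFF_SHORT : EXT_CSD_POWER_OFF_LONG;
}

/* Busy-poll timeout that goes with the chosen notification type. */
unsigned int uv_notify_timeout_ms(const struct uv_ext_csd *ext_csd,
				  unsigned int notify_type)
{
	if (notify_type == EXT_CSD_POWER_OFF_LONG)
		return ext_csd->power_off_longtime;
	return ext_csd->generic_cmd6_time;
}
```

The point of SHORT in this use case is that the card is asked to finish
quickly, which fits a hold-up time on the order of 100 ms better than the
potentially much longer LONG timeout.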

