lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <abe0b7f7-19de-605b-e1b1-c62930052894@axentia.se>
Date:   Mon, 22 May 2023 21:28:39 +0200
From:   Peter Rosin <peda@...ntia.se>
To:     Uwe Kleine-König <u.kleine-koenig@...gutronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-pwm@...r.kernel.org,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Thierry Reding <thierry.reding@...il.com>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Claudiu Beznea <claudiu.beznea@...rochip.com>
Subject: Re: PWM regression causing failures with the pwm-atmel driver

Hi!

2023-05-22 at 19:23, Uwe Kleine-König wrote:
> Hello Peter,
> 
> On Mon, May 22, 2023 at 05:19:43PM +0200, Peter Rosin wrote:
>> I have a device with a "sound card" that has an amplifier that needs
>> an extra boost when high amplification is requested. This extra
>> boost is controlled with a pwm-regulator.
>>
>> As of commit c73a3107624d ("pwm: Handle .get_state() failures") this
>> device no longer works. I have tracked the problem to an unfortunate
>> interaction between the underlying PWM driver and the PWM core.
>>
>> The driver is drivers/pwm/pwm-atmel.c which has difficulties getting
>> the period and/or duty_cycle from the HW when the PWM is not enabled.
>> Because of this, I think, the driver does not fill in .period and
>> .duty_cycle at all in atmel_pwm_get_state() unless the PWM is enabled.
>>
>> However, the PWM core is not expecting these fields to be left as-is,
>> at least not in pwm_adjust_config(), and its local state variable on
>> the stack ends up with whatever crap was on the stack on entry for
>> these fields. That fails spectacularly when the function continues to
>> do math on these uninitialized values.
>>
>> In particular, I find this in the kernel log when a bad kernel runs:
>> pwm-regulator: probe of reg-ana failed with error -22
>>
>> Before commit c73a3107624d this was a silent failure, and the situation
>> "repaired itself" when the PWM was later reprogrammed, at least for my
>> case. After that commit, the failure is fatal and the "sound card"
>> fails to come up at all.
>>
>>
>> I see a couple of adjustments that could be made.
>>
>> 1. Zero out some fields in the driver:
>>
>> @@ -390,4 +390,6 @@ static int atmel_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
>>  		state->enabled = true;
>>  	} else {
>> +		state->period = 0;
>> +		state->duty_cycle = 0;
>>  		state->enabled = false;
>>  	}
> 
> I don't particularily like that. While state->period is an invalid
> value, IMHO enabled = false is enough information from the driver's POV.

This was the preferred approach of Thierry, and given the number of
call sites for pwm_get_state with a local variable, I can sympathize
with that view. At the same time, fixing drivers one by one is not
a fun game, so I can certainly see that approach 3 also has an appeal.
  
>> 2. Don't try to "adjust" a disabled PWM:
>>
>> @@ -656,7 +656,7 @@ int pwm_adjust_config(struct pwm_device *pwm)
>>  	 * In either case, we setup the new period and polarity, and assign a
>>  	 * duty cycle of 0.
>>  	 */
>> -	if (!state.period) {
>> +	if (!state.enabled || !state.period) {
>>  		state.duty_cycle = 0;
>>  		state.period = pargs.period;
>>  		state.polarity = pargs.polarity;
> 
> In my book code that calls pwm_get_state() should consider .period
> (and .duty_cycle) undefined if .enabled is false. So this hunk is an

I agree fully. This feels like a good hunk.

> improvement. Having said that, I think pwm_adjust_config() has a strange
> semantic. I don't understand when you would want to call it, and it only
> has one caller (i.e. pwm-regulator).

I believe it's for a case where some bootstrap/bootloader has configured
a PWM in order to set up a regulator for some critical component, and the
kernel needs to cake over the responsibility seamlessly, or everything
will take a dive. I deduced that from the description of pwm_adjust_config:

 * This function will adjust the PWM config to the PWM arguments provided
 * by the DT or PWM lookup table. This is particularly useful to adapt
 * the bootloader config to the Linux one.

But what do I know?

> So another option would be
> 
> diff --git a/drivers/regulator/pwm-regulator.c b/drivers/regulator/pwm-regulator.c
> index b64d99695b84..418aff0ddbed 100644
> --- a/drivers/regulator/pwm-regulator.c
> +++ b/drivers/regulator/pwm-regulator.c
> @@ -368,10 +368,6 @@ static int pwm_regulator_probe(struct platform_device *pdev)
>  		return ret;
>  	}
>  
> -	ret = pwm_adjust_config(drvdata->pwm);
> -	if (ret)
> -		return ret;
> -
>  	regulator = devm_regulator_register(&pdev->dev,
>  					    &drvdata->desc, &config);
>  	if (IS_ERR(regulator)) {
> 
> and drop pwm_adjust_config() as a followup?!

As described above, I think that might be too drastic?

>> 3. Zero out the state before calling pwm_get_state:
>>
>> @@ -115,7 +115,7 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
>>  	}
>>  
>>  	if (pwm->chip->ops->get_state) {
>> -		struct pwm_state state;
>> +		struct pwm_state state = { .enabled = true };
> 
> I wonder why you picked .enabled = true here but = false on all other
> code locations.

Silly typo, soory.

> Also you don't seem to have 1271a7b98e7989ba6bb978e14403fc84efe16e13
> which has the same effect and is included in v6.3 and v6.2.13.

Right, approach 3 felt so ugly and intrusive that I didn't bother to take
it beyond the problematic commit (and hinted at that further down: "It also
needs a rebase from the culprit commit"). Having said that, 1271a7b98e79 is
not enough, or else I would never have noticed the problem in the first
place. I don't think the hunk in pwm_dbg_show() makes any difference for my
case since I heven't enabled debugging, and a double-check confirms it:
After 1271a7b98e79, zeroing "state" in pwm_adjust_config() is enough to kill
this regression.

So, given that approach 3 is already half done, it no longer feels as ugly
and intrusive and more like just following the path forward.

Pros and cons, pros and cons. 1, 2, 3, don't know what to do. But hmm, it's
not some annoying "any two of three" crap, and it's actually possible and
not entirely unreasonable to do all three...

I'll let someonw else pick and choose. I have no strong preference.

Cheers,
Peter

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ