lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9c2706cf-bce4-05f7-9498-5755c78daaa3@axentia.se>
Date:   Tue, 23 May 2023 01:34:39 +0200
From:   Peter Rosin <peda@...ntia.se>
To:     Uwe Kleine-König <u.kleine-koenig@...gutronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>, linux-pwm@...r.kernel.org,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Thierry Reding <thierry.reding@...il.com>,
        Conor Dooley <conor.dooley@...rochip.com>,
        Claudiu Beznea <claudiu.beznea@...rochip.com>
Subject: Re: PWM regression causing failures with the pwm-atmel driver

Hi!

2023-05-22 at 22:43, Uwe Kleine-König wrote:
> Hello Peter,
> 
> On Mon, May 22, 2023 at 09:28:39PM +0200, Peter Rosin wrote:
>> 2023-05-22 at 19:23, Uwe Kleine-König wrote:
>>> On Mon, May 22, 2023 at 05:19:43PM +0200, Peter Rosin wrote:
>>>> I have a device with a "sound card" that has an amplifier that needs
>>>> an extra boost when high amplification is requested. This extra
>>>> boost is controlled with a pwm-regulator.
>>>>
>>>> As of commit c73a3107624d ("pwm: Handle .get_state() failures") this
>>>> device no longer works. I have tracked the problem to an unfortunate
>>>> interaction between the underlying PWM driver and the PWM core.
>>>>
>>>> The driver is drivers/pwm/pwm-atmel.c which has difficulties getting
>>>> the period and/or duty_cycle from the HW when the PWM is not enabled.
>>>> Because of this, I think, the driver does not fill in .period and
>>>> .duty_cycle at all in atmel_pwm_get_state() unless the PWM is enabled.
>>>>
>>>> However, the PWM core is not expecting these fields to be left as-is,
>>>> at least not in pwm_adjust_config(), and its local state variable on
>>>> the stack ends up with whatever crap was on the stack on entry for
>>>> these fields. That fails spectacularly when the function continues to
>>>> do math on these uninitialized values.
> 
> After looking again, I don't understand that part. Note that
> pwm_get_state() doesn't call .get_state() at all. Also pwmchip_add()
> zero initializes .state, then pwm_get() calls .get_state() (via
> pwm_device_request() which is called in .xlate()) which (if the HW is
> disabled) doesn't touch .period, so it should continue to be zero?!
> 
> So I wonder why your approach 2 works at all. Do you see what I'm
> missing?

I tried various things on top of v6.3 and you are right. My hunks do
not make a (significant) difference there.

So, I took a step back and can only conclude that there must be some
another regression to find, and I was confused by that other regression.
In short, I was on 6.1.<foo> and everything was fine, and then I bumped
to 6.3 and a process crashed. I went to 6.2 and that same process also
crashed. I then totally focused on v6.1..v6.2 to figure out the problem.
I simply assumed v6.3 had the same problem because the symptom from
30.000ft was the same (that process died). I failed to go back to v6.3
to confirm that it was indeed the same problem as I had found in the
v6.1..v6.2 range.

My bad, it seems I have another day of bisections lined up.

>>>> In particular, I find this in the kernel log when a bad kernel runs:
>>>> pwm-regulator: probe of reg-ana failed with error -22
>>>>
>>>> Before commit c73a3107624d this was a silent failure, and the situation
>>>> "repaired itself" when the PWM was later reprogrammed, at least for my
>>>> case. After that commit, the failure is fatal and the "sound card"
>>>> fails to come up at all.
>>>>
>>>>
>>>> I see a couple of adjustments that could be made.
>>>>
>>>> 1. Zero out some fields in the driver:
>>>>
>>>> @@ -390,4 +390,6 @@ static int atmel_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
>>>>  		state->enabled = true;
>>>>  	} else {
>>>> +		state->period = 0;
>>>> +		state->duty_cycle = 0;
>>>>  		state->enabled = false;
>>>>  	}
>>>
>>> I don't particularily like that. While state->period is an invalid
>>> value, IMHO enabled = false is enough information from the driver's POV.
>>
>> This was the preferred approach of Thierry, and given the number of
>> call sites for pwm_get_state with a local variable, I can sympathize
>> with that view.
> 
> I looked a bit more into the issue and think that pwm_get_state() isn't
> problematic. pwm_get_state() fully assigns *state.

Right, the only way pwm_get_state() can be problematic is if bogus values
have made their way into pwm->state earlier. Which is what happened for
v6.2.

>> At the same time, fixing drivers one by one is not
>> a fun game, so I can certainly see that approach 3 also has an appeal.
> 
> What I don't like about it is that the output of a disabled PWM doesn't
> have a period. There might be one configured in hardware(, and the
> .get_state() callback might or might not return that), but the emitted
> signal has no well-defined period.
> 
>>>> 2. Don't try to "adjust" a disabled PWM:
>>>>
>>>> @@ -656,7 +656,7 @@ int pwm_adjust_config(struct pwm_device *pwm)
>>>>  	 * In either case, we setup the new period and polarity, and assign a
>>>>  	 * duty cycle of 0.
>>>>  	 */
>>>> -	if (!state.period) {
>>>> +	if (!state.enabled || !state.period) {
>>>>  		state.duty_cycle = 0;
>>>>  		state.period = pargs.period;
>>>>  		state.polarity = pargs.polarity;
>>>
>>> In my book code that calls pwm_get_state() should consider .period
>>> (and .duty_cycle) undefined if .enabled is false. So this hunk is an
>>
>> I agree fully. This feels like a good hunk.
>>
>>> improvement. Having said that, I think pwm_adjust_config() has a strange
>>> semantic. I don't understand when you would want to call it, and it only
>>> has one caller (i.e. pwm-regulator).
>>
>> I believe it's for a case where some bootstrap/bootloader has configured
>> a PWM in order to set up a regulator for some critical component, and the
>> kernel needs to cake over the responsibility seamlessly, or everything
>> will take a dive. I deduced that from the description of pwm_adjust_config:
> 
> If it wants to take over seamlessly, it shouldn't call pwm_apply_state()
> at all?!
> 
>>  * This function will adjust the PWM config to the PWM arguments provided
>>  * by the DT or PWM lookup table. This is particularly useful to adapt
>>  * the bootloader config to the Linux one.
>>
>> But what do I know?
> 
>>> So another option would be
>>>
>>> diff --git a/drivers/regulator/pwm-regulator.c b/drivers/regulator/pwm-regulator.c
>>> index b64d99695b84..418aff0ddbed 100644
>>> --- a/drivers/regulator/pwm-regulator.c
>>> +++ b/drivers/regulator/pwm-regulator.c
>>> @@ -368,10 +368,6 @@ static int pwm_regulator_probe(struct platform_device *pdev)
>>>  		return ret;
>>>  	}
>>>  
>>> -	ret = pwm_adjust_config(drvdata->pwm);
>>> -	if (ret)
>>> -		return ret;
>>> -
>>>  	regulator = devm_regulator_register(&pdev->dev,
>>>  					    &drvdata->desc, &config);
>>>  	if (IS_ERR(regulator)) {
>>>
>>> and drop pwm_adjust_config() as a followup?!
>>
>> As described above, I think that might be too drastic?
>>
>>>> 3. Zero out the state before calling pwm_get_state:
>>>>
>>>> @@ -115,7 +115,7 @@ static int pwm_device_request(struct pwm_device *pwm, const char *label)
>>>>  	}
>>>>  
>>>>  	if (pwm->chip->ops->get_state) {
>>>> -		struct pwm_state state;
>>>> +		struct pwm_state state = { .enabled = true };
>>>
>>> I wonder why you picked .enabled = true here but = false on all other
>>> code locations.
>>
>> Silly typo, soory.
>>
>>> Also you don't seem to have 1271a7b98e7989ba6bb978e14403fc84efe16e13
>>> which has the same effect and is included in v6.3 and v6.2.13.
>>
>> Right, approach 3 felt so ugly and intrusive that I didn't bother to take
>> it beyond the problematic commit (and hinted at that further down: "It also
>> needs a rebase from the culprit commit"). Having said that, 1271a7b98e79 is
>> not enough, or else I would never have noticed the problem in the first
>> place. I don't think the hunk in pwm_dbg_show() makes any difference for my
>> case since I heven't enabled debugging, and a double-check confirms it:
>> After 1271a7b98e79, zeroing "state" in pwm_adjust_config() is enough to kill
>> this regression.
> 
> That really puzzles me because pwm_get_state() fully overwrites state.

I only verified that I had no problem /with/ my change. See above.

Sorry for wasting time.

Cheers,
Peter

> 
> Best regards
> Uwe
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ