[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <273E9F9E-B512-4F7F-B016-1F8FF324ED1F@goldelico.com>
Date: Mon, 18 May 2015 18:03:46 +0200
From: "Dr. H. Nikolaus Schaller" <hns@...delico.com>
To: Krzysztof Kozlowski <k.kozlowski@...sung.com>
Cc: Rodolfo Giometti <giometti@...ux.it>,
Sebastian Reichel <sre@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Marek Belisko <marek@...delico.com>,
Linux PM mailing list <linux-pm@...r.kernel.org>,
Dmitry Eremin-Solenikov <dbaryshkov@...il.com>,
David Woodhouse <dwmw2@...radead.org>,
List for communicating with real GTA04 owners
<gta04-owner@...delico.com>
Subject: Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
Am 18.05.2015 um 16:09 schrieb Krzysztof Kozlowski <k.kozlowski@...sung.com>:
> 2015-05-18 22:30 GMT+09:00 Dr. H. Nikolaus Schaller <hns@...delico.com>:
>> Hi Krzysztof,
>>
>> Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@...sung.com>:
>>
>>> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@...delico.com>:
>>>> Hi,
>>>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>>>
>>>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>>>
>>>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>>>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>>>
>>>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>>>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>>>> is properly initialized.
>>>>
>>>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>>>> right before we see the segfault.
>>>>
>>>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>>>> during registration.
>>>>
>>>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>>>> I don’t know what it does to the uevent and if it restores previous operation.
>>>>
>>>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>>>> by value, but by reference to the di->bat struct:
>>>>
>>>> - ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>>>> + di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>>>
>>>> So that code called within the context of power_supply_register_no_ws() could already
>>>> refer to initialized di->bat.
>>>
>>> Indeed this issue was not present in previous design however I think
>>> the architecture was still error-prone. Starting from driver's probe:
>>> - some_driver_probe()
>>> - power_supply_register(&psy)
>>> - device_add()
>>> - kobject_uevent() - notify userspace
>>> - power_supply_uevent()
>>> - power_supply_show_property()
>>> - power_supply_get_property()
>>> - some_driver_get_property()
>>>
>>> So before probe() ends the power supply core calls driver's
>>> get_property(). In that time the driver internal structures may not be
>>> ready yet, because the probe did not end.
>>
>> Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
>> earlier because the call to power_supply_register(&psy) was the last
>> activity.
>
> Right, for most of the drivers it is the last part of probe.
>
>>
>>> The get_property() could
>>> access some registers or other core functions which will be ready
>>> after power_supply_register() call. For example the
>>> bq27x00_battery_get_property() accesses delayed work which could be
>>> (by mistake) not yet initialized.
>>>
>>> I looked at other dev_uevent implementations and almost all of them do
>>> not call back the driver.
>>>
>>> Of course this does not change that I introduced the issue and I feel
>>> bad about it.
>>> I got some ideas for resolving it:
>>> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
>>> member of its state container. This does not solve the issue from
>>> architectural point of view - still some uninitialised data may be
>>> accessed because probe() is in progress. However this would be the
>>> fastest and the least intrusive.
>>
>> The problem might be that it fundamentally changes the driver code
>> architecture. For example one call using di->bat is in
>>
>> bq27x00_battery_status() {
>> …
>> else if (power_supply_am_i_supplied(di->bat))
>> status = POWER_SUPPLY_STATUS_NOT_CHARGING;
>> …
>> }
>>
>>>
>>> 2. Remove calls to get_property() (and other functions provided by
>>> driver) from power_supply_uevent(). Unfortunately this may break
>>> userspace which expects that some data will be present in uevent.
>>>
>>> 3. Change the API to the previous convention, which you pointed as a remedy:
>>> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
>>> This also won't solve the issue from 1. that uevent will be called
>>> before probe ends.
>>
>> Well, we could require that power_supply_register_no_ws() is the last
>> activity to be done in any_driver_probe().
>
> Some drivers (like drivers/power/lp8727_charger.c) register multiple
> power supplies and there can be only one last call. What is even
> important in this lp8727 case, is that it registers:
> - a battery,
> - two chargers which supply this battery.
>
> So when power supplies are registered, the delayed work is executed
> for each supplied battery. Hopefully the the battery is registered as
> last... but I am not quite sure that this is still safe.
>
>>> 4. Block the power_supply_uevent() from calling the get_property()
>>> functions until device_add() finishes. This would solve this issue but
>>> first uevent messages (from adding device) won't contain all of device
>>> data (just like in solution no 2.) and this won't solve other race -
>>> someone may call sysfs show() on created device nodes and thus access
>>> the get_property() before probe finishes.
>>
>>>
>>> Any ideas?
>>
>> 5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
>> is finished? E.g. by registering a one-shot delayed work?
>>
>> There seems to be a shared workqueue (mentioned e.g. in
>> <http://www.makelinux.net/ldd3/chp-7-sect-6>)
>> but that is an area of the kernel I am not familiar with.
>
> Unfortunately there is no guarantee that delayed work will be executed
> after probe ends. It should be but… who knows when it be scheduled?
Ah, I see. It would be necessary to schedule the worker thread only after
probing is successful. So this would deeply go into general driver probing
mechanisms. Or it would need a “call me as soon as my probe function
has successfully completed” callback that a driver can register somewhere.
> Anyway thanks for reporting the issue and ideas.
Please let me know if we can test something.
BR and thanks,
Nikolaus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists