[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4C9D177D.6090708@ti.com>
Date:	Fri, 24 Sep 2010 16:26:21 -0500
From:	Nishanth Menon <nm@...com>
To:	"paulmck@...ux.vnet.ibm.com" <paulmck@...ux.vnet.ibm.com>
CC:	linux-pm <linux-pm@...ts.linux-foundation.org>,
	lkml <linux-kernel@...r.kernel.org>,
	linux-arm <linux-arm-kernel@...ts.infradead.org>,
	linux-omap <linux-omap@...r.kernel.org>
Subject: Re: [PATCH v4] power: introduce library for device-specific OPPs
Paul E. McKenney had written, on 09/24/2010 02:37 PM, the following:
[...]
> 
> Looks like a good start!!!  Some questions and suggestions about RCU
Thanks for the review.. few comments below..
> usage interspersed below.
> 
>                                                         Thanx, Paul
[...]
>> +
>> +/**
>> + * find_device_opp() - find device_opp struct using device pointer
>> + * @dev:     device pointer used to lookup device OPPs
>> + *
>> + * Search list of device OPPs for one containing matching device. Does a RCU
>> + * reader operation to grab the pointer needed.
>> + *
>> + * Returns pointer to 'struct device_opp' if found, otherwise -ENODEV or
>> + * -EINVAL based on type of error.
>> + */
>> +static struct device_opp *find_device_opp(struct device *dev)
>> +{
>> +     struct device_opp *tmp_dev_opp, *dev_opp = ERR_PTR(-ENODEV);
>> +
>> +     if (unlikely(!dev || IS_ERR(dev))) {
>> +             pr_err("%s: Invalid parameters being passed\n", __func__);
>> +             return ERR_PTR(-EINVAL);
>> +     }
>> +
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) {
>> +             if (tmp_dev_opp->dev == dev) {
>> +                     dev_opp = tmp_dev_opp;
>> +                     break;
>> +             }
>> +     }
>> +     rcu_read_unlock();
> 
> What prevents the structure pointed to by dev_opp from being freed
> at this point?  We are no longer in an RCU read-side critical section,
> so RCU grace periods starting during the above RCU read-side critical
> section can now end.
dev_opp is never freed in the implementation -> it represents domains, 
only adds with list_add_rcu() is done -> wont the usage be safe then? or 
I being blind?
the reason why we dont free is coz of the following: dev_opp represents 
voltage domains in opp library. SoC frameworks are required to register 
only those voltage domain opp that are required. by allowing a free 
logic, I knew it'd have complicated the implementation way beyond what 
we needed it to be.
> 
> Here is an example sequence of events that I am worried about:
> 
> o       CPU 1 enters find_device_opp(), and pick up a pointer to
>         a given device opp.
> 
> o       CPU 2 executes opp_set_availability(), replacing that same
>         device opp with a new one.  It then calls synchronize_rcu()
>         which blocks waiting for CPU 1 to exit its RCU read-side
>         critical section.
> 
> o       CPU 1 exits its RCU read-side critical section, arriving at
>         this point in the code.
> 
> o       CPU 2's synchronize_rcu() is now permitted to return, executing
>         the kfree(), which frees up the memory that CPU 1's dev_opp
>         pointer references.
> 
> o       This newly freed memory is allocated for some other structure
>         by CPU 3.  CPU 1 and CPU 3 are now trying to use the same
>         memory for two different structures, and nothing good can
>         possibly come of this.  The kernel dies a brutal and nasty
>         death.
> 
> One way to fix this is to have the caller do rcu_read_lock() before
> calling find_device_opp(), and to do rcu_read_unlock() only after the
> caller has finished using the pointer that find_device_opp() returns.
> This works well unless the caller needs to do some blocking operation
> before it gets done using the pointer.
> 
> Another approach is for find_device_opp() to use a reference count on
> the structure, and for opp_set_availability() to avoid freeing the
> structure unless/until the reference counter drops to zero.
> 
> There are other approaches as well, please feel free to take a look
> at Documentation/RCU/rcuref.txt for more info on using reference
> counting and RCU.
thx. I probably should read yet again if I got my understanding of usage 
right..
> 
[...]
>> +
>> +/**
>> + * opp_find_freq_exact() - search for an exact frequency
>> + * @dev:             device for which we do this operation
>> + * @freq:            frequency to search for
>> + * @is_available:    true/false - match for available opp
>> + *
>> + * Searches for exact match in the opp list and returns pointer to the matching
>> + * opp if found, else returns ERR_PTR in case of error and should be handled
>> + * using IS_ERR.
>> + *
>> + * Note: available is a modifier for the search. if available=true, then the
>> + * match is for exact matching frequency and is available in the stored OPP
>> + * table. if false, the match is for exact frequency which is not available.
>> + *
>> + * This provides a mechanism to enable an opp which is not available currently
>> + * or the opposite as well.
>> + *
>> + * Locking: RCU reader.
>> + */
>> +struct opp *opp_find_freq_exact(struct device *dev,
>> +                                  unsigned long freq, bool available)
>> +{
>> +     struct device_opp *dev_opp;
>> +     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
>> +
>> +     dev_opp = find_device_opp(dev);
>> +     if (IS_ERR(dev_opp))
>> +             return opp;
>> +
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
>> +             if (temp_opp->available == available &&
>> +                             temp_opp->rate == freq) {
>> +                     opp = temp_opp;
>> +                     break;
>> +             }
>> +     }
>> +     rcu_read_unlock();
> 
> But this one sadly has the same problem that find_device_opp() does.
is the concern about opp OR about dev_opp here? I am guessing opp..
> 
>> +     return opp;
>> +}
>> +
>> +/**
>> + * opp_find_freq_ceil() - Search for an rounded ceil freq
>> + * @dev:     device for which we do this operation
>> + * @freq:    Start frequency
>> + *
>> + * Search for the matching ceil *available* OPP from a starting freq
>> + * for a device.
>> + *
>> + * Returns matching *opp and refreshes *freq accordingly, else returns
>> + * ERR_PTR in case of error and should be handled using IS_ERR.
>> + *
>> + * Locking: RCU reader.
>> + */
>> +struct opp *opp_find_freq_ceil(struct device *dev, unsigned long *freq)
>> +{
>> +     struct device_opp *dev_opp;
>> +     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
>> +
>> +     if (!dev || !freq) {
>> +             pr_err("%s: invalid param dev=%p freq=%p\n", __func__,
>> +                             dev, freq);
>> +             return ERR_PTR(-EINVAL);
>> +     }
>> +
>> +     dev_opp = find_device_opp(dev);
>> +     if (IS_ERR(dev_opp))
>> +             return opp;
>> +
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
>> +             if (temp_opp->available && temp_opp->rate >= *freq) {
>> +                     opp = temp_opp;
>> +                     *freq = opp->rate;
>> +                     break;
>> +             }
>> +     }
>> +     rcu_read_unlock();
> 
> And this one also has the same problem that find_device_opp() does.
guessing opp ptr here.. am I right? if it is about device_opp, it is not 
going to be freed as I mentioned above - at least we dont give an 
function to update(hence free) it.
> 
>> +     return opp;
>> +}
>> +
>> +/**
>> + * opp_find_freq_floor() - Search for a rounded floor freq
>> + * @dev:     device for which we do this operation
>> + * @freq:    Start frequency
>> + *
>> + * Search for the matching floor *available* OPP from a starting freq
>> + * for a device.
>> + *
>> + * Returns matching *opp and refreshes *freq accordingly, else returns
>> + * ERR_PTR in case of error and should be handled using IS_ERR.
>> + *
>> + * Locking: RCU reader.
>> + */
>> +struct opp *opp_find_freq_floor(struct device *dev, unsigned long *freq)
>> +{
>> +     struct device_opp *dev_opp;
>> +     struct opp *temp_opp, *opp = ERR_PTR(-ENODEV);
>> +
>> +     if (!dev || !freq) {
>> +             pr_err("%s: invalid param dev=%p freq=%p\n", __func__,
>> +                             dev, freq);
>> +             return ERR_PTR(-EINVAL);
>> +     }
>> +
>> +     dev_opp = find_device_opp(dev);
>> +     if (IS_ERR(dev_opp))
>> +             return opp;
>> +
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(temp_opp, &dev_opp->opp_list, node) {
>> +             if (temp_opp->available) {
>> +                     /* go to the next node, before choosing prev */
>> +                     if (temp_opp->rate > *freq)
>> +                             break;
>> +                     else
>> +                             opp = temp_opp;
>> +             }
>> +     }
>> +     if (!IS_ERR(opp))
>> +             *freq = opp->rate;
>> +     rcu_read_unlock();
> 
> As does this one.
guessing opp ptr here.. am I right?
> 
>> +     return opp;
>> +}
>> +
>> +/**
>> + * opp_add()  - Add an OPP table from a table definitions
>> + * @dev:     device for which we do this operation
>> + * @freq:    Frequency in Hz for this OPP
>> + * @u_volt:  Voltage in uVolts for this OPP
>> + *
>> + * This function adds an opp definition to the opp list and returns status.
>> + * The opp is made available by default and it can be controlled using
>> + * opp_enable/disable functions.
>> + *
>> + * Locking: RCU, mutex
>> + */
>> +int opp_add(struct device *dev, unsigned long freq, unsigned long u_volt)
>> +{
>> +     struct device_opp *tmp_dev_opp, *dev_opp = NULL;
>> +     struct opp *opp, *new_opp;
>> +     struct list_head *head;
>> +
>> +     /* Check for existing list for 'dev' */
>> +     rcu_read_lock();
>> +     list_for_each_entry_rcu(tmp_dev_opp, &dev_opp_list, node) {
>> +             if (dev == tmp_dev_opp->dev) {
>> +                     dev_opp = tmp_dev_opp;
>> +                     break;
>> +             }
>> +     }
>> +     rcu_read_unlock();
>> +
>> +     /* allocate new OPP node */
>> +     new_opp = kzalloc(sizeof(struct opp), GFP_KERNEL);
>> +     if (!new_opp) {
>> +             pr_warning("%s: unable to allocate new opp node\n",
>> +                     __func__);
>> +             return -ENOMEM;
>> +     }
>> +
>> +     if (!dev_opp) {
>> +             /* Allocate a new device OPP table */
>> +             dev_opp = kzalloc(sizeof(struct device_opp), GFP_KERNEL);
>> +             if (!dev_opp) {
>> +                     kfree(new_opp);
>> +                     pr_warning("%s: unable to allocate device structure\n",
>> +                             __func__);
>> +                     return -ENOMEM;
>> +             }
>> +
>> +             dev_opp->dev = dev;
>> +             INIT_LIST_HEAD(&dev_opp->opp_list);
>> +             mutex_init(&dev_opp->lock);
>> +
>> +             /* Secure the device list modification */
>> +             mutex_lock(&dev_opp_list_lock);
>> +             list_add_rcu(&dev_opp->node, &dev_opp_list);
>> +             mutex_unlock(&dev_opp_list_lock);
>> +             synchronize_rcu();
> 
> You do not need to wait for an RCU grace period when adding objects, only
> between removing them and freeing them.
ouch.. my bad.. thx.. will fix
> 
>> +     }
>> +
>> +     /* populate the opp table */
>> +     new_opp->dev_opp = dev_opp;
>> +     new_opp->rate = freq;
>> +     new_opp->u_volt = u_volt;
>> +     new_opp->available = true;
>> +
>> +     /* make the dev_opp modification safe */
>> +     mutex_lock(&dev_opp->lock);
>> +
>> +     rcu_read_lock();
>> +     /* Insert new OPP in order of increasing frequency */
>> +     head = &dev_opp->opp_list;
>> +     list_for_each_entry_rcu(opp, &dev_opp->opp_list, node) {
>> +             if (new_opp->rate < opp->rate)
>> +                     break;
>> +             else
>> +                     head = &opp->node;
>> +     }
>> +     rcu_read_unlock();
>> +
>> +     list_add_rcu(&new_opp->node, head);
>> +     mutex_unlock(&dev_opp->lock);
>> +     synchronize_rcu();
> 
> Ditto.
thx.. will fix.
> 
>> +     return 0;
>> +}
>> +
>> +/**
>> + * opp_set_availability() - helper to set the availability of an opp
>> + * @opp:             Pointer to opp
>> + * @availability_req:        availability status requested for this opp
>> + *
>> + * Set the availability of an OPP with an RCU operation, opp_{enable,disable}
>> + * share a common logic which is isolated here.
>> + *
>> + * Returns -EINVAL for bad pointers, -ENOMEM if no memory available for the
>> + * copy operation, returns 0 if no modifcation was done OR modification was
>> + * successful.
>> + */
>> +static int opp_set_availability(struct opp *opp, bool availability_req)
>> +{
>> +     struct opp *new_opp, *tmp_opp;
>> +     bool is_available;
>> +
>> +     if (unlikely(!opp || IS_ERR(opp))) {
>> +             pr_err("%s: Invalid parameters being passed\n", __func__);
>> +             return -EINVAL;
>> +     }
>> +
>> +     new_opp = kmalloc(sizeof(struct opp), GFP_KERNEL);
>> +     if (!new_opp) {
>> +             pr_warning("%s: unable to allocate opp\n", __func__);
>> +             return -ENOMEM;
>> +     }
>> +
>> +     mutex_lock(&opp->dev_opp->lock);
>> +
>> +     rcu_read_lock();
>> +     tmp_opp = rcu_dereference(opp);
>> +     is_available = tmp_opp->available;
>> +     rcu_read_unlock();
>> +
>> +     /* Is update really needed? */
>> +     if (is_available == availability_req) {
>> +             mutex_unlock(&opp->dev_opp->lock);
>> +             kfree(tmp_opp);
>> +             return 0;
>> +     }
>> +
>> +     *new_opp = *opp;
>> +     new_opp->available = availability_req;
>> +     list_replace_rcu(&opp->node, &new_opp->node);
>> +
>> +     mutex_unlock(&opp->dev_opp->lock);
>> +     synchronize_rcu();
> 
> If you decide to rely on reference counts to fix the problem in
> find_device_opp(), you will need to check the reference counts here.
> Again, please see Documentation/RCU/rcuref.txt.
Does the original point about not needing to free dev_opp resolve this?
> 
>> +     kfree(opp);
>> +
>> +     return 0;
>> +}
>> +
>> +/**
>> + * opp_enable() - Enable a specific OPP
>> + * @opp:     Pointer to opp
>> + *
>> + * Enables a provided opp. If the operation is valid, this returns 0, else the
>> + * corresponding error value. It is meant to be used for users an OPP available
>> + * after being temporarily made unavailable with opp_disable.
>> + *
>> + * Locking: RCU, mutex
> 
> By "Locking: RCU", you presumably don't mean that the caller must do
> an rcu_read_lock() -- this would result in a synchronize_rcu() being
> invoked in an RCU read-side critical section, which is illegal.
aye..thx. I will make it more verbose. Does the following sound right?
Locking used internally: RCU copy-update and read_lock used, mutex
and for the readers:
Locking used internally: RCU read_lock used
or do we need to go all verbatim about the implementation here?
I intended the user to know the context in which they can call it, for 
example, since mutex is used, dont think of using this in interrupt 
context. since read_locks are already used, dont need to double lock 
it.. opp library takes care of it's own exclusivity.
> 
>> + */
>> +int opp_enable(struct opp *opp)
>> +{
>> +     return opp_set_availability(opp, true);
>> +}
>> +
>> +/**
>> + * opp_disable() - Disable a specific OPP
>> + * @opp:     Pointer to opp
>> + *
>> + * Disables a provided opp. If the operation is valid, this returns
>> + * 0, else the corresponding error value. It is meant to be a temporary
>> + * control by users to make this OPP not available until the circumstances are
>> + * right to make it available again (with a call to opp_enable).
>> + *
>> + * Locking: RCU, mutex
> 
> Ditto.  (And similar feedback applies elsewhere.)
> 
>> + */
>> +int opp_disable(struct opp *opp)
>> +{
>> +     return opp_set_availability(opp, false);
>> +}
>> +
>> +#ifdef CONFIG_CPU_FREQ
>> +/**
>> + * opp_init_cpufreq_table() - create a cpufreq table for a device
>> + * @dev:     device for which we do this operation
>> + * @table:   Cpufreq table returned back to caller
>> + *
>> + * Generate a cpufreq table for a provided device- this assumes that the
>> + * opp list is already initialized and ready for usage.
>> + *
>> + * This function allocates required memory for the cpufreq table. It is
>> + * expected that the caller does the required maintenance such as freeing
>> + * the table as required.
>> + *
>> + * WARNING: It is  important for the callers to ensure refreshing their copy of
>> + * the table if any of the mentioned functions have been invoked in the interim.
>> + *
>> + * Locking: RCU reader
>> + */
>> +void opp_init_cpufreq_table(struct device *dev,
>> +                         struct cpufreq_frequency_table **table)
>> +{
>> +     struct device_opp *dev_opp;
>> +     struct opp *opp;
>> +     struct cpufreq_frequency_table *freq_table;
>> +     int i = 0;
>> +
>> +     dev_opp = find_device_opp(dev);
>> +     if (IS_ERR(dev_opp)) {
>> +             pr_warning("%s: unable to find device\n", __func__);
>> +             return;
>> +     }
>> +
>> +     freq_table = kzalloc(sizeof(struct cpufreq_frequency_table) *
>> +                          (opp_get_opp_count(dev) + 1), GFP_ATOMIC);
>> +     if (!freq_table) {
>> +             pr_warning("%s: failed to allocate frequency table\n",
>> +                        __func__);
>> +             return;
> 
> How does the caller tell that the allocation failed?  Should the caller
> set the pointer passed in through the "table" argument to NULL before
> calling this function?  Or should this function set *table to NULL
> before returning in this case?
Good catch. Thanks. I would rather change the return to int and pass 
proper errors to caller so that they can handle it appropriately.
[...]
-- 
Regards,
Nishanth Menon
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
