lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9c465967-3bbf-595b-b61c-c44fa95e41d5@suse.com>
Date:   Tue, 17 Dec 2019 14:51:47 +0100
From:   Jürgen Groß <jgross@...e.com>
To:     SeongJae Park <sjpark@...zon.com>,
        Roger Pau Monné <roger.pau@...rix.com>
Cc:     SeongJae Park <sj38.park@...il.com>, sjpark@...zon.de,
        axboe@...nel.dk, konrad.wilk@...cle.com, pdurrant@...zon.com,
        linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
        xen-devel@...ts.xenproject.org
Subject: Re: [Xen-devel] [PATCH v10 2/4] xen/blkback: Squeeze page pools if a
 memory pressure is detected

On 17.12.19 14:15, SeongJae Park wrote:
> On Tue, 17 Dec 2019 12:39:15 +0100 "Roger Pau Monné" <roger.pau@...rix.com> wrote:
> 
>> On Mon, Dec 16, 2019 at 08:48:03PM +0100, SeongJae Park wrote:
>>> On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote:
>>>
>>>> On 16.12.19 17:15, SeongJae Park wrote:
>>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park <sjpark@...zon.com> wrote:
>>>>>
>>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park <sjpark@...zon.com> wrote:
>>>>>>
>>>>>>> From: SeongJae Park <sjpark@...zon.de>
>>>>>>>
>>>>> [...]
>>>>>>> --- a/drivers/block/xen-blkback/xenbus.c
>>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c
>>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev,
>>>>>>>    }
>>>>>>>    
>>>>>>>    
>>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for a while. */
>>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10;
>>>>>>> +module_param_named(buffer_squeeze_duration_ms,
>>>>>>> +		buffer_squeeze_duration_ms, int, 0644);
>>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms,
>>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is detected");
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Callback received when the memory pressure is detected.
>>>>>>> + */
>>>>>>> +static void reclaim_memory(struct xenbus_device *dev)
>>>>>>> +{
>>>>>>> +	struct backend_info *be = dev_get_drvdata(&dev->dev);
>>>>>>> +
>>>>>>> +	be->blkif->buffer_squeeze_end = jiffies +
>>>>>>> +		msecs_to_jiffies(buffer_squeeze_duration_ms);
>>>>>>
>>>>>> This callback might race with 'xen_blkbk_probe()'.  The race could result in
>>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links
>>>>>> 'be' to the 'dev'.  Please _don't merge_ this patch now!
>>>>>>
>>>>>> I will do more test and share results.  Meanwhile, if you have any opinion,
>>>>>> please let me know.
>>>
>>> I reduced system memory and attached bunch of devices in short time so that
>>> memory pressure occurs while device attachments are ongoing.  Under this
>>> circumstance, I was able to see the race.
>>>
>>>>>
>>>>> Not only '->blkif', but 'be' itself also coule be a NULL.  As similar
>>>>> concurrency issues could be in other drivers in their way, I suggest to change
>>>>> the reclaim callback ('->reclaim_memory') to be called for each driver instead
>>>>> of each device.  Then, each driver could be able to deal with its concurrency
>>>>> issues by itself.
>>>>
>>>> Hmm, I don't like that. This would need to be changed back in case we
>>>> add per-guest quota.
>>>
>>> Extending this callback in that way would be still not too hard.  We could use
>>> the argument to the callback.  I would keep the argument of the callback to
>>> 'struct device *' as is, and will add a comment saying 'NULL' value of the
>>> argument means every devices.  As an example, xenbus would pass NULL-ending
>>> array of the device pointers that need to free its resources.
>>>
>>> After seeing this race, I am now also thinking it could be better to delegate
>>> detailed control of each device to its driver, as some drivers have some
>>> complicated and unique relation with its devices.
>>>
>>>>
>>>> Wouldn't a get_device() before calling the callback and a put_device()
>>>> afterwards avoid that problem?
>>>
>>> I didn't used the reference count manipulation operations because other similar
>>> parts also didn't.  But, if there is no implicit reference count guarantee, it
>>> seems those operations are indeed necessary.
>>>
>>> That said, as get/put operations only adjust the reference count, those will
>>> not make the callback to wait until the linking of the 'backend' and 'blkif' to
>>> the device (xen_blkbk_probe()) is finished.  Thus, the race could still happen.
>>> Or, am I missing something?
>>
>> I would expect the device is not added to the list of backend devices
>> until the probe hook has finished with a non-error return code. Ie:
>> bus_for_each_dev should _not_ iterate over devices for which the probe
>> function hasn't been run to competition without errors.
>>
>> The same way I would expect the remove hook to first remove the device
>> from the list of backend devices and then run the remove hook.
>>
>> blkback uses an ad-hoc reference counting mechanism, but if the above
>> assumptions are true I think it would be enough to take an extra
>> reference in xen_blkbk_probe and drop it in xen_blkbk_remove.
> 
> Well, if the assumption is true, wouldn't the Juergen's approach solved the
> problem?  As previously said, I tried the approach but failed to solve this
> race.  The assumption is wrong or I missed something.  I think Juergen also
> think the assumption is not true as he suggested use of locking but not sure.
> Juergen, if I misunderstood, please let me know.

bus_for_each_dev() does no locking at all. All it does is
taking krefs on the iterated objects in order to avoid them
to be freed under its feet.


Juergen

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ