lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 17 Nov 2022 17:53:11 +0100
From:   Maxime Ripard <maxime@...no.tech>
To:     Daniel Vetter <daniel.vetter@...el.com>,
        David Airlie <airlied@...ux.ie>,
        Maarten Lankhorst <maarten.lankhorst@...ux.intel.com>,
        Thomas Zimmermann <tzimmermann@...e.de>,
        Maxime Ripard <maxime@...no.tech>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: DRM-managed resources / devm_drm_dev_alloc leaking resources

Hi,

After trying to get more kunit tests for KMS, I found out that the
recent kunit helpers we merged to create a DRM device [1] are broken and
won't free their device-managed and DRM-managed resources.

With some help from Thomas, we've dug into this and it turns out that if
we allocate a device with root_device_register, initialise our drm
device with devm_drm_dev_alloc(), register it using drm_dev_register(),
unregister it using drm_dev_unregister/drm_dev_unplug and then remove
the parent device, neither the device managed nor the DRM managed
actions are run.

root_device_register initializes the device by eventually calling
device_initialize() which sets the initial reference count of the root
device to 1 [2]. devm_drm_dev_alloc() then comes in, drm_dev_init() will
increase the root device refcount [3] and initialize our DRM device to 1
[4]. drm_dev_register(), through drm_minor_register() and device_add(),
will increase the root device refcount [5].

When unrolling things, drm_dev_unregister(), through
drm_minor_unregister() and device_del(), will give up its reference [6].
root_device_unregister(), through device_unregister(), will also give up
its own [7].

So we end up with this for the reference counts:

+------------------------+-------------+------------+
|         funcs          | root device | DRM device |
+------------------------+-------------+------------+
| root_device_register   |           1 | N/A        |
| devm_drm_dev_alloc     |           2 | 1          |
| drm_dev_register       |           3 | 1          |
| drm_dev_unregister     |           2 | 1          |
| root_device_unregister |           1 | 1          |
+------------------------+-------------+------------+

If we go back to the list of reference taken, the root device reference
and the initial drm_device reference, both taken by devm_drm_dev_alloc()
through drm_dev_init(), haven't been put back.

If we look at the drm_dev_init code(), we can see that it sets up a
DRM-managed action [8] that will put back the device reference [9]. The
DRM-managed code is executed by the drm_managed_cleanup() function, that
is executed as part of a release hook [10] executed once we give up the
final reference to the DRM device [11].

If we go back a little, the final reference to the DRM device is
actually the initial one setup by devm_drm_dev_alloc(). This function
has superseded drm_dev_alloc(), with the documentation that we do need a
final drm_dev_put() to put back our final reference [12].

devm_drm_dev_alloc() is a more convenient variant that has been
introduced explicitly to not require that drm_dev_put(), and states it
as such in the documentation [13]. It does so by adding a device-managed
action that will call drm_dev_put() [14].

Device-managed actions are ran as part devres_release_all() that is
called by device_release() [15], itself being run when the last
reference on the device is put back [16][17][18].

So if we sum things up, the DRM device will only give its last root
device reference when the last DRM device reference will be put back,
and the last DRM device reference will be put back when the last device
reference will be put back, which sounds very circular to me, with both
ending up in a deadlock scenario.

I've added two kunit tests that demonstrate the issue: we register a
device, allocate and register a DRM device, register a DRM-managed
action, remove the DRM device and the parent device, and wait for the
action to execute. drm_register_unregister_with_devm_test() uses the
broken(?) devm_drm_dev_alloc and is failing.
drm_register_unregister_test uses the deprecated drm_dev_alloc() that
requires an explicit call to drm_dev_put() which works fine.

It's also worth noting that Thomas tested with simpledrm and it seems to
work fine. Using a platform_device instead of the root_device doesn't
change anything to the outcome in my tests, so there might be a more
subtle behaviour involved.

Thanks,
Maxime

--------- 8< -----------
diff --git a/drivers/gpu/drm/tests/drm_register_test.c b/drivers/gpu/drm/tests/drm_register_test.c
new file mode 100644
index 000000000000..3256b53d08f2
--- /dev/null
+++ b/drivers/gpu/drm/tests/drm_register_test.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <drm/drm_drv.h>
+#include <drm/drm_managed.h>
+
+#include <kunit/resource.h>
+
+#include <linux/device.h>
+#include <linux/platform_device.h>
+
+#include "drm_kunit_helpers.h"
+
+struct test_priv {
+	bool release_done;
+	wait_queue_head_t release_wq;
+};
+
+static const struct drm_mode_config_funcs drm_mode_config_funcs = {
+};
+
+static const struct drm_driver drm_driver = {
+	.driver_features = DRIVER_MODESET,
+};
+
+static void drm_release(struct drm_device *drm, void *ptr)
+{
+	struct test_priv *priv = ptr;
+
+	priv->release_done = true;
+	wake_up_interruptible(&priv->release_wq);
+}
+
+#define RELEASE_TIMEOUT_MS	500
+
+static void drm_register_unregister_test(struct kunit *test)
+{
+	struct test_priv *priv;
+	struct drm_device *drm;
+	struct device *dev;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	dev = root_device_register("test");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, dev);
+
+	drm = drm_dev_alloc(&drm_driver, dev);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, drm);
+
+	drm->mode_config.funcs = &drm_mode_config_funcs;
+	ret = drmm_mode_config_init(drm);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = drmm_add_action_or_reset(drm, drm_release, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = drm_dev_register(drm, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	drm_dev_unregister(drm);
+	drm_dev_put(drm);
+	root_device_unregister(dev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static void drm_register_unregister_with_devm_test(struct kunit *test)
+{
+	struct test_priv *priv;
+	struct drm_device *drm;
+	struct device *dev;
+	int ret;
+
+	priv = kunit_kzalloc(test, sizeof(*priv), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, priv);
+	init_waitqueue_head(&priv->release_wq);
+
+	dev = root_device_register("test");
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, dev);
+
+	drm = __devm_drm_dev_alloc(dev, &drm_driver, sizeof(*drm), 0);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, drm);
+
+	drm->mode_config.funcs = &drm_mode_config_funcs;
+	ret = drmm_mode_config_init(drm);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = drmm_add_action_or_reset(drm, drm_release, priv);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ret = drm_dev_register(drm, 0);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	drm_dev_unregister(drm);
+	root_device_unregister(dev);
+
+	ret = wait_event_interruptible_timeout(priv->release_wq, priv->release_done,
+					       msecs_to_jiffies(RELEASE_TIMEOUT_MS));
+	KUNIT_EXPECT_GT(test, ret, 0);
+}
+
+static struct kunit_case drm_register_tests[] = {
+	KUNIT_CASE(drm_register_unregister_test),
+	KUNIT_CASE(drm_register_unregister_with_devm_test),
+	{}
+};
+
+static struct kunit_suite drm_register_test_suite = {
+	.name = "drm-test-register",
+	.test_cases = drm_register_tests
+};
+
+kunit_test_suite(drm_register_test_suite);
--------- 8< -----------

1: https://cgit.freedesktop.org/drm-misc/tree/drivers/gpu/drm/tests/drm_kunit_helpers.c
2: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L2979
3: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L597
4: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L596
5: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L3437
6: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L201
7: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L3737
8: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L618
9: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L570
10: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L751
11: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L785
12: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L259
13: https://elixir.bootlin.com/linux/latest/source/include/drm/drm_drv.h#L505
14: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_drv.c#L682
15: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L2321
16: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L2357
17: https://elixir.bootlin.com/linux/latest/source/lib/kobject.c#L721
18: https://elixir.bootlin.com/linux/latest/source/lib/kobject.c#L647

Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)

Powered by blists - more mailing lists