lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Jan 2021 08:04:03 +0100
From:   Marek Szyprowski <m.szyprowski@...sung.com>
To:     Saravana Kannan <saravanak@...gle.com>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Android Kernel Team <kernel-team@...roid.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Jisheng Zhang <Jisheng.Zhang@...aptics.com>,
        Kevin Hilman <khilman@...libre.com>,
        John Stultz <john.stultz@...aro.org>,
        Nicolas Saenz Julienne <nsaenzjulienne@...e.de>,
        Marc Zyngier <maz@...nel.org>,
        Linux Samsung SOC <linux-samsung-soc@...r.kernel.org>,
        Krzysztof Kozlowski <krzk@...nel.org>,
        Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>
Subject: Re: [PATCH v1 5/5] driver core: Set fw_devlink=on by default

Hi Saravana,

On 12.01.2021 21:51, Saravana Kannan wrote:
> On Mon, Jan 11, 2021 at 11:11 PM Marek Szyprowski
> <m.szyprowski@...sung.com> wrote:
>> On 11.01.2021 22:47, Saravana Kannan wrote:
>>> On Mon, Jan 11, 2021 at 6:18 AM Marek Szyprowski
>>> <m.szyprowski@...sung.com> wrote:
>>>> On 11.01.2021 12:12, Marek Szyprowski wrote:
>>>>> On 18.12.2020 04:17, Saravana Kannan wrote:
>>>>>> Cyclic dependencies in some firmware was one of the last remaining
>>>>>> reasons fw_devlink=on couldn't be set by default. Now that cyclic
>>>>>> dependencies don't block probing, set fw_devlink=on by default.
>>>>>>
>>>>>> Setting fw_devlink=on by default brings a bunch of benefits (currently,
>>>>>> only for systems with device tree firmware):
>>>>>> * Significantly cuts down deferred probes.
>>>>>> * Device probe is effectively attempted in graph order.
>>>>>> * Makes it much easier to load drivers as modules without having to
>>>>>>      worry about functional dependencies between modules (depmod is still
>>>>>>      needed for symbol dependencies).
>>>>>>
>>>>>> If this patch prevents some devices from probing, it's very likely due
>>>>>> to the system having one or more device drivers that "probe"/set up a
>>>>>> device (DT node with compatible property) without creating a struct
>>>>>> device for it.  If we hit such cases, the device drivers need to be
>>>>>> fixed so that they populate struct devices and probe them like normal
>>>>>> device drivers so that the driver core is aware of the devices and their
>>>>>> status. See [1] for an example of such a case.
>>>>>>
>>>>>> [1] -
>>>>>> https://lore.kernel.org/lkml/CAGETcx9PiX==mLxB9PO8Myyk6u2vhPVwTMsA5NkD-ywH5xhusw@mail.gmail.com/
>>>>>> Signed-off-by: Saravana Kannan <saravanak@...gle.com>
>>>>> This patch landed recently in linux next-20210111 as commit
>>>>> e590474768f1 ("driver core: Set fw_devlink=on by default"). Sadly it
>>>>> breaks Exynos IOMMU operation, what causes lots of devices being
>>>>> deferred and not probed at all. I've briefly checked and noticed that
>>>>> exynos_sysmmu_probe() is never called after this patch. This is really
>>>>> strange for me, as the SYSMMU controllers on Exynos platform are
>>>>> regular platform devices registered by the OF code. The driver code is
>>>>> here: drivers/iommu/exynos-iommu.c, example dts:
>>>>> arch/arm/boot/dts/exynos3250.dtsi (compatible = "samsung,exynos-sysmmu").
>>>> Okay, I found the source of this problem. It is caused by Exynos power
>>>> domain driver, which is not platform driver yet. I will post a patch,
>>>> which converts it to the platform driver.
>>> Thanks Marek! Hopefully the debug logs I added were sufficient to
>>> figure out the reason.
>> Frankly, it took me a while to figure out that device core waits for the
>> power domain devices. Maybe it would be possible to add some more debug
>> messages or hints? Like the reason of the deferred probe in
>> /sys/kernel/debug/devices_deferred ?
> There's already a /sys/devices/.../<device>/waiting_for_supplier file
> that tells you if the device is waiting for a supplier device to be
> added. That file goes away once the device probes. If the file has 1,
> then it's waiting for the supplier device to be added (like your
> case). If it's 0, then the device is just waiting on one of the
> existing suppliers to probe. You can find the existing suppliers
> through /sys/devices/.../<device>/supplier:*/supplier. Also, flip
> these dev_dbg() to dev_info() if you need more details about deferred
> probing.

Frankly speaking I doubt that anyone will find those. Even experienced 
developer might need some time to figure it out.

I expect that such information will be at least in the mentioned 
/sys/kernel/debug/devices_deferred file. We already have infrastructure 
for putting the deferred probe reason there, see dev_err_probe() 
function. Even such a simple change makes the debugging this issue much 
easier:

diff --git a/drivers/base/core.c b/drivers/base/core.c
index cd8e518fadd6..ceb5aed5a84c 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -937,12 +937,13 @@ int device_links_check_suppliers(struct device *dev)
         mutex_lock(&fwnode_link_lock);
         if (dev->fwnode && !list_empty(&dev->fwnode->suppliers) &&
             !fw_devlink_is_permissive()) {
-               dev_dbg(dev, "probe deferral - wait for supplier %pfwP\n",
+               ret = dev_err_probe(dev, -EPROBE_DEFER,
+                       "probe deferral - wait for supplier %pfwP\n",
list_first_entry(&dev->fwnode->suppliers,
                         struct fwnode_link,
                         c_hook)->supplier);
                 mutex_unlock(&fwnode_link_lock);
-               return -EPROBE_DEFER;
+               return ret;
         }
         mutex_unlock(&fwnode_link_lock);

@@ -955,9 +956,9 @@ int device_links_check_suppliers(struct device *dev)
                 if (link->status != DL_STATE_AVAILABLE &&
                     !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
                         device_links_missing_supplier(dev);
-                       dev_dbg(dev, "probe deferral - supplier %s not 
ready\n",
+                       ret = dev_err_probe(dev, -EPROBE_DEFER,
+                               "probe deferral - supplier %s not ready\n",
                                 dev_name(link->supplier));
-                       ret = -EPROBE_DEFER;
                         break;
                 }
                 WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);


After such change:

# cat /sys/kernet/debug/devices_deferred
sound
13620000.sysmmu platform: probe deferral - supplier 
10023c40.power-domain not ready
13630000.sysmmu platform: probe deferral - supplier 
10023c40.power-domain not ready
12e20000.sysmmu platform: probe deferral - supplier 
10023c20.power-domain not ready
11a20000.sysmmu platform: probe deferral - supplier 
10023c00.power-domain not ready
11a30000.sysmmu platform: probe deferral - supplier 
10023c00.power-domain not ready
11a40000.sysmmu platform: probe deferral - supplier 
10023c00.power-domain not ready
11a50000.sysmmu platform: probe deferral - supplier 
10023c00.power-domain not ready
11a60000.sysmmu platform: probe deferral - supplier 
10023c00.power-domain not ready
11e20000.sysmmu platform: probe deferral - supplier 
10023c80.power-domain not ready
12d00000.hdmi   platform: probe deferral - supplier 
10023c20.power-domain not ready
10048000.clock-controller       platform: probe deferral - supplier 
10023ca0.power-domain not ready
12260000.sysmmu platform: probe deferral - supplier 
10048000.clock-controller not ready
12270000.sysmmu platform: probe deferral - supplier 
10048000.clock-controller not ready
122a0000.sysmmu platform: probe deferral - supplier 
10048000.clock-controller not ready
122b0000.sysmmu platform: probe deferral - supplier 
10048000.clock-controller not ready
123b0000.sysmmu platform: probe deferral - supplier 
10048000.clock-controller not ready
123c0000.sysmmu platform: probe deferral - supplier 
10048000.clock-controller not ready
12c10000.mixer  platform: probe deferral - supplier 
10023c20.power-domain not ready
13000000.gpu    platform: probe deferral - supplier 
10023c60.power-domain not ready

Probably the message can be adjusted a bit, this would significantly 
help me finding that is the source of the problem.

Best regards

-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Powered by blists - more mailing lists