lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 8 Aug 2022 03:26:10 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Michael Walle <michael@...le.cc>
Cc:     daniel.lezcano@...exp.org, abailon@...libre.com,
        anarsoul@...il.com, baolin.wang7@...il.com,
        bjorn.andersson@...aro.org, broonie@...nel.org,
        damien.lemoal@...nsource.wdc.com, daniel.lezcano@...aro.org,
        digetx@...il.com, f.fainelli@...il.com, glaroque@...libre.com,
        hayashi.kunihiko@...ionext.com, heiko@...ech.de, j-keerthy@...com,
        jonathanh@...dia.com, khilman@...libre.com,
        linux-kernel@...r.kernel.org, linux-pm@...r.kernel.org,
        lukasz.luba@....com, matthias.bgg@...il.com,
        mcoquelin.stm32@...il.com, mhiramat@...nel.org,
        miquel.raynal@...tlin.com, niklas.soderlund@...natech.se,
        rafael@...nel.org, rui.zhang@...el.com, shawnguo@...nel.org,
        talel@...zon.com, thierry.reding@...il.com, tiny.windzz@...il.com
Subject: Re: [PATCH v5 00/33] New thermal OF code

On Mon, Aug 08, 2022 at 11:42:16AM +0200, Michael Walle wrote:
> Hi,
> 
> > The following changes are depending on:
> > 
> >  - 20220722200007.1839356-1-daniel.lezcano@...exp.org
> > 
> > which are present in the thermal/linux-next branch:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/log/?h=thermal/linux-next
> > 
> > The series introduces a new thermal OF code. The patch description gives
> > a detailed explanation of the changes. Basically we write new OF parsing
> > functions, we migrate all the users of the old thermal OF API to the new
> > one and then we finish by removing the old OF code.
> > 
> > That is the second step to rework the thermal OF code. More patches will
> > come after that to remove the duplication of the trip definitions in the
> > different drivers which will result in more code duplication removed and
> > consolidation of the core thermal framework.
> > 
> > Thanks for those who tested the series on their platform and
> > investigated the regression with the disabled by default thermal zones.
> 
> I haven't looked closely yet, but this series is breaking two of my
> boards.
> 
> There seems to be one mistake within the new thermal code:
> 
> [    2.030452] thermal_sys: Failed to find 'trips' node
> [    2.033664] usb 1-1: new high-speed USB device number 2 using xhci-hcd
> [    2.035434] thermal_sys: Failed to find trip points for tmu id=2
> [    2.048010] qoriq_thermal 1f80000.tmu: Failed to register sensors
> [    2.054128] qoriq_thermal: probe of 1f80000.tmu failed with error -22
> [    2.060607] devm_thermal_of_zone_release:707 res=ffff002002377180
> [    2.067044] Unable to handle kernel paging request at virtual address 01adadadadadad88
> [    2.075003] Mem abort info:
> [    2.077805]   ESR = 0x0000000096000004
> [    2.081562]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    2.086893]   SET = 0, FnV = 0
> [    2.089955]   EA = 0, S1PTW = 0
> [    2.093100]   FSC = 0x04: level 0 translation fault
> [    2.097993] Data abort info:
> [    2.100876]   ISV = 0, ISS = 0x00000004
> [    2.104724]   CM = 0, WnR = 0
> [    2.107698] [01adadadadadad88] address between user and kernel address ranges
> [    2.114863] Internal error: Oops: 96000004 [#1] SMP
> [    2.119754] Modules linked in:
> [    2.122815] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.19.0-next-20220808-00078-ga957a15f74fc-dirty #1694
> [    2.132504] Hardware name: Kontron KBox A-230-LS (DT)
> [    2.137568] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    2.144554] pc : kfree+0x5c/0x3c0
> [    2.147885] lr : thermal_of_zone_unregister+0x34/0x54
> [    2.152954] sp : ffff80000a22bab0
> [    2.156274] x29: ffff80000a22bab0 x28: 0000000000000000 x27: ffff800009960464
> [    2.163438] x26: ffff800009a16960 x25: 0000000000000006 x24: ffff800009f09a40
> [    2.170601] x23: ffff800009ab9008 x22: ffff800008d0d684 x21: 01adadadadadad80
> [    2.177763] x20: 6b6b6b6b6b6b6b6b x19: ffff002002335000 x18: 00000000fffffffb
> [    2.184925] x17: ffff800008d0d67c x16: ffff800008d072b4 x15: ffff800008d0c6c4
> [    2.192087] x14: ffff800008d0c34c x13: ffff8000088d5034 x12: ffff8000088d46d4
> [    2.199248] x11: ffff8000088d4624 x10: 0000000000000000 x9 : ffff800008d0d684
> [    2.206410] x8 : ffff002000b1a158 x7 : bbbbbbbbbbbbbbbb x6 : ffff80000a0f53b8
> [    2.213572] x5 : ffff80000a22b940 x4 : 0000000000000000 x3 : 0000000000000000
> [    2.220733] x2 : fffffc0000000000 x1 : ffff002000838040 x0 : 01adb1adadadad80
> [    2.227895] Call trace:
> [    2.230342]  kfree+0x5c/0x3c0
> [    2.233318]  thermal_of_zone_unregister+0x34/0x54
> [    2.238036]  devm_thermal_of_zone_release+0x44/0x54
> [    2.242931]  release_nodes+0x64/0xd0
> [    2.246516]  devres_release_all+0xbc/0x350
> [    2.250623]  device_unbind_cleanup+0x20/0x70
> [    2.254905]  really_probe+0x1a0/0x2e4
> [    2.258577]  __driver_probe_device+0x80/0xec
> [    2.262859]  driver_probe_device+0x44/0x130
> [    2.267055]  __driver_attach+0x104/0x1b4
> [    2.270989]  bus_for_each_dev+0x7c/0xe0
> [    2.274834]  driver_attach+0x30/0x40
> [    2.278418]  bus_add_driver+0x160/0x210
> [    2.281900] hub 1-1:1.0: USB hub found
> [    2.282264]  driver_register+0x84/0x140
> [    2.286109] hub 1-1:1.0: 7 ports detected
> [    2.289859]  __platform_driver_register+0x34/0x40
> [    2.289867]  qoriq_tmu_init+0x28/0x34
> [    2.302258]  do_one_initcall+0x50/0x250
> [    2.306104]  kernel_init_freeable+0x278/0x31c
> [    2.310474]  kernel_init+0x30/0x140
> [    2.313972]  ret_from_fork+0x10/0x20
> [    2.317559] Code: b25657e2 d34cfc00 d37ae400 8b020015 (f94006a1) 
> [    2.323672] ---[ end trace 0000000000000000 ]---
> [    2.328317] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    2.335999] SMP: stopping secondary CPUs
> [    2.339932] Kernel Offset: disabled
> [    2.343425] CPU features: 0x2000,0800f021,00001086
> [    2.348229] Memory Limit: none
> [    2.351289] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
> 
> This was seen a sl28 board
> (arch/arm64/boot/dts/freescale/fsl-ls1028a-kontron-kbox-a-230-ls.dts).
> The same board in the KernelCI also have some more information:
> https://lavalab.kontron.com/scheduler/job/151900#L1162
> 
> But I guess even if that is fixed, the driver will not probe due to the
> missing trip points? Are they now mandatory? Does it mean we'd need to
> update our device trees? But that will then mean older devices trees
> don't work anymore.

It would also mean that all hwmon drivers registering a thermal zone sensor
would fail to register unless such a thermal zone actually exists. This
would make the whole concept of having the hwmon core register thermal
zone sensors impossible. I have no idea how this is expected to work now,
but there is an apparent flaw in the logic. That means I withdraw my
Acked-by: for the hwmon patches in this series until it is guaranteed
that hwmon registration does not fail as above if there is no thermal
zone associated with a sensor.

> 
> On my second board
> (arch/arm/boot/dts/lan966x-kontron-kswitch-d10-mmt-6g-2gs.dts). I get the
> following error:
> 
> [    6.292819] thermal_sys: Unable to find thermal zones description
> [    6.298872] thermal_sys: Failed to find thermal zone for hwmon id=0
> [    6.305375] lan966x-hwmon e2010180.hwmon: error -EINVAL: failed to register hwmon device
> [    6.313508] lan966x-hwmon: probe of e2010180.hwmon failed with error -22
> 
> Again, is there seems to be something missing in the device tree. For this
> board a device tree change should be easily doable, as it is still in
> development.
> 

That would work for this board, but not for all other boards where a sensor
tries to register with the thermal subsystem but there is no thermal zone
defined for it.

Guenter

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ