Message-ID: <35ebc15b-b3fa-4129-a542-fe348069df88@gmail.com>
Date: Wed, 7 Aug 2024 14:10:08 -0700
From: Florian Fainelli <f.fainelli@...il.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>, stable@...r.kernel.org,
Justin Chen <justin.chen@...adcom.com>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
Cc: patches@...ts.linux.dev, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
linux@...ck-us.net, shuah@...nel.org, patches@...nelci.org,
lkft-triage@...ts.linaro.org, pavel@...x.de, jonathanh@...dia.com,
sudipm.mukherjee@...il.com, srw@...dewatkins.net, rwarsow@....de,
conor@...nel.org, allen.lkml@...il.com, broonie@...nel.org
Subject: Re: [PATCH 6.1 00/86] 6.1.104-rc1 review
On 8/7/24 07:59, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.104 release.
> There are 86 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri, 09 Aug 2024 15:00:24 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.104-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
I am seeing unexplained oopses with 6.1.104-rc1, whereas 6.1.103 was
stable. For some reason this only happens with ARM64 kernels, not with
ARM32 kernels running on the same board.

Here are a few samples. They most often fall within the workqueue code,
but not always:
Loading modules...[ 4.538506] usb 1-1: new high-speed USB device
number 2 using xhci-hcd
[ 4.621340] Unable to handle kernel paging request at virtual address
ffffff8004ea078d
[ 4.629297] Mem abort info:
[ 4.632097] ESR = 0x0000000096000021
[ 4.635851] EC = 0x25: DABT (current EL), IL = 32 bits
[ 4.641172] SET = 0, FnV = 0
[ 4.644229] EA = 0, S1PTW = 0
[ 4.647374] FSC = 0x21: alignment fault
[ 4.651389] Data abort info:
[ 4.654274] ISV = 0, ISS = 0x00000021
[ 4.658115] CM = 0, WnR = 0
[ 4.661085] swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000004102f000
[ 4.667795] [ffffff8004ea078d] pgd=18000000bdff8003,
p4d=18000000bdff8003, pud=18000000bdff8003, pmd=18000000bdfd6003,
pte=0068000044ea0707
[ 4.680345] Internal error: Oops: 0000000096000021 [#1] SMP
[ 4.685930] Modules linked in: udc_core(+)
[ 4.690039] CPU: 0 PID: 1086 Comm: modprobe Not tainted
6.1.104-1.1pre-gfcba0aeec90f #2
[ 4.698058] Hardware name: BCM972164PCK (DT)
[ 4.702334] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 4.709308] pc : queue_work_on+0x70/0x90
[ 4.713248] lr : queue_work_on+0x28/0x90
[ 4.717178] sp : ffffffc00cd23940
[ 4.720497] x29: ffffffc00cd23940 x28: ffffff8002de6800 x27:
0000000000000000
[ 4.727648] x26: ffffffc00a7b5c68 x25: ffffffc00cd23978 x24:
0000000000000000
[ 4.734798] x23: ffffffc00a630578 x22: ffffff8002c12c00 x21:
0000000000000100
[ 4.741948] x20: 0000000000000000 x19: ffffff8004ea078d x18:
0000000000000000
[ 4.749098] x17: 0000000000000000 x16: 0000000000000000 x15:
000000000000000a
[ 4.756247] x14: 0000000000000001 x13: 6e69622f7273752f x12:
3a6e6962732f7273
[ 4.763397] x11: 752f3a6e69622f3a x10: 0000000000000073 x9 :
ffffffc00804d610
[ 4.770547] x8 : ffffff8004ea080d x7 : 0000000000000000 x6 :
0000000080200006
[ 4.777696] x5 : 00000000ffffffff x4 : 0000000000000dc0 x3 :
0000000000000080
[ 4.784846] x2 : ffffff8004ea078d x1 : ffffff8002c12c00 x0 :
0000000000000000
[ 4.791997] Call trace:
[ 4.794446] queue_work_on+0x70/0x90
[ 4.798028] call_usermodehelper_exec+0xd4/0x1cc
[ 4.802654] kobject_uevent_env+0x6a0/0x6e0
[ 4.806849] kobject_uevent+0x10/0x18
[ 4.810519] kset_register+0x50/0x60
[ 4.814102] bus_register+0xa4/0x234
[ 4.817686] usb_udc_init+0x7c/0x1000 [udc_core]
[ 4.822338] do_one_initcall+0x80/0x1b0
[ 4.826183] do_init_module+0x54/0x1d8
[ 4.829942] load_module+0x1818/0x18e4
[ 4.833699] __do_sys_finit_module+0xec/0x10c
[ 4.838064] __arm64_sys_finit_module+0x20/0x28
[ 4.842603] invoke_syscall+0x80/0x118
[ 4.846360] el0_svc_common.constprop.3+0xb8/0xe4
[ 4.851071] do_el0_svc+0x98/0xbc
[ 4.854392] el0_svc+0x14/0x3c
[ 4.857455] el0t_64_sync_handler+0x64/0x140
[ 4.861732] el0t_64_sync+0x148/0x14c
[ 4.865402] Code: a9425bf5 a8c37bfd d65f03c0 f9800271 (c85f7e60)
[ 4.871506] ---[ end trace 0000000000000000 ]---
[ 4.876130] note: modprobe[1086] exited with irqs disabled
/sbin/load_modules: line 21: 1086 Segmentation fault modprobe -q $m
done
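For reference, the ESR value in the oops can be unpacked by hand. The
sketch below assumes the ESR_ELx layout from the Arm architecture (EC in
bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts FSC in
ISS[5:0] and WnR in ISS[6]). The struct and function names are made up
for illustration and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical container for the ESR_ELx fields the oops prints. */
struct esr_fields {
	unsigned int ec;	/* exception class, ESR[31:26] */
	unsigned int il;	/* instruction length bit, ESR[25] */
	unsigned int iss;	/* instruction specific syndrome, ESR[24:0] */
	unsigned int fsc;	/* fault status code, ISS[5:0] (data aborts) */
	unsigned int wnr;	/* write-not-read, ISS[6] (data aborts) */
};

/* Split a raw ESR value into the fields above. */
static struct esr_fields esr_decode(uint64_t esr)
{
	struct esr_fields f;

	f.ec  = (unsigned int)((esr >> 26) & 0x3f);
	f.il  = (unsigned int)((esr >> 25) & 0x1);
	f.iss = (unsigned int)(esr & 0x1ffffff);
	f.fsc = f.iss & 0x3f;
	f.wnr = (f.iss >> 6) & 0x1;
	return f;
}
```

Feeding it 0x96000021 gives EC 0x25, IL set, WnR clear and FSC 0x21,
which matches the "alignment fault" lines in the dump above.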
Another one was:
[ 5.833060] Unable to handle kernel paging request at virtual address
ffffff800586ebc6
[ 5.841005] Mem abort info:
[ 5.843812] ESR = 0x0000000096000021
[ 5.847576] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5.852907] SET = 0, FnV = 0
[ 5.855974] EA = 0, S1PTW = 0
[ 5.859128] FSC = 0x21: alignment fault
[ 5.863154] Data abort info:
[ 5.866047] ISV = 0, ISS = 0x00000021
[ 5.869897] CM = 0, WnR = 0
[ 5.872878] swapper pgtable: 4k pages, 39-bit VAs, pgdp=000000000102f000
[ 5.879601] [ffffff800586ebc6] pgd=180000007dff8003,
p4d=180000007dff8003, pud=180000007dff8003, pmd=180000007dfd1003,
pte=006800000586e707
[ 5.892173] Internal error: Oops: 0000000096000021 [#1] SMP
[ 5.897764] Modules linked in:
[ 5.900832] CPU: 1 PID: 24 Comm: kworker/u4:1 Not tainted
6.1.104-1.1pre-gfcba0aeec90f #2
[ 5.909032] Hardware name: BCM972604DV2GB (DT)
[ 5.913489] Workqueue: events_unbound deferred_probe_work_func
[ 5.919349] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS
BTYPE=--)
[ 5.926330] pc : kobject_get+0x6c/0x94
[ 5.930096] lr : kobject_add_internal+0x5c/0x25c
[ 5.934730] sp : ffffffc00aa1b760
[ 5.938054] x29: ffffffc00aa1b760 x28: 0000000000000000 x27:
0000000000000000
[ 5.945213] x26: 000000000f700001 x25: ffffff8002f6ac10 x24:
ffffff8002f6ac10
[ 5.952373] x23: ffffffc008d99430 x22: ffffff800586eb8e x21:
ffffffc008d99430
[ 5.959533] x20: ffffff8004c49000 x19: ffffff800586eb8e x18:
0000000000000000
[ 5.966693] x17: 5f696368652e3030 x16: 3330306230663a6d x15:
000000000000000a
[ 5.973853] x14: 0000000000000001 x13: ffffff800589fa88 x12:
ffffffffffffffff
[ 5.981012] x11: 0000000000000020 x10: 0000000000000000 x9 :
ffffffc00858c200
[ 5.988171] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 :
ffffff800345c098
[ 5.995331] x5 : ffffffc00aa1b880 x4 : ffffff800586ebc6 x3 :
ffffff800589fa80
[ 6.002490] x2 : ffffffc00aa1b7b0 x1 : 0000000000000000 x0 :
ffffff800586ebc6
[ 6.009650] Call trace:
[ 6.012104] kobject_get+0x6c/0x94
[ 6.015518] kobject_add_internal+0x5c/0x25c
[ 6.019804] kobject_add+0xe0/0xfc
[ 6.023220] device_add+0x164/0x688
[ 6.026724] device_create_groups_vargs+0xac/0xfc
[ 6.031445] device_create+0x70/0x94
[ 6.035035] mon_bin_add+0x6c/0x80
[ 6.038449] mon_bus_init+0x74/0xa8
[ 6.041954] mon_notify+0x50/0xf8
[ 6.045282] notifier_call_chain+0x6c/0x8c
[ 6.049398] blocking_notifier_call_chain+0x48/0x70
[ 6.054294] usb_notify_add_bus+0x24/0x2c
[ 6.058319] usb_add_hcd+0x1f4/0x5fc
[ 6.061908] ehci_brcm_probe+0x164/0x1ac
[ 6.065846] platform_probe+0x6c/0xb8
[ 6.069524] really_probe+0x1b8/0x38c
[ 6.073198] __driver_probe_device+0x134/0x14c
[ 6.077656] driver_probe_device+0x40/0xf8
[ 6.081766] __device_attach_driver+0x108/0x11c
[ 6.086311] bus_for_each_drv+0xa0/0xc4
[ 6.090158] __device_attach+0xf0/0x178
[ 6.094007] device_initial_probe+0x18/0x20
[ 6.098203] bus_probe_device+0x34/0x94
[ 6.102052] deferred_probe_work_func+0xd4/0xe8
[ 6.106597] process_one_work+0x1a4/0x254
[ 6.110623] process_scheduled_works+0x44/0x48
[ 6.115083] worker_thread+0x1e8/0x264
[ 6.118846] kthread+0xbc/0xcc
[ 6.121912] ret_from_fork+0x10/0x20
[ 6.125506] Code: a8c27bfd d65f03c0 9100e264 f9800091 (885f7c81)
[ 6.131615] ---[ end trace 0000000000000000 ]---
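In both dumps the parenthesized word on the "Code:" line appears to be a
load-exclusive whose base register holds the faulting address. The sketch
below is a hand decoder for that word, assuming the A64 LDXR encodings
0x885f7c00 (Wt form) and 0xc85f7c00 (Xt form) with Rn in bits 9:5 and Rt
in bits 4:0. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of decoding an LDXR instruction word. */
struct ldxr_info {
	unsigned int rt;	/* destination register number */
	unsigned int rn;	/* base register number */
	unsigned int size;	/* access size in bytes: 4 or 8 */
};

/* Return true and fill *out if insn is LDXR Wt or LDXR Xt. */
static bool decode_ldxr(uint32_t insn, struct ldxr_info *out)
{
	uint32_t base = insn & ~0x3ffu;	/* mask off the Rn and Rt fields */

	if (base == 0x885f7c00u)
		out->size = 4;
	else if (base == 0xc85f7c00u)
		out->size = 8;
	else
		return false;

	out->rn = (insn >> 5) & 0x1f;
	out->rt = insn & 0x1f;
	return true;
}

/* Natural alignment check; size must be a power of two. */
static bool is_aligned(uint64_t addr, unsigned int size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr & (uint64_t)(size - 1)) == 0;
}
```

With the values from the two dumps, 0xc85f7e60 decodes as an 8-byte LDXR
through x19 (ffffff8004ea078d) and 0x885f7c81 as a 4-byte LDXR through x4
(ffffff800586ebc6). Neither address is naturally aligned for its access
size, which would be consistent with FSC 0x21.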
The failure appears to be somewhat probabilistic: of the dozen or so
boards in our farm, not all of them hit the panic for a given "bad"
commit in the bisection. The bisection eventually landed on:
commit 2f7f85911e7559b06c44561c1e31a69ee80a5f60
Author: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
Date:   Wed Jun 28 18:02:51 2023 +0300

    irqdomain: Use return value of strreplace()

    [ Upstream commit 67a4e1a3bf7c68ed3fbefc4213648165d912cabb ]

    Since strreplace() returns the pointer to the string itself, use it
    directly.

    Signed-off-by: Andy Shevchenko <andriy.shevchenko@...ux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
    Link: https://lore.kernel.org/r/20230628150251.17832-1-andriy.shevchenko@linux.intel.com
    Stable-dep-of: 6ce3e98184b6 ("irqdomain: Fixed unbalanced fwnode get and put")
    Signed-off-by: Sasha Levin <sashal@...nel.org>

 kernel/irq/irqdomain.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
Reverting that commit on top of 6.1.104-rc1 gives me a stable system
again. It is the *first* bad commit, but I really have no explanation
for why, because the transformation seems correct to me.
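One possibility that is not verified here: assigning the return value
directly is only equivalent to the old code if strreplace() in the tree
being built returns the start of the string. The user-space sketch below
contrasts two hypothetical variants (neither is copied from the kernel)
to show how the result would differ if the 6.1.y helper returned a
pointer to the terminating NUL instead. Which behaviour 6.1.y actually
has is an assumption to check against lib/string.c in that branch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical variant A: returns the start of the string. */
static char *strreplace_ret_start(char *s, char old, char new_c)
{
	char *p;

	for (p = s; *p; ++p)
		if (*p == old)
			*p = new_c;
	return s;
}

/* Hypothetical variant B: returns a pointer to the terminating NUL. */
static char *strreplace_ret_end(char *s, char old, char new_c)
{
	for (; *s; ++s)
		if (*s == old)
			*s = new_c;
	return s;
}
```

With variant A, "name = strreplace(name, '/', ':')" keeps the original
allocation pointer. With variant B the same assignment would store an
interior pointer to an empty string, and a later kfree() of that value
would not be freeing the address that was allocated. If that were the
case, unrelated and misaligned-looking pointers elsewhere would not be
surprising.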
Andy, does that make any sense to you?
--
Florian