lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <feecfdc2-4df6-47cf-8f96-5044858dc881@gmx.com>
Date: Thu, 19 Dec 2024 10:44:12 +1030
From: Qu Wenruo <quwenruo.btrfs@....com>
To: Naresh Kamboju <naresh.kamboju@...aro.org>, qemu-devel@...gnu.org,
 open list <linux-kernel@...r.kernel.org>,
 Linux Regressions <regressions@...ts.linux.dev>,
 linux-ext4 <linux-ext4@...r.kernel.org>, lkft-triage@...ts.linaro.org,
 linux-mm <linux-mm@...ck.org>, Linux btrfs <linux-btrfs@...r.kernel.org>
Cc: Alex Bennée <alex.bennee@...aro.org>,
 Anders Roxell <anders.roxell@...aro.org>, Arnd Bergmann <arnd@...db.de>,
 Dan Carpenter <dan.carpenter@...aro.org>, Qu Wenruo <wqu@...e.com>,
 David Sterba <dsterba@...e.com>
Subject: Re: qemu-arm64: CONFIG_ARM64_64K_PAGES=y kernel crash on qemu-arm64
 with Linux next-20241210 and above



在 2024/12/19 06:37, Qu Wenruo 写道:
>
>
> 在 2024/12/19 02:22, Naresh Kamboju 写道:
>> On Wed, 18 Dec 2024 at 17:33, Naresh Kamboju
>> <naresh.kamboju@...aro.org> wrote:
>>>
>>> The following kernel crash noticed on qemu-arm64 while running the
>>> Linux next-20241210 tag (to next-20241218) kernel built with
>>>   - CONFIG_ARM64_64K_PAGES=y
>>>   - CONFIG_ARM64_16K_PAGES=y
>>> and running LTP smoke tests.
>>>
>>> First seen on Linux next-20241210.
>>>    Good: next-20241209
>>>    Bad:  next-20241210 and next-20241218
>>>
>>> qemu-arm64: 9.1.2
>>>
>>> Anyone noticed this ?
>>>
>>
>> Anders bisected this reported regression and found,
>> # first bad commit:
>>    [9c1d66793b6faa00106ae4c866359578bfc012d2]
>>    btrfs: validate system chunk array at btrfs_validate_super()
>
> Weird, I run daily fstests with 64K page sized aarch64 VM.
>
> But never hit a crash on this.
>
> And the original crash call trace only points back to ext4, not btrfs.
>
> Mind to test it with KASAN enabled?

Another thing is, how do you enable both 16K and 64K page size at the
same time?

The Kconfig should only select one page size IIRC.

And for the bisection, does it focus on the test failure or the crash?

For the test failure it looks like some older btrfs-progs, causing
invalid system chunk items, which got caught by the newer and more
strict sanity checks.

For the crash, unfortunately I'm not able to reproduce using fstests.
Will try LTP soon.

Thanks,
Qu
>
> Thanks,
> Qu
>>
>>
>>> Test log:
>>> ---------
>>> tst_test.c:1799: TINFO: === Testing on btrfs ===
>>> tst_test.c:1158: TINFO: Formatting /dev/loop0 with btrfs opts=''
>>> extra opts=''
>>> <6>[   71.880167] BTRFS: device fsid
>>> d492b571-012c-40a9-b8e1-efc97408d3bc devid 1 transid 6 /dev/loop0
>>> (7:0) scanned by chdir01 (476)
>>> tst_test.c:1170: TINFO: Mounting /dev/loop0 to
>>> /tmp/LTP_chdJeywxF/mntpoint fstyp=btrfs flags=0
>>> <6>[   71.960245] BTRFS info (device loop0): first mount of filesystem
>>> d492b571-012c-40a9-b8e1-efc97408d3bc
>>> <6>[   71.970667] BTRFS info (device loop0): using crc32c
>>> (crc32c-arm64) checksum algorithm
>>> <2>[   71.993486] BTRFS critical (device loop0): corrupt superblock
>>> syschunk array: chunk_start=22020096, invalid chunk sectorsize, have
>>> 65536 expect 4096
>>> <3>[   71.995802] BTRFS error (device loop0): superblock contains
>>> fatal errors
>>> <3>[   72.014538] BTRFS error (device loop0): open_ctree failed: -22
>>> tst_test.c:1170: TBROK: mount(/dev/loop0, mntpoint, btrfs, 0, (nil))
>>> failed: EINVAL (22)
>>>
>>> Summary:
>>> passed   48
>>> failed   0
>>> broken   1
>>> skipped  0
>>> warnings 0
>>>
>>> Duration: 7.002s
>>>
>>>
>>> ===== symlink01 =====
>>> command: symlink01
>>> <12>[   72.494428] /usr/local/bin/kirk[253]: starting test symlink01
>>> (symlink01)
>>> symlink01    0  TINFO  :  Using /tmp/LTP_symmsYXet as tmpdir (tmpfs
>>> filesystem)
>>> symlink01    1  TPASS  :  Creation of symbolic link file to no object
>>> file is ok
>>> symlink01    2  TPASS  :  Creation of symbolic link file to no object
>>> file is ok
>>> symlink01    3  TPASS  :  Creation of symbolic link file and object
>>> file via symbolic link is ok
>>> symlink01    4  TPASS  :  Creating an existing symbolic link file
>>> error is caught
>>> symlink01    5  TPASS  :  Creating a symbolic link which exceeds
>>> maximum pathname error is caught
>>>
>>> Summary:
>>> passed    5
>>> failed    0
>>> broken    0
>>> skipped   0
>>> warnings  0
>>>
>>> Duration: 0.052s
>>>
>>>
>>> ===== stat04 =====
>>> command: stat04
>>> <12>[   72.966706] /usr/local/bin/kirk[253]: starting test stat04
>>> (stat04)
>>> tst_buffers.c:57: TINFO: Test is using guarded buffers
>>> tst_tmpdir.c:316: TINFO: Using /tmp/LTP_staEABwgV as tmpdir (tmpfs
>>> filesystem)
>>> <6>[   73.447708] loop0: detected capacity change from 0 to 614400
>>> tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
>>> tst_test.c:1860: TINFO: LTP version: 20240930
>>> tst_test.c:1864: TINFO: Tested kernel: 6.13.0-rc3-next-20241218 #1 SMP
>>> PREEMPT @1734498806 aarch64
>>> tst_test.c:1703: TINFO: Timeout per run is 0h 05m 24s
>>> stat04.c:60: TINFO: Formatting /dev/loop0 with ext2 opts='-b 4096'
>>> extra opts=''
>>> mke2fs 1.47.1 (20-May-2024)
>>> <3>[   73.859753] operation not supported error, dev loop0, sector
>>> 614272 op 0x9:(WRITE_ZEROES) flags 0x10000800 phys_seg 0 prio class 0
>>> stat04.c:61: TINFO: Mounting /dev/loop0 to /tmp/LTP_staEABwgV/mntpoint
>>> fstyp=ext2 flags=0
>>> <6>[   73.939263] EXT4-fs (loop0): mounting ext2 file system using the
>>> ext4 subsystem
>>> <1>[   73.946378] Unable to handle kernel paging request at virtual
>>> address a8fff00000c0c224
>>> <1>[   73.947878] Mem abort info:
>>> <1>[   73.949153]   ESR = 0x0000000096000005
>>> <1>[   73.959105]   EC = 0x25: DABT (current EL), IL = 32 bits
>>> <1>[   73.960031]   SET = 0, FnV = 0
>>> <1>[   73.960349]   EA = 0, S1PTW = 0
>>> <1>[   73.960638]   FSC = 0x05: level 1 translation fault
>>> <1>[   73.961005] Data abort info:
>>> <1>[   73.961293]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>>> <1>[   73.963739]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>> <1>[   73.964980]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>> <1>[   73.967132] [a8fff00000c0c224] address between user and kernel
>>> address ranges
>>> <0>[   73.968923] Internal error: Oops: 0000000096000005 [#1] PREEMPT
>>> SMP
>>> <4>[   73.970516] Modules linked in: btrfs blake2b_generic xor
>>> xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce
>>> sha512_arm64 fuse drm backlight ip_tables x_tables
>>> <4>[   73.974237] CPU: 1 UID: 0 PID: 529 Comm: stat04 Not tainted
>>> 6.13.0-rc3-next-20241218 #1
>>> <4>[   73.975359] Hardware name: linux,dummy-virt (DT)
>>> <4>[   73.977170] pstate: 62402009 (nZCv daif +PAN -UAO +TCO -DIT
>>> -SSBS BTYPE=--)
>>> <4>[ 73.978295] pc : __kmalloc_node_noprof (mm/slub.c:492
>>> mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152
>>> mm/slub.c:4293 mm/slub.c:4300)
>>> <4>[ 73.980200] lr : alloc_cpumask_var_node (lib/cpumask.c:62
>>> (discriminator 2))
>>> <4>[   73.981466] sp : ffff80008258f950
>>> <4>[   73.982228] x29: ffff80008258f970 x28: ffffa93389398000 x27:
>>> 0000000000000001
>>> <4>[   73.983875] x26: fffffc1fc0303080 x25: 00000000ffffffff x24:
>>> a8fff00000c0c224
>>> <4>[   73.985649] x23: 0000000000000cc0 x22: ffffa93387f51d0c x21:
>>> 00000000ffffffff
>>> <4>[   73.986188] x20: fff00000c0010400 x19: 0000000000000008 x18:
>>> 0000000000000000
>>> <4>[   73.988686] x17: fff056cd748b0000 x16: ffff800080020000 x15:
>>> 0000000000000000
>>> <4>[   73.990276] x14: 0000000000002a66 x13: 0000000000004000 x12:
>>> 0000000000000001
>>> <4>[   73.992401] x11: 0000000000000002 x10: 0000000000004001 x9 :
>>> ffffa93387f51d0c
>>> <4>[   73.993108] x8 : fff00000c2c99240 x7 : 0000000000000001 x6 :
>>> 0000000000000001
>>> <4>[   73.993886] x5 : fff00000c4879800 x4 : 0000000000000000 x3 :
>>> 000000000033a401
>>> <4>[   73.995550] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 :
>>> fff00000c0010400
>>> <4>[   73.997017] Call trace:
>>> <4>[ 73.998266] __kmalloc_node_noprof+0x100/0x4a0 P
>>> <4>[ 73.999716] alloc_cpumask_var_node (lib/cpumask.c:62
>>> (discriminator 2))
>>> <4>[ 74.000942] alloc_workqueue_attrs (kernel/workqueue.c:4624
>>> (discriminator 1))
>>> <4>[ 74.001327] apply_wqattrs_prepare (kernel/workqueue.c:5263)
>>> <4>[ 74.003095] apply_workqueue_attrs_locked (kernel/workqueue.c:5351)
>>> <4>[ 74.003855] alloc_workqueue (kernel/workqueue.c:5722
>>> (discriminator 1) kernel/workqueue.c:5772 (discriminator 1))
>>> <4>[ 74.005398] ext4_fill_super (fs/ext4/super.c:5484 fs/ext4/
>>> super.c:5722)
>>> <4>[ 74.006132] get_tree_bdev_flags (fs/super.c:1636)
>>> <4>[ 74.007624] get_tree_bdev (fs/super.c:1660)
>>> <4>[ 74.008664] ext4_get_tree (fs/ext4/super.c:5755)
>>> <4>[ 74.009423] vfs_get_tree (fs/super.c:1814)
>>> <4>[ 74.009703] path_mount (fs/namespace.c:3556 fs/namespace.c:3883)
>>> <4>[ 74.010608] __arm64_sys_mount (fs/namespace.c:3896
>>> fs/namespace.c:4107 fs/namespace.c:4084 fs/namespace.c:4084)
>>> <4>[ 74.011527] invoke_syscall.constprop.0
>>> (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
>>> <4>[ 74.012798] do_el0_svc (include/linux/thread_info.h:135
>>> (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2)
>>> arch/arm64/kernel/syscall.c:151 (discriminator 2))
>>> <4>[ 74.014042] el0_svc (arch/arm64/include/asm/irqflags.h:82
>>> (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator
>>> 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1)
>>> arch/arm64/kernel/entry-common.c:165 (discriminator 1)
>>> arch/arm64/kernel/entry-common.c:178 (discriminator 1)
>>> arch/arm64/kernel/entry-common.c:745 (discriminator 1))
>>> <4>[ 74.014942] el0t_64_sync_handler (arch/arm64/kernel/entry-
>>> common.c:763)
>>> <4>[ 74.015917] el0t_64_sync (arch/arm64/kernel/entry.S:600)
>>> <0>[ 74.017042] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)
>>> All code
>>> ========
>>>     0: 12800019 mov w25, #0xffffffff            // #-1
>>>     4: b9402a82 ldr w2, [x20, #40]
>>>     8: aa1803e1 mov x1, x24
>>>     c: aa1403e0 mov x0, x20
>>>    10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>     0: f8626b1a ldr x26, [x24, x2]
>>> <4>[   74.019014] ---[ end trace 0000000000000000 ]---
>>> tst_test.c:1763: TBROK: Test killed by SIGSEGV!
>>>
>>> Summary:
>>> passed   0
>>> failed   0
>>> broken   1
>>> skipped  0
>>> warnings 0
>>> tst_device.c:269: TWARN: ioctl(/dev/loop0, LOOP_CLR_FD, 0) no ENXIO
>>> for too long
>>> Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG[0m
>>> Tainted kernel: ['kernel died recently, i.e. there was an OOPS or
>>> BUG'][0m
>>> Restarting SUT: host
>>>
>>> ===== df01_sh =====
>>> command: df01.sh
>>> <12>[   76.370093] /usr/local/bin/kirk[253]: starting test df01_sh
>>> (df01.sh)
>>> Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG[0m
>>> <1>[   76.603065] Unable to handle kernel paging request at virtual
>>> address a8fff00000c0c224
>>> <1>[   76.603922] Mem abort info:
>>> <1>[   76.604197]   ESR = 0x0000000096000005
>>> <1>[   76.604638]   EC = 0x25: DABT (current EL), IL = 32 bits
>>> <1>[   76.605128]   SET = 0, FnV = 0
>>> <1>[   76.606996]   EA = 0, S1PTW = 0
>>> <1>[   76.607274]   FSC = 0x05: level 1 translation fault
>>> <1>[   76.607611] Data abort info:
>>> <1>[   76.607897]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>>> <1>[   76.609765]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>> <1>[   76.610958]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>> <1>[   76.611652] [a8fff00000c0c224] address between user and kernel
>>> address ranges
>>> <0>[   76.612130] Internal error: Oops: 0000000096000005 [#2] PREEMPT
>>> SMP
>>> <4>[   76.613305] Modules linked in: btrfs blake2b_generic xor
>>> xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce
>>> sha512_arm64 fuse drm backlight ip_tables x_tables
>>> <4>[   76.617688] CPU: 1 UID: 0 PID: 553 Comm: df01.sh Tainted: G
>>> D            6.13.0-rc3-next-20241218 #1
>>> <4>[   76.620869] Tainted: [D]=DIE
>>> <4>[   76.621184] Hardware name: linux,dummy-virt (DT)
>>> <4>[   76.622671] pstate: 63402009 (nZCv daif +PAN -UAO +TCO +DIT
>>> -SSBS BTYPE=--)
>>> <4>[ 76.623693] pc : __kmalloc_node_noprof (mm/slub.c:492
>>> mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152
>>> mm/slub.c:4293 mm/slub.c:4300)
>>> <4>[ 76.624180] lr : __vmalloc_node_range_noprof
>>> (include/linux/slab.h:922 mm/vmalloc.c:3647 mm/vmalloc.c:3846)
>>> <4>[   76.625290] sp : ffff80008258fa90
>>> <4>[   76.626275] x29: ffff80008258fab0 x28: fff00000c2c98e80 x27:
>>> fff00000c48fd100
>>> <4>[   76.626966] x26: fffffc1fc0303080 x25: 00000000ffffffff x24:
>>> a8fff00000c0c224
>>> <4>[   76.627599] x23: 0000000000000dc0 x22: ffffa93386d87390 x21:
>>> 00000000ffffffff
>>> <4>[   76.628603] x20: fff00000c0010400 x19: 0000000000000008 x18:
>>> 0000000000000000
>>> <4>[   76.629618] x17: 0000000000000000 x16: ffff800082180000 x15:
>>> ffff800080000000
>>> <4>[   76.630999] x14: fff00000c00203f0 x13: 00000ffff8000821 x12:
>>> 0000000000000000
>>> <4>[   76.632089] x11: 0000000000000000 x10: 0000000000000000 x9 :
>>> ffffa93386d87390
>>> <4>[   76.634293] x8 : ffff80008258f908 x7 : fff00000c2c98e80 x6 :
>>> 0000000000010000
>>> <4>[   76.634816] x5 : ffffa93389379000 x4 : 0000000000000000 x3 :
>>> 000000000033b801
>>> <4>[   76.636355] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 :
>>> fff00000c0010400
>>> <4>[   76.638309] Call trace:
>>> <4>[ 76.639031] __kmalloc_node_noprof+0x100/0x4a0 P
>>> <4>[ 76.640890] __vmalloc_node_range_noprof (include/linux/slab.h:922
>>> mm/vmalloc.c:3647 mm/vmalloc.c:3846)
>>> <4>[ 76.641267] copy_process (kernel/fork.c:314 (discriminator 1)
>>> kernel/fork.c:1061 (discriminator 1) kernel/fork.c:2176 (discriminator
>>> 1))
>>> <4>[ 76.641795] kernel_clone (kernel/fork.c:2758)
>>> <4>[ 76.643003] __do_sys_clone (kernel/fork.c:2902)
>>> <4>[ 76.644078] __arm64_sys_clone (kernel/fork.c:2869)
>>> <4>[ 76.645306] invoke_syscall.constprop.0
>>> (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
>>> <4>[ 76.646337] do_el0_svc (include/linux/thread_info.h:135
>>> (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2)
>>> arch/arm64/kernel/syscall.c:151 (discriminator 2))
>>> <4>[ 76.646974] el0_svc (arch/arm64/include/asm/irqflags.h:82
>>> (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator
>>> 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1)
>>> arch/arm64/kernel/entry-common.c:165 (discriminator 1)
>>> arch/arm64/kernel/entry-common.c:178 (discriminator 1)
>>> arch/arm64/kernel/entry-common.c:745 (discriminator 1))
>>> <4>[ 76.647709] el0t_64_sync_handler (arch/arm64/kernel/entry-
>>> common.c:763)
>>> <4>[ 76.649032] el0t_64_sync (arch/arm64/kernel/entry.S:600)
>>> <0>[ 76.649724] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)
>>>
>>> <trim>
>>>
>>> All code
>>> ========
>>>     0: 12800019 mov w25, #0xffffffff            // #-1
>>>     4: b9402a82 ldr w2, [x20, #40]
>>>     8: aa1803e1 mov x1, x24
>>>     c: aa1403e0 mov x0, x20
>>>    10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>     0: f8626b1a ldr x26, [x24, x2]
>>>   <4>[   79.647693] ---[ end trace 0000000000000000 ]---
>>>   <0>[   79.649260] Kernel panic - not syncing: Attempted to kill init!
>>> exitcode=0x0000000b
>>>   <2>[   79.650229] SMP: stopping secondary CPUs
>>>   <0>[   79.651558] Kernel Offset: 0x293306a00000 from
>>> 0xffff800080000000
>>>   <0>[   79.652015] PHYS_OFFSET: 0x40000000
>>>   <0>[   79.652461] CPU features: 0x000,000000d0,60bef2d8,cb7e7f3f
>>>   <0>[   79.653039] Memory Limit: none
>>>   <0>[   79.653854] ---[ end Kernel panic - not syncing: Attempted to
>>> kill init! exitcode=0x0000000b ]---
>>>
>>>
>>> Links:
>>> -------
>>>   - https://qa-reports.linaro.org/lkft/linux-next-master/build/
>>> next-20241218/testrun/26396709/suite/log-parser-test/test/panic-
>>> multiline-kernel-panic-not-syncing-attempted-to-kill-init-exitcode/
>>> history/
>>>   - https://qa-reports.linaro.org/lkft/linux-next-master/build/
>>> next-20241212/testrun/26277241/suite/log-parser-test/test/panic-
>>> multiline-kernel-panic-not-syncing-attempted-to-kill-init-exitcode/log
>>>   - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/
>>> tests/2qNMDhPFtR8j185QSvZMn989u84
>>>   - https://storage.tuxsuite.com/public/linaro/lkft/
>>> builds/2qNMCQazNJteQLGCw7MnMtUwzkD/
>>>   - https://qa-reports.linaro.org/lkft/linux-next-master/build/
>>> next-20241211/testrun/26266202/suite/log-parser-test/test/panic-
>>> multiline-kernel-panic-not-syncing-attempted-to-kill-init-exitcode/
>>> details/
>>>
>>>
>>> metadata:
>>> ----
>>>    git describe: next-20241210..next-20241218
>>>    git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/
>>> linux-next.git
>>>    kernel config:
>>> https://storage.tuxsuite.com/public/linaro/lkft/
>>> builds/2qNMCQazNJteQLGCw7MnMtUwzkD/config
>>>    build url: https://storage.tuxsuite.com/public/linaro/lkft/
>>> builds/2qNMCQazNJteQLGCw7MnMtUwzkD/
>>>    toolchain: gcc-13
>>>    config: CONFIG_ARM64_64K_PAGES=y, CONFIG_ARM64_16K_PAGES=y
>>>    arch: arm64
>>>    qemu: qemu-arm64 version 9.1.2
>>>
>>
>> --
>> Linaro LKFT
>> https://lkft.linaro.org
>>
>
>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ