lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4d3b8b3a-31b7-4476-e83c-e9894ceed0d3@suse.com>
Date:   Tue, 15 Sep 2020 16:00:40 +0800
From:   Qu Wenruo <wqu@...e.com>
To:     Oliver Sang <oliver.sang@...el.com>
CC:     linux-btrfs@...r.kernel.org, 0day robot <lkp@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: Re: [btrfs] 3b54a0a703:
 WARNING:at_fs/btrfs/inode.c:#btrfs_finish_ordered_io[btrfs]



On 2020/9/15 下午3:40, Qu Wenruo wrote:
> 
> 
> On 2020/9/15 下午1:54, Oliver Sang wrote:
>> On Wed, Sep 09, 2020 at 03:49:30PM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2020/9/9 下午3:08, kernel test robot wrote:
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>
>>>> commit: 3b54a0a703f17d2b1317d24beefcdcca587a7667 ("[PATCH v3 3/5] btrfs: Detect unbalanced tree with empty leaf before crashing btree operations")
>>>> url: https://github.com/0day-ci/linux/commits/Qu-Wenruo/btrfs-Enhanced-runtime-defence-against-fuzzed-images/20200809-201720
>>>> base: https://git.kernel.org/cgit/linux/kernel/git/kdave/linux.git for-next
>>>>
>>>> in testcase: fio-basic
>>>> with following parameters:
>>>>
>>>> 	runtime: 300s
>>>> 	disk: 1SSD
>>>> 	fs: btrfs
>>>> 	nr_task: 100%
>>>> 	test_size: 128G
>>>> 	rw: write
>>>> 	bs: 4k
>>>> 	ioengine: sync
>>>> 	cpufreq_governor: performance
>>>> 	ucode: 0x400002c
>>>> 	fs2: nfsv4
>>>>
>>>> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
>>>> test-url: https://github.com/axboe/fio
>>>>
>>>>
>>>> on test machine: 96 threads Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz with 128G memory
>>>>
>>>> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>>>>
>>>>
>>>> +----------------------------------------------------------------------------+------------+------------+
>>>> |                                                                            | 2703206ff5 | 3b54a0a703 |
>>>> +----------------------------------------------------------------------------+------------+------------+
>>>> | boot_successes                                                             | 9          | 0          |
>>>> | boot_failures                                                              | 4          |            |
>>>> | Kernel_panic-not_syncing:VFS:Unable_to_mount_root_fs_on_unknown-block(#,#) | 4          |            |
>>>> +----------------------------------------------------------------------------+------------+------------+
>>>>
>>>>
>>>> If you fix the issue, kindly add following tag
>>>> Reported-by: kernel test robot <oliver.sang@...el.com>
>>>>
>>>>
>>>
>>> According to the full dmesg, it's invalid nritems causing transaction abort.
>>>
>>> I'm not sure if it's caused by corrupts fs or something else.
>>>
>>> If intel guys can reproduce it reliably, would you please add such debug
>>> diff to output extra info?
>>
>> Hi Qu, sorry for late. we double confirmed the issue can be reproduced reliably.
>> The error will only occur on fbc but not parent commit.
>>
>> below from applying your path for extra info
>> [   42.539443] [task_0]$
>> [   42.539445]~$
>> [   42.546125] rw=write$
>> [   42.546126]~$
>> [   42.551637] directory=/fs/nvme1n1p1$
>> [   42.551638]~$
>> [   42.559135] numjobs=96' | fio --output-format=json -$
>> [   42.559136]~$
>> [   42.574513] perf version 5.9.rc4.g34d4ddd359db$
>> [   42.574518]~$
>> [   56.152662] BTRFS error (device nvme1n1p1): invalid tree nritems, bytenr=13238272 owner=7 level=0 first_key=(18446744073709551606 128 96941895680) nritems=0
>>  expect >0$
> 
> Just as expected, this is indeed csum tree.
> And it looks like it's indeed still valid.
> 
> The csum root can still have its key from previous not emptry csum.

Wait for a minute, if it's csum root empty, we shouldn't have first_key
passed in.

So this still has something wrong.

Would you please try this diff to provide more debug info?
(Better to remove the existing diff)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 75bbe879ed18..6f29a3c38b56 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -400,10 +400,17 @@ int btrfs_verify_level_key(struct extent_buffer
*eb, int level,

        /* We have @first_key, so this @eb must have at least one item */
        if (btrfs_header_nritems(eb) == 0) {
+               pr_info("%s: eb start=%llu gen=%llu last_committed=%llu\n",
+                       __func__, eb->start, btrfs_header_generation(eb),
+                       fs_info->last_trans_committed);
                btrfs_err(fs_info,
                "invalid tree nritems, bytenr=%llu nritems=0 expect >0",
                          eb->start);
-               WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+               pr_info("%s: csum tree commit root:\n", __func__);
+               btrfs_print_tree(fs_info->csum_root->commit_root, true);
+               pr_info("%s: csum tree current root:\n", __func__);
+               btrfs_print_tree(fs_info->csum_root->node, true);
+               WARN_ON(1);
                return -EUCLEAN;
        }


> 
> In that case, the check is indeed too strict and causes false alert.
> 
> I'll soon send out a fix with Intel reported-by.
> 
> Thanks,
> Qu
> 
>> [   56.152664] BTRFS error (device nvme1n1p1): invalid tree nritems, bytenr=13238272 owner=7 level=0 first_key=(18446744073709551606 128 96941895680) nritems=0
>>  expect >0$
>> [   56.152666] ------------[ cut here ]------------$
>> [   56.168263] BTRFS: error (device nvme1n1p1) in btrfs_finish_ordered_io:2687: errno=-117 Filesystem corrupted$
>> [   56.168264] BTRFS info (device nvme1n1p1): forced readonly$
>> [   56.205009] BTRFS: Transaction aborted (error -117)$
>> [   56.210368] WARNING: CPU: 71 PID: 537 at fs/btrfs/inode.c:2687 btrfs_finish_ordered_io+0x70a/0x820 [btrfs]$
>> [   56.220466] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfsd auth_rpcgss dm_mod dax_pmem_compat nd_pmem device_dax nd_btt dax_pmem_core btrfs blak
>> e2b_generic sr_mod xor cdrom sd_mod zstd_decompress sg zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common skx_edac x86_pkg_temp_thermal intel_po
>> werclamp coretemp kvm_intel ipmi_ssif kvm irqbypass ast drm_vram_helper crct10dif_pclmul drm_ttm_helper crc32_pclmul ttm crc32c_intel ghash_clmulni_intel rapl
>> drm_kms_helper acpi_ipmi syscopyarea intel_cstate sysfillrect ahci sysimgblt nvme libahci fb_sys_fops intel_uncore mei_me nvme_core ioatdma t10_pi drm mei liba
>> ta joydev ipmi_si dca wmi ipmi_devintf ipmi_msghandler nfit libnvdimm ip_tables$
>> [   56.285795] CPU: 71 PID: 537 Comm: kworker/u193:28 Not tainted 5.8.0-rc7-00166-g6e85ab8532a52 #1$
>> [   56.295218] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]$
>> [   56.302044] RIP: 0010:btrfs_finish_ordered_io+0x70a/0x820 [btrfs]$
>> [   56.308762] Code: 48 0a 00 00 02 72 25 41 83 ff fb 0f 84 f2 00 00 00 41 83 ff e2 0f 84 e8 00 00 00 44 89 fe 48 c7 c7 a0 9c 2c c1 e8 58 2e ec bf <0f> 0b 44 8
>> 9 f9 ba 7f 0a 00 00 48 c7 c6 50 c7 2b c1 48 89 df e8 15$
>> [   56.328846] RSP: 0018:ffffc90007babd58 EFLAGS: 00010282$
>> [   56.334755] RAX: 0000000000000000 RBX: ffff888fb85984e0 RCX: 0000000000000000$
>> [   56.342577] RDX: ffff8890401e82a0 RSI: ffff8890401d7f60 RDI: ffff8890401d7f60$
>> [   56.350415] RBP: ffff889007e2e4c0 R08: 0000000001200000 R09: 0000000000000000$
>> [   56.358255] R10: 0000000000000001 R11: ffffffffc00bd060 R12: 0000000010e7e000$
>> [   56.366121] R13: 0000000010e7efff R14: ffff888fb86622a8 R15: 00000000ffffff8b$
>> [   56.373983] FS:  0000000000000000(0000) GS:ffff8890401c0000(0000) knlGS:0000000000000000$
>> [   56.382806] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033$
>> [   56.389291] CR2: 00007fa44c4cc448 CR3: 0000005eac7c0006 CR4: 00000000007606e0$
>> [   56.397176] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000$
>> [   56.405065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400$
>> [   56.412952] PKRU: 55555554$
>> [   56.416419] Call Trace:$
>> [   56.419631]  ? update_curr+0xc0/0x200$
>> [   56.424060]  ? newidle_balance+0x232/0x3e0$
>> [   56.428958]  btrfs_work_helper+0xc9/0x400 [btrfs]$
>> [   56.434441]  ? __schedule+0x378/0x860$
>> [   56.438895]  process_one_work+0x1b5/0x3a0$
>> [   56.443690]  worker_thread+0x50/0x3c0$
>> [   56.446698] /usr/bin/wget -q --timeout=1800 --tries=1 --local-encoding=UTF-8 http://inn:80/~lkp/cgi-bin/lkp-jobfile-append-var?job_file=/lkp/jobs/scheduled/
>> lkp-csl-2sp3/fio-basic-4k-performance-1SSD-btrfs-nfsv4-sync-100%25-300s-write-128G-ucode=0x400002c-monitor=a122c70f-debian-10.4-x86_64-20200603-20200915-84129-
>> 1i8kzyy-0.yaml&job_state=post_run -O /dev/null$
>> [   56.446700]~$
>> [   56.448144]  ? process_one_work+0x3a0/0x3a0$
>> [   56.491886]  kthread+0x114/0x160$
>> [   56.495986]  ? kthread_park+0xa0/0xa0$
>> [   56.500522]  ret_from_fork+0x1f/0x30$
>> [   56.504966] ---[ end trace fbe43e164e851f97 ]---$
>> [   56.510432] BTRFS: error (device nvme1n1p1) in btrfs_finish_ordered_io:2687: errno=-117 Filesystem corrupted$
>>
>>
>> I also attached full dmesg 'dmesg-with-debug' for your reference.
>>
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index b1a148058773..b050d6fcb90a 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -406,8 +406,9 @@ int btrfs_verify_level_key(struct extent_buffer *eb,
>>> int level,
>>>         /* We have @first_key, so this @eb must have at least one item */
>>>         if (btrfs_header_nritems(eb) == 0) {
>>>                 btrfs_err(fs_info,
>>> -               "invalid tree nritems, bytenr=%llu nritems=0 expect >0",
>>> -                         eb->start);
>>> +               "invalid tree nritems, bytenr=%llu owner=%llu level=%d
>>> first_key=(%llu %u %llu) nritems=0 expect >0",
>>> +                         eb->start, btrfs_header_owner(eb),
>>> btrfs_header_level(eb),
>>> +                         first_key->objectid, first_key->type,
>>> first_key->offset);
>>>                 WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>>>                 return -EUCLEAN;
>>>         }
>>>
>>> Thanks,
>>> Qu
>>>
>>>> [   50.226906] WARNING: CPU: 71 PID: 500 at fs/btrfs/inode.c:2687 btrfs_finish_ordered_io+0x70a/0x820 [btrfs]
>>>> [   50.236913] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfsd auth_rpcgss dm_mod dax_pmem_compat nd_pmem device_dax nd_btt dax_pmem_core btrfs sr_mod blake2b_generic xor cdrom sd_mod zstd_decompress sg zstd_compress raid6_pq libcrc32c intel_rapl_msr intel_rapl_common skx_edac x86_pkg_temp_thermal ipmi_ssif intel_powerclamp coretemp kvm_intel kvm irqbypass ast crct10dif_pclmul drm_vram_helper crc32_pclmul crc32c_intel acpi_ipmi drm_ttm_helper ghash_clmulni_intel ttm rapl drm_kms_helper intel_cstate syscopyarea sysfillrect nvme sysimgblt intel_uncore fb_sys_fops nvme_core ahci libahci t10_pi drm mei_me ioatdma libata mei ipmi_si joydev dca wmi ipmi_devintf ipmi_msghandler nfit libnvdimm ip_tables
>>>> [   50.301669] CPU: 71 PID: 500 Comm: kworker/u193:5 Not tainted 5.8.0-rc7-00165-g3b54a0a703f17 #1
>>>> [   50.310904] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
>>>> [   50.317626] RIP: 0010:btrfs_finish_ordered_io+0x70a/0x820 [btrfs]
>>>> [   50.324255] Code: 48 0a 00 00 02 72 25 41 83 ff fb 0f 84 f2 00 00 00 41 83 ff e2 0f 84 e8 00 00 00 44 89 fe 48 c7 c7 70 1c 2b c1 e8 58 ae ed bf <0f> 0b 44 89 f9 ba 7f 0a 00 00 48 c7 c6 50 47 2a c1 48 89 df e8 15
>>>> [   50.344116] RSP: 0018:ffffc90007a83d58 EFLAGS: 00010282
>>>> [   50.349923] RAX: 0000000000000000 RBX: ffff888a93ca5ea0 RCX: 0000000000000000
>>>> [   50.357656] RDX: ffff8890401e82a0 RSI: ffff8890401d7f60 RDI: ffff8890401d7f60
>>>> [   50.365385] RBP: ffff8890300ab8a8 R08: 0000000000000bd4 R09: 0000000000000000
>>>> [   50.373133] R10: 0000000000000001 R11: ffffffffc0714060 R12: 000000000f83e000
>>>> [   50.381060] R13: 000000000f83ffff R14: ffff888fb6c39968 R15: 00000000ffffff8b
>>>> [   50.388824] FS:  0000000000000000(0000) GS:ffff8890401c0000(0000) knlGS:0000000000000000
>>>> [   50.397545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   50.404300] CR2: 00007feacc500f98 CR3: 0000000f74422006 CR4: 00000000007606e0
>>>> [   50.412477] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>> [   50.420281] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>> [   50.428082] PKRU: 55555554
>>>> [   50.431451] Call Trace:
>>>> [   50.434570]  ? update_curr+0xc0/0x200
>>>> [   50.438919]  ? newidle_balance+0x232/0x3e0
>>>> [   50.443700]  ? __wake_up_common+0x80/0x180
>>>> [   50.448491]  btrfs_work_helper+0xc9/0x400 [btrfs]
>>>> [   50.453880]  ? __schedule+0x378/0x860
>>>> [   50.458218]  process_one_work+0x1b5/0x3a0
>>>> [   50.462917]  worker_thread+0x50/0x3c0
>>>> [   50.467262]  ? process_one_work+0x3a0/0x3a0
>>>> [   50.472148]  kthread+0x114/0x160
>>>> [   50.476084]  ? kthread_park+0xa0/0xa0
>>>> [   50.480445]  ret_from_fork+0x1f/0x30
>>>> [   50.484731] ---[ end trace cc096c1a2068030e ]---
>>>>
>>>>
>>>> To reproduce:
>>>>
>>>>         git clone https://github.com/intel/lkp-tests.git
>>>>         cd lkp-tests
>>>>         bin/lkp install job.yaml  # job file is attached in this email
>>>>         bin/lkp run     job.yaml
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Oliver Sang
>>>>
>>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ