Message-ID: <95753732-9714-42e0-8097-e2b4c3dd5820@linux.ibm.com>
Date: Thu, 22 May 2025 18:07:21 +0530
From: Nilay Shroff <nilay@...ux.ibm.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev, lkp@...el.com, linux-kernel@...r.kernel.org,
Jens Axboe <axboe@...nel.dk>, Christoph Hellwig <hch@....de>,
Hannes Reinecke <hare@...e.de>, Ming Lei <ming.lei@...hat.com>,
cgroups@...r.kernel.org, linux-block@...r.kernel.org
Subject: Re: [linus:master] [block] 245618f8e4: stress-ng.fpunch.fail
On 5/22/25 7:59 AM, kernel test robot wrote:
>
>
> Hello,
>
>
> we don't have enough knowledge if this is a kernel issue or test case issue.
>
> =========================================================================================
> tbox_group/testcase/rootfs/kconfig/compiler/nr_threads/disk/testtime/fs/test/cpufreq_governor:
> lkp-icl-2sp4/stress-ng/debian-12-x86_64-20240206.cgz/x86_64-rhel-9.4/gcc-12/100%/1HDD/60s/xfs/fpunch/performance
>
> 3efe7571c3ae2b64 245618f8e45ff4f79327627b474
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :6          100%           6:6     stress-ng.fpunch.fail
>
> since the failure is persistent, just report what we observed in our tests FYI.
>
>
> kernel test robot noticed "stress-ng.fpunch.fail" on:
>
> commit: 245618f8e45ff4f79327627b474b563da71c2c75 ("block: protect wbt_lat_usec using q->elevator_lock")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master b36ddb9210e6812eb1c86ad46b66cc46aa193487]
> [test failed on linux-next/master 8566fc3b96539e3235909d6bdda198e1282beaed]
> [test failed on fix commit 9730763f4756e32520cb86778331465e8d063a8f]
>
> in testcase: stress-ng
> version: stress-ng-x86_64-1c71921fd-1_20250212
> with following parameters:
>
> nr_threads: 100%
> disk: 1HDD
> testtime: 60s
> fs: xfs
> test: fpunch
> cpufreq_governor: performance
>
>
>
> config: x86_64-rhel-9.4
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202505221030.760980df-lkp@intel.com
>
> 2025-03-20 08:33:52 mkdir -p /mnt/stress-ng
> 2025-03-20 08:33:52 mount /dev/sdc1 /mnt/stress-ng
> 2025-03-20 08:33:52 cd /mnt/stress-ng
> File: "/mnt/stress-ng"
> ID: 82100000000 Namelen: 255 Type: xfs
> Block size: 4096 Fundamental block size: 4096
> Blocks: Total: 78604800 Free: 78518242 Available: 78518242
> Inodes: Total: 157286400 Free: 157286397
> 2025-03-20 08:33:52 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --fpunch 128
> stress-ng: info: [4680] setting to a 1 min run per stressor
> stress-ng: info: [4680] dispatching hogs: 128 fpunch
> stress-ng: info: [4680] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
> stress-ng: warn: [4680] metrics-check: all bogo-op counters are zero, data may be incorrect
> stress-ng: metrc: [4680] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
> stress-ng: metrc: [4680] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
> stress-ng: metrc: [4680] fpunch 0 557.92 0.40 19.56 0.00 0.00 0.03 3180
> stress-ng: metrc: [4680] miscellaneous metrics:
> stress-ng: metrc: [4680] fpunch 2049.12 extents per file (geometric mean of 128 instances)
> stress-ng: info: [4680] for a 620.45s run time:
> stress-ng: info: [4680] 79418.05s available CPU time
> stress-ng: info: [4680] 0.40s user time ( 0.00%)
> stress-ng: info: [4680] 19.59s system time ( 0.02%)
> stress-ng: info: [4680] 19.99s total time ( 0.03%)
> stress-ng: info: [4680] load average: 250.69 349.62 213.80
> stress-ng: info: [4680] skipped: 0
> stress-ng: info: [4680] passed: 128: fpunch (128)
> stress-ng: info: [4680] failed: 0
> stress-ng: info: [4680] metrics untrustworthy: 0
> stress-ng: info: [4680] successful run completed in 10 mins, 20.45 secs
>
>
> we don't observe any abnormal output in dmesg. below is an example from parent
> commit.
>
> 2025-03-20 09:12:39 mkdir -p /mnt/stress-ng
> 2025-03-20 09:12:39 mount /dev/sdc1 /mnt/stress-ng
> 2025-03-20 09:12:39 cd /mnt/stress-ng
> File: "/mnt/stress-ng"
> ID: 82100000000 Namelen: 255 Type: xfs
> Block size: 4096 Fundamental block size: 4096
> Blocks: Total: 78604800 Free: 78518242 Available: 78518242
> Inodes: Total: 157286400 Free: 157286397
> 2025-03-20 09:12:39 stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --fpunch 128
> stress-ng: info: [4689] setting to a 1 min run per stressor
> stress-ng: info: [4689] dispatching hogs: 128 fpunch
> stress-ng: info: [4689] note: /proc/sys/kernel/sched_autogroup_enabled is 1 and this can impact scheduling throughput for processes not attached to a tty. Setting this to 0 may improve performance metrics
> stress-ng: metrc: [4689] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s CPU used per RSS Max
> stress-ng: metrc: [4689] (secs) (secs) (secs) (real time) (usr+sys time) instance (%) (KB)
> stress-ng: metrc: [4689] fpunch 1166 60.31 0.11 34.66 19.33 33.54 0.45 3164
> stress-ng: metrc: [4689] miscellaneous metrics:
> stress-ng: metrc: [4689] fpunch 2051.97 extents per file (geometric mean of 128 instances)
> stress-ng: info: [4689] for a 60.91s run time:
> stress-ng: info: [4689] 7796.93s available CPU time
> stress-ng: info: [4689] 0.11s user time ( 0.00%)
> stress-ng: info: [4689] 34.68s system time ( 0.44%)
> stress-ng: info: [4689] 34.79s total time ( 0.45%)
> stress-ng: info: [4689] load average: 325.78 93.83 32.28
> stress-ng: info: [4689] skipped: 0
> stress-ng: info: [4689] passed: 128: fpunch (128)
> stress-ng: info: [4689] failed: 0
> stress-ng: info: [4689] metrics untrustworthy: 0
> stress-ng: info: [4689] successful run completed in 1 min
>
>
> from above, parent can finish run in 1 min, then has "bogo ops" and "bogo ops/s"
>
> for 245618f8e4, the test seems run much longer, and the results for "bogo ops"
> and "bogo ops/s" are all 0.
>
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20250522/202505221030.760980df-lkp@intel.com
>
I tried to reproduce this issue but could not recreate it. Could you
run this test on your setup with the stress-ng option "--iostat 1",
as shown below?
# stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --fpunch 128 --iostat 1
If you can run the test with the above option, please collect the logs and
share them. That might help with further debugging.
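In case it helps, here is a minimal sketch of one way to capture the output.
It uses the mount point from the report and the same command line plus
"--iostat 1"; the log file name is only an assumption for illustration, so
adjust it as needed.

```shell
#!/bin/sh
# Sketch only: MNT is the mount point shown in the report; LOG is a
# hypothetical file name chosen for this example.
MNT=/mnt/stress-ng
LOG=/tmp/fpunch-iostat.log

# Same command line as in the report, with "--iostat 1" appended.
CMD="stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --fpunch 128 --iostat 1"

if command -v stress-ng >/dev/null 2>&1 && [ -d "$MNT" ]; then
    # Run from the xfs mount and keep stdout and stderr in a single log.
    (cd "$MNT" && $CMD) 2>&1 | tee "$LOG"
else
    # Nothing is run when stress-ng or the mount point is missing.
    echo "stress-ng or $MNT not available; would run: $CMD"
fi
```

The resulting log file can then be attached to the reply as is.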
Thanks,
--Nilay