linux-kernel - Re: [Regression or Fix]perf: profiling stats sigificantly changed for aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0


Message-ID: <CAM9d7chFQ1L0h0av7ziXU4ja_j1FMRgwd-CHULnOB5YuH9yo2w@mail.gmail.com>
Date:   Mon, 20 Nov 2023 14:59:18 -0800
From:   Namhyung Kim <namhyung@...nel.org>
To:     David Wang <00107082@....com>
Cc:     Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
        acme@...nel.org, mark.rutland@....com,
        alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
        irogers@...gle.com, adrian.hunter@...el.com,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [Regression or Fix]perf: profiling stats sigificantly changed for
 aio_write/read(ext4) between 6.7.0-rc1 and 6.6.0

On Fri, Nov 17, 2023 at 5:48 PM David Wang <00107082@....com> wrote:
>
>
> At 2023-11-18 05:11:02, "Namhyung Kim" <namhyung@...nel.org> wrote:
> >On Wed, Nov 15, 2023 at 8:09 PM David Wang <00107082@....com> wrote:
> >>
>
> >>
> >>
> >> From the data I collected, I think two problems can be observed with commit f06cc667f79909e9175460b167c277b7c64d3df0:
> >> 1. Samples are missing.
> >> 2. Samples are unstable: the total sample count drifts a lot between tests.
> >
> >Hmm.. so the fio process was running in the background during
> >the profiling, right?  But I'm not sure how you measured the same
> >amount of time.  Probably you need to run this (for 10 seconds):
> >
> >  sudo perf record -a -G mytest -- sleep 10
> >
> >And I guess you don't run the perf command in the target cgroup
> >which is good.
> >
>
> Yes, the profiling process was not in the target cgroup.
> I ran fio with `fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test  --bs=4k --iodepth=64 --size=1G --readwrite=randrw  --runtime=600 --numjobs=4 --time_based=1`, which runs for 600 seconds.
> The profiling reports drift between runs. From the small samples of test data I collected, which may not be enough for a firm conclusion, I feel that when the commit is reverted, the expected total sample count is higher and the standard deviation is smaller.
>
> >And is there any chance that it improved because of the change?
> >Are the numbers in 6.7 better or worse?
> >
> I have no idea whether the change in the expected total sample count is a bug or a fix, but the observed result that the total sample count drifts a lot (bigger standard deviation) is, I think, a bad thing.

Right.  Can you run perf stat to measure the number of context
switches and cgroup switches, then?

  sudo perf stat -a -e context-switches,cgroup-switches -- sleep 10

Thanks,
Namhyung
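
To compare runs along the lines discussed above (context-switch and cgroup-switch counts from perf stat, and how much a count drifts between runs), a small helper can be sketched in Python. This is only an illustration, not something from the thread: it assumes the counters were captured with perf stat's CSV mode (`-x,`) and that each line starts with `count,unit,event-name,...`; the function names and the sample numbers in the usage below are hypothetical.

```python
import statistics


def parse_perf_stat_csv(text):
    """Map event name -> count from `perf stat -x,` style lines.

    Assumed layout per line: count,unit,event-name,<further fields>.
    Lines whose count is not an integer (for example "<not counted>")
    are skipped.
    """
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        try:
            counts[event] = int(value)
        except ValueError:
            continue
    return counts


def summarize(samples):
    """Return (mean, sample standard deviation, coefficient of variation)."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # needs at least two runs
    return mean, stdev, stdev / mean


if __name__ == "__main__":
    # Hypothetical per-run totals for one kernel, for illustration only.
    runs = [100, 110, 90]
    mean, stdev, cv = summarize(runs)
    print(f"mean={mean} stdev={stdev:.2f} cv={cv:.3f}")
```

Running `summarize` on the per-run totals from each kernel (for example 6.6.0 and 6.7.0-rc1) gives a mean and a relative spread that can be compared directly, which is one way to put a number on "drifts a lot".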