linux-kernel - Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAABZP2zj4B4P4D+3g7M8jdKSaj5qBVHuGtGJcdmLAfzBZiva7g@mail.gmail.com>
Date:   Mon, 28 Nov 2022 11:00:42 +0800
From:   Zhouyi Zhou <zhouzhouyi@...il.com>
To:     paulmck@...nel.org
Cc:     Thomas Gleixner <tglx@...utronix.de>, fweisbec@...il.com,
        mingo@...nel.org, dave@...olabs.net, josh@...htriplett.org,
        mpe@...erman.id.au, linuxppc-dev@...ts.ozlabs.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH linux-next][RFC]torture: avoid offline tick_do_timer_cpu

Thank you all for your guidance and encouragement!

I learn how to construct commit message properly and learn how
important the role
that the torture test framework plays for the Linux kernel. Hope I can
be of benefit to the community by my work.

I am going to continue to study this topic and study the torture test
framework, and wait for your further instructions.

Best Regards
Zhouyi
On Mon, Nov 28, 2022 at 1:53 AM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Sun, Nov 27, 2022 at 01:40:28PM +0100, Thomas Gleixner wrote:
>
> [ . . . ]
>
> > >> No. We are not exporting this just to make a bogus test case happy.
> > >>
> > >> Fix the torture code to handle -EBUSY correctly.
> > > I am going to do a study on this, for now, I do a grep in the kernel tree:
> > > find . -name "*.c"|xargs grep cpuhp_setup_state|wc -l
> > > The result of the grep command shows that there are 268
> > > cpuhp_setup_state* cases.
> > > which may make our task more complicated.
> >
> > Why? The whole point of this torture thing is to stress the
> > infrastructure.
>
> Indeed.
>
> > There are quite some reasons why a CPU-hotplug or a hot-unplug operation
> > can fail, which is not a fatal problem, really.
> >
> > So if a CPU hotplug operation fails, then why can't the torture test
> > just move on and validate that the system still behaves correctly?
> >
> > That gives us more coverage than just testing the good case and giving
> > up when something unexpected happens.
>
> Agreed, with access to a function like the tick_nohz_full_timekeeper()
> suggested earlier in this email thread, then yes, it would make sense to
> try to offline the CPU anyway, then forgive the failure in cases where
> the CPU matches that indicated by tick_nohz_full_timekeeper().
>
> > I even argue that the torture test should inject random failures into
> > the hotplug state machine to achieve extended code coverage.
>
> I could imagine torture_onoff() telling various CPU-hotplug notifiers
> to refuse the transition using some TBD interface.  That would better
> test the CPU-hotplug common code's ability to deal with failures.
>
> Or did you have something else/additional in mind?
>
>                                                         Thanx, Paul