lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKc596+cs7SMT493HjfiX_1E+aGdL9aaPgP_KqT5+gXrgsZVCA@mail.gmail.com>
Date:   Tue, 28 Jul 2020 09:35:16 +0800
From:   jun qian <qianjun.kernel@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     peterz@...radead.org, will@...nel.org, luto@...nel.org,
        linux-kernel@...r.kernel.org, Yafang Shao <laoar.shao@...il.com>,
        Uladzislau Rezki <urezki@...il.com>
Subject: Re: [PATCH V4] Softirq:avoid large sched delay from the pending softirqs

On Mon, Jul 27, 2020 at 11:41 PM Thomas Gleixner <tglx@...utronix.de> wrote:
>
> Qian,
>
> qianjun.kernel@...il.com writes:
> >  /*
> >   * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> >   * but break the loop if need_resched() is set or after 2 ms.
> > - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> > - * certain cases, such as stop_machine(), jiffies may cease to
> > - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> > - * well to make sure we eventually return from this method.
> > + * In the loop, if the processing time of the softirq has exceeded 2
> > + * milliseconds, we also need to break the loop to wakeup the
> > ksofirqd.
>
> You are removing the MAX_SOFTIRQ_RESTART limit explanation and I rather
> have MAX_SOFTIRQ_TIME_NS there than '2 milliseconds' in case the value
> gets adjusted later on. Also while sched_clock() is granular on many
> systems it still can be jiffies based and then the above problem
> persists.
>
> > @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> >               }
> >               h++;
> >               pending >>= softirq_bit;
> > +
> > +             /*
> > +              * the softirq's action has been running for too much time
> > +              * so it may need to wakeup the ksoftirqd
> > +              */
> > +             if (need_resched() && sched_clock() > end) {
> > +                     /*
> > +                      * Ensure that the remaining pending bits are
> > +                      * handled.
> > +                      */
> > +                     or_softirq_pending(pending << (vec_nr + 1));
>
> To or the value interrupts need to be disabled because otherwise you can
> lose a bit when an interrupt happens in the middle of the RMW operation
> and raises a softirq which is not in @pending and not in the per CPU
> local softirq pending storage.
>
> There is another problem. Assume bit 0 and 1 are pending when the
> processing starts. Now it breaks out after bit 0 has been handled and
> stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
> again. ksoftirqd runs and handles bit 0, which takes more than the
> timeout. As a result the bit 0 processing can starve all other softirqs.
>
I got it. I will try to slove this problem. Thanks.

> So this needs more thought.
>
> Thanks,
>
>         tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ