[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87sgddaru7.fsf@nanos.tec.linutronix.de>
Date: Mon, 27 Jul 2020 17:41:04 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: qianjun.kernel@...il.com, peterz@...radead.org, will@...nel.org,
luto@...nel.org
Cc: linux-kernel@...r.kernel.org, laoar.shao@...il.com,
urezki@...il.com, jun qian <qianjun.kernel@...il.com>
Subject: Re: [PATCH V4] Softirq:avoid large sched delay from the pending softirqs
Qian,
qianjun.kernel@...il.com writes:
> /*
> * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> * but break the loop if need_resched() is set or after 2 ms.
> - * The MAX_SOFTIRQ_TIME provides a nice upper bound in most cases, but in
> - * certain cases, such as stop_machine(), jiffies may cease to
> - * increment and so we need the MAX_SOFTIRQ_RESTART limit as
> - * well to make sure we eventually return from this method.
> + * In the loop, if the processing time of the softirq has exceeded 2
> + * milliseconds, we also need to break the loop to wakeup the
> ksofirqd.
You are removing the MAX_SOFTIRQ_RESTART limit explanation and I rather
have MAX_SOFTIRQ_TIME_NS there than '2 milliseconds' in case the value
gets adjusted later on. Also while sched_clock() is granular on many
systems it still can be jiffies based and then the above problem
persists.
> @@ -299,6 +298,19 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
> }
> h++;
> pending >>= softirq_bit;
> +
> + /*
> + * the softirq's action has been running for too much time
> + * so it may need to wakeup the ksoftirqd
> + */
> + if (need_resched() && sched_clock() > end) {
> + /*
> + * Ensure that the remaining pending bits are
> + * handled.
> + */
> + or_softirq_pending(pending << (vec_nr + 1));
To or the value interrupts need to be disabled because otherwise you can
lose a bit when an interrupt happens in the middle of the RMW operation
and raises a softirq which is not in @pending and not in the per CPU
local softirq pending storage.
There is another problem. Assume bit 0 and 1 are pending when the
processing starts. Now it breaks out after bit 0 has been handled and
stores back bit 1 as pending. Before ksoftirqd runs bit 0 gets raised
again. ksoftirqd runs and handles bit 0, which takes more than the
timeout. As a result the bit 0 processing can starve all other softirqs.
So this needs more thought.
Thanks,
tglx
Powered by blists - more mailing lists