Message-ID: <CAKohpomKj2Up9KibkXEL10rvd6E4fXdKghZAOhOCLt3TV82YLA@mail.gmail.com>
Date:	Fri, 9 May 2014 16:27:33 +0530
From:	Viresh Kumar <viresh.kumar@...aro.org>
To:	Preeti U Murthy <preeti@...ux.vnet.ibm.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Lists linaro-kernel <linaro-kernel@...ts.linaro.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Arvind Chauhan <arvind.chauhan@....com>,
	Kevin Hilman <khilman@...aro.org>
Subject: Re: [PATCH 1/2] hrtimer: reprogram event for expires=KTIME_MAX in hrtimer_force_reprogram()

On 9 May 2014 16:04, Preeti U Murthy <preeti@...ux.vnet.ibm.com> wrote:
> On 05/09/2014 02:10 PM, Viresh Kumar wrote:

> I looked through the code in arm_arch_timer.c and I think the more
> fundamental problem lies in the timer handler there. Ideally, even before
> calling the tick event handler, the timer handler should program the
> tick device to fire at some __MAX__ time.

Ideally, the device should have stopped generating events, as we programmed
it in ONESHOT mode, and should have waited for the kernel to program it again.

But that device probably doesn't have a ONESHOT mode and keeps firing
again and again. Anyway, the real problem I was trying to solve wasn't the
endless interrupts coming from the event device, but the first extra event,
which we should have got rid of. It just happens that this particular board
showed more problems.
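A minimal sketch of the distinction above, using a hypothetical device model (model_evtdev, model_irqs_until and the MODEL_* modes are illustrative names, not kernel APIs): a ONESHOT device fires once at its programmed expiry and then waits to be reprogrammed, a device that behaves periodically keeps firing, and a shut-down device raises nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a clock event device; this is NOT the kernel's
 * struct clock_event_device. All times are in nanoseconds. */
enum model_mode { MODEL_SHUTDOWN, MODEL_PERIODIC, MODEL_ONESHOT };

struct model_evtdev {
    enum model_mode mode;
    int64_t next_event;   /* first programmed expiry */
    int64_t period;       /* only used in MODEL_PERIODIC */
};

/* Number of interrupts the device raises in [0, horizon) if the kernel
 * never reprograms it after next_event. */
static int64_t model_irqs_until(const struct model_evtdev *d, int64_t horizon)
{
    assert(horizon >= 0);
    if (d->mode == MODEL_SHUTDOWN || d->next_event >= horizon)
        return 0;                 /* stopped, or first event not reached */
    if (d->mode == MODEL_ONESHOT)
        return 1;                 /* fires once, then waits to be re-armed */
    assert(d->period > 0);
    /* periodic: next_event, next_event + period, ... strictly below horizon */
    return (horizon - 1 - d->next_event) / d->period + 1;
}
```

For example, with a first event at 4 ms and a 4 ms period, a 20 ms window sees 1 interrupt in ONESHOT mode, 4 in periodic mode and 0 after shutdown.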

> Then irrespective of whether the core kernel deems it appropriate to
> program it or not, the max time by which a timer interrupt will get
> deferred is __MAX__ and one will not find anomalies like what you saw.

We will still get an interrupt once the counter overflows, and that is bad too.
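The overflow point can be made concrete. Assuming a hypothetical timer whose programmable count is `bits` wide and which ticks at `freq_hz` (both parameters are illustrative; the actual width and rate of the hardware in this thread are not stated here), "programming to __MAX__" can only defer the next interrupt by a bounded amount:

```c
#include <assert.h>
#include <stdint.h>

/* Longest interval, in ns, that a hypothetical timer with a `bits`-wide
 * programmable count running at freq_hz can be set for before it
 * wraps and raises an interrupt anyway. */
static uint64_t max_deferral_ns(unsigned int bits, uint64_t freq_hz)
{
    assert(bits >= 1 && bits <= 32);   /* keeps the arithmetic below in range */
    assert(freq_hz > 0);
    uint64_t ticks = (1ULL << bits) - 1;   /* largest programmable count */
    /* split the division so that ticks * 1e9 is never formed directly */
    return (ticks / freq_hz) * 1000000000ULL
         + (ticks % freq_hz) * 1000000000ULL / freq_hz;
}
```

With an example 32-bit count at 24 MHz this gives 178956970625 ns, so the CPU is still woken roughly every 179 s even with nothing to do.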

> The reason this got exposed in NOHZ_FULL config is because in a normal
> NOHZ scenario when the cpu goes idle, and there are no pending timers in
> timer_list, even then tick_sched_timer gets cancelled. Precisely the
> scenario that you have described.

I haven't tried it, but it looks like this problem will exist there as well. Who
disables the event device in that case, when tick_sched_timer goes off?
The same question applies in this case too.

>    But we don't get continuous interrupts then because the first time we
> get an interrupt, we queue the tick_sched_timer and program the tick
> device to the time of its expiry and therefore *push* the time at which
> your tick device should fire further.

Probably not. The continuous interrupts are specific to my platform.
But I am quite sure you would get one extra interrupt after the tick
period; because there was nothing to service, the hrtimer_interrupt()
routine just returned and the CPU went back to idle.
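The "one extra interrupt" can be sketched with a heavily simplified, hypothetical model. Going by the patch subject, hrtimer_force_reprogram() did not reprogram the device when the next expiry was KTIME_MAX, so the stale expiry of the cancelled timer stays in the hardware. The names below (model_cpu, model_force_reprogram, MODEL_KTIME_MAX) are illustrative only and do not mirror the real kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's KTIME_MAX, meaning "no timer pending". */
#define MODEL_KTIME_MAX INT64_MAX

/* Hypothetical, heavily simplified per-CPU state, in ns. */
struct model_cpu {
    bool    dev_armed;   /* device will fire once at dev_expiry */
    int64_t dev_expiry;  /* what the device is currently programmed for */
    int64_t next_timer;  /* earliest pending hrtimer, or MODEL_KTIME_MAX */
};

/* Reprogram the device after a timer has been removed.
 * skip_on_max == true:  leave the device alone when nothing is pending.
 * skip_on_max == false: stop the device when nothing is pending. */
static void model_force_reprogram(struct model_cpu *c, bool skip_on_max)
{
    if (c->next_timer != MODEL_KTIME_MAX) {
        c->dev_armed = true;
        c->dev_expiry = c->next_timer;
    } else if (!skip_on_max) {
        c->dev_armed = false;        /* no further events */
    }
    /* otherwise the stale programming stays in the hardware */
}

/* Interrupts in [0, horizon) that find no expired timer to service,
 * assuming a ONESHOT device that fires once and is not re-armed. */
static int model_spurious_irqs(const struct model_cpu *c, int64_t horizon)
{
    assert(horizon >= 0);
    if (!c->dev_armed || c->dev_expiry >= horizon)
        return 0;
    return c->next_timer > c->dev_expiry ? 1 : 0;
}
```

For example, if the device was armed for a tick at 4 ms and that timer is then cancelled, the skip policy leaves one interrupt at 4 ms with nothing to run, while the stop policy leaves none.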

> Moreover from the core kernel's perspective also this does not look like
> the right thing to do. The core timer code cannot *shutdown* a clock
> device simply because there are no pending timers.

Why? To me it looks like the right thing to do.

> Some arch may change
> their notion of *shutdown* to rendering the tick device unusable. Some
> archs may already do that.

For me there is only one definition of 'shutdown', which every platform
must implement: stop the event device from generating any new events. That's it.

>    Hence I don't think we should take a drastic measure as to shutdown
> the clock device in case of no pending timers,

Sorry, I still don't agree :) We don't know when we will next need
this service, so free it. What do we gain by pushing it out to a very
long time?
What would we lose if we SHUTDOWN it now?

> My suggestion is as pointed above to set the tick device to a KTIME_MAX
> equivalent before calling the timer interrupt event handler.

This would still interrupt on overflow, so it isn't the right idea.
Not today, as there are limitations, but later on with NO_HZ_FULL a core
should be allowed to go into infinite isolation, unless the application running
on it wants otherwise. Pushing to KTIME_MAX wouldn't work in that case.
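The cost of the two policies during long isolation can be compared with a trivial, hypothetical model (idle_wakeups and the IDLE_* names are illustrative, and the wrap period is an assumed input): pushing the device to its maximum deferral still wakes the CPU once per wrap, while a stopped device does not wake it at all.

```c
#include <assert.h>
#include <stdint.h>

/* IDLE_PUSH_TO_MAX: device re-armed to its maximum deferral after each wrap.
 * IDLE_SHUTDOWN:    device stopped until the kernel needs it again. */
enum idle_policy { IDLE_PUSH_TO_MAX, IDLE_SHUTDOWN };

/* Wakeups at or before the end of an idle span of idle_ns, for a device
 * whose maximum deferral (wrap period) is wrap_ns. */
static uint64_t idle_wakeups(enum idle_policy p, uint64_t idle_ns, uint64_t wrap_ns)
{
    assert(wrap_ns > 0);
    if (p == IDLE_SHUTDOWN)
        return 0;                /* a stopped device raises no events */
    return idle_ns / wrap_ns;    /* one interrupt per full wrap period */
}
```

For example, with a hypothetical wrap period of 178956970625 ns (about 179 s), one hour of isolation yields 20 wakeups when pushing to the maximum and none after shutdown.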

Thanks for your review and for the long chats we have had about this problem
on IRC since yesterday.

--
viresh