lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.1908190947290.1923@nanos.tec.linutronix.de>
Date:   Mon, 19 Aug 2019 10:04:24 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Arul Jeniston <arul.jeniston@...il.com>
cc:     viro@...iv.linux.org.uk, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, arul_mc@...l.com
Subject: Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read
 function.

Arul,

On Mon, 19 Aug 2019, Arul Jeniston wrote:
> During normal run, we do see small amount (~1000 cycles) of backward
> time drifts one in a while.
> This is likely happens due to the race between multiple processors and
> ISR routines.

No.

> We have added a hook to read_tsc() and observed backward time drift
> when isr comes between reading tsc register and returning the value.
> This drifting time differs based on the number of isr handled and the
> time taken to service each isr.

This is not a drift. Please do not misuse technical expressions which have
a well defined meaning.

rdtsc()
  val = read()

interrupt
	....

  return val

Time does not go backwards in that case simply because at the point it was
taken it was correct.

Versus that timerfd problem this situaiton is completely irrelevant simply
because hrtimer_forward_now() happens _AFTER_ the timer was expired not
before.

So the read of CLOCK_MONOTONIC in hrtimer_forward_now() is _AFTER_ the read
of CLOCK_MONOTONIC in hrtimer_interrupt() which expires the timer and there
are only two issue which can make that read in hrtimer_forward_now() go
backwards vs. the time which was read when the timer was expired:

  1) TSCs are out of sync or affected otherwise

  2) Timekeeping has a bug.

That's where the problem lies it needs to be analyzed whether this is
caused by #1 or by #2. Once we know that we can discuss solutions.

> Agreed. Our intention is not to put a workaround. Intention is to
> write a reliable application that handles all values returned by a
> system call.
> At present, the application doesn't know  whether 0 return value is a
> bug or valid case.

Again, you are tackling the wrong end. You need to find, analyze and fix
the root cause.

> > Is the timer expiry and the timerfd_read() on the same CPU or on different
> > ones?
> 
> We don't have data to answer this. However, the kernel is configured
> to allow timer migration.
> So, we believe, the timer expiry and timerfd_read happens on different CPUs.

Believe is a matter of religion and pretty useless to analyze technical
problems. It's not rocket science to figure this out with tracing.

> > Can you please provide a full dmesg from boot to after the point where this
> > failure happens?
> 
> We don't see any logs in dmesg during the occurrence of this problem.
> We may not be able to share complete dmesg logs due to security reasons.
> We haven't seen any time drifting related messages too.
> Let us know, if you are looking for any specific log message.

I was asking for a full boot log for a reason. Is it impossible to stick
that into a mail?

Thanks,

	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ