linux-kernel - Re: [RFC PATCH] powerpc/32: Switch VDSO to C implementation.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.21.1910262026340.10190@nanos.tec.linutronix.de>
Date:   Sat, 26 Oct 2019 20:48:27 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Christophe Leroy <christophe.leroy@....fr>
cc:     Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Paul Mackerras <paulus@...ba.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        vincenzo.frascino@....com, luto@...nel.org,
        linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [RFC PATCH] powerpc/32: Switch VDSO to C implementation.

On Sat, 26 Oct 2019, Christophe Leroy wrote:
> Le 26/10/2019 à 17:53, Thomas Gleixner a écrit :
> > > > > > gettimeofday:    vdso: 750 nsec/call
> > > > > > 
> > > > > > gettimeofday:    vdso: 1533 nsec/call
> > > > 
> > > > Small improvement (3%) with the proposed change:
> > > > 
> > > > gettimeofday:    vdso: 1485 nsec/call
> > > 
> > > By inlining do_hres() I get the following:
> > > 
> > > gettimeofday:    vdso: 1072 nsec/call
> > 
> > What's the effect for clock_gettime()?
> > 
> > gettimeofday() is suboptimal vs. the PPC ASM variant due to an extra
> > division, but clock_gettime() should be 1:1 comparable.
> > 
> 
> Original PPC asm:
> clock-gettime-realtime:    vdso: 928 nsec/call
> 
> My original RFC:
> clock-gettime-realtime:    vdso: 1570 nsec/call
> 
> With your suggested vdso_calc_delta():
> clock-gettime-realtime:    vdso: 1512 nsec/call
> 
> With your vdso_calc_delta() and inlined do_hres():
> clock-gettime-realtime:    vdso: 1302 nsec/call

That does not make any sense at all.

gettimeofday() is basically the same as clock_gettime(REALTIME) and does
an extra division. So I would expect it to be slower. 

Let's look at the code:

__cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
{
        const struct vdso_data *vd = __arch_get_vdso_data();

        if (likely(tv != NULL)) {
		struct __kernel_timespec ts;

                if (do_hres(&vd[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
                        return gettimeofday_fallback(tv, tz);

                tv->tv_sec = ts.tv_sec;
                tv->tv_usec = (u32)ts.tv_nsec / NSEC_PER_USEC;

IIRC PPC did some magic math tricks to avoid that. Could you just for the
fun of it replace this division with

       (u32)ts.tv_nsec >> 10;

That's obviously incorrect, but it would tell us how heavy the division
is. If that brings us close we might do something special for
gettimeofday().

OTOH, last time I checked clock_gettime() was by far more used than
gettimeofday() but that obviously depends on the use cases.

        }
	...
}

and

__cvdso_clock_gettime_common(clockid_t clock, struct __kernel_timespec *ts)
{
        const struct vdso_data *vd = __arch_get_vdso_data();
        u32 msk;

	/* Check for negative values or invalid clocks */
        if (unlikely((u32) clock >= MAX_CLOCKS))
                return -1;

        /*
         * Convert the clockid to a bitmask and use it to check which
         * clocks are handled in the VDSO directly.
         */
        msk = 1U << clock;
        if (likely(msk & VDSO_HRES)) {
		return do_hres(&vd[CS_HRES_COARSE], clock, ts);

So this is the extra code which is executed for clock_gettime(REAL) which
is pure logic and certainly not as heavyweight as the division in the
gettimeofday() code path.

}

static __maybe_unused int
__cvdso_clock_gettime(clockid_t clock, struct __kernel_timespec *ts)
{
        int ret = __cvdso_clock_gettime_common(clock, ts);

	if (unlikely(ret))
                return clock_gettime_fallback(clock, ts);
        return 0;
}

One thing which might be worth to try as well is to mark all functions in
that file as inline. The speedup by the do_hres() inlining was impressive
on PPC.

Thanks,

	tglx