Message-ID: <10b5e7e55f18c14567d4076e76c204c8805b33c8.camel@gmail.com>
Date: Fri, 05 Feb 2021 17:35:12 -0300
From: Leonardo Bras <leobras.c@...il.com>
To: Fabiano Rosas <farosas@...ux.ibm.com>,
Paul Mackerras <paulus@...abs.org>,
Michael Ellerman <mpe@...erman.id.au>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Christophe Leroy <christophe.leroy@...roup.eu>,
Athira Rajeev <atrajeev@...ux.vnet.ibm.com>,
"Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
Jordan Niethe <jniethe5@...il.com>,
Nicholas Piggin <npiggin@...il.com>,
Frederic Weisbecker <frederic@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Geert Uytterhoeven <geert+renesas@...der.be>
Cc: kvm-ppc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/1] powerpc/kvm: Save Timebase Offset to fix
sched_clock() while running guest code.
Hello Fabiano,
Thanks for reviewing!
(answers inline)
On Fri, 2021-02-05 at 10:09 -0300, Fabiano Rosas wrote:
> Leonardo Bras <leobras.c@...il.com> writes:
>
> > Before guest entry, the TBU40 register is changed to reflect the guest
> > timebase. After exiting the guest, the register is reverted to its
> > original value.
> >
> > If one tries to get a timestamp from the host between those changes, it
> > will return an incorrect value.
> >
> > An example would be trying to add a tracepoint in
> > kvmppc_guest_entry_inject_int(), which, depending on the last tracepoint
> > acquired, could actually cause the host to crash.
> >
> > Save the Timebase Offset to PACA and use it on sched_clock() to always
> > get the correct timestamp.
> >
> > Signed-off-by: Leonardo Bras <leobras.c@...il.com>
> > Suggested-by: Paul Mackerras <paulus@...abs.org>
> > ---
> > Changes since v1:
> > - Subtracts offset only when CONFIG_KVM_BOOK3S_HANDLER and
> > CONFIG_PPC_BOOK3S_64 are defined.
> > ---
> > arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
> > arch/powerpc/kernel/asm-offsets.c | 1 +
> > arch/powerpc/kernel/time.c | 8 +++++++-
> > arch/powerpc/kvm/book3s_hv.c | 2 ++
> > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++
> > 5 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
> > index 078f4648ea27..e2c12a10eed2 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> > @@ -131,6 +131,7 @@ struct kvmppc_host_state {
> > u64 cfar;
> > u64 ppr;
> > u64 host_fscr;
> > + u64 tb_offset; /* Timebase offset: keeps correct
> > timebase while on guest */
>
> Couldn't you use the vc->tb_offset_applied for this? We have a reference
> for the vcore in the hstate already.
But vc is a pointer, which means we would have to keep checking it for
NULL every time sched_clock() runs.
It could also cost a cache miss for the PACA region that contains the
vc pointer, plus another to fetch the part of *vc that holds
tb_offset_applied, instead of a single miss for the PACA region that
contains tb_offset.
On the other hand, it got me thinking: if the offset is applied per
cpu, why don't we keep this info only in the PACA, instead of in the vc?
It could be a general way to apply an offset for any purpose while
still getting sched_clock() right.
(Not that I have any other purpose in mind.)
Best regards!
Leonardo Bras
>
> > #endif
> > };
> >
> > diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> > index b12d7c049bfe..0beb8fdc6352 100644
> > --- a/arch/powerpc/kernel/asm-offsets.c
> > +++ b/arch/powerpc/kernel/asm-offsets.c
> > @@ -706,6 +706,7 @@ int main(void)
> > HSTATE_FIELD(HSTATE_CFAR, cfar);
> > HSTATE_FIELD(HSTATE_PPR, ppr);
> > HSTATE_FIELD(HSTATE_HOST_FSCR, host_fscr);
> > + HSTATE_FIELD(HSTATE_TB_OFFSET, tb_offset);
> > #endif /* CONFIG_PPC_BOOK3S_64 */
> >
> > #else /* CONFIG_PPC_BOOK3S */
> > diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> > index 67feb3524460..f27f0163792b 100644
> > --- a/arch/powerpc/kernel/time.c
> > +++ b/arch/powerpc/kernel/time.c
> > @@ -699,7 +699,13 @@ EXPORT_SYMBOL_GPL(tb_to_ns);
> > */
> > notrace unsigned long long sched_clock(void)
> > {
> > - return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> > + u64 tb = get_tb() - boot_tb;
> > +
> > +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_HANDLER)
> > + tb -= local_paca->kvm_hstate.tb_offset;
> > +#endif
> > +
> > + return mulhdu(tb, tb_to_ns_scale) << tb_to_ns_shift;
> > }
> >
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index b3731572295e..c08593c63353 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3491,6 +3491,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> > if ((tb & 0xffffff) < (new_tb & 0xffffff))
> > mtspr(SPRN_TBU40, new_tb + 0x1000000);
> > vc->tb_offset_applied = vc->tb_offset;
> > + local_paca->kvm_hstate.tb_offset = vc->tb_offset;
> > }
> >
> > if (vc->pcr)
> > @@ -3594,6 +3595,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> > if ((tb & 0xffffff) < (new_tb & 0xffffff))
> > mtspr(SPRN_TBU40, new_tb + 0x1000000);
> > vc->tb_offset_applied = 0;
> > + local_paca->kvm_hstate.tb_offset = 0;
> > }
> >
> > mtspr(SPRN_HDEC, 0x7fffffff);
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index b73140607875..8f7a9f7f4ee6 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -632,6 +632,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> > cmpdi r8,0
> > beq 37f
> > std r8, VCORE_TB_OFFSET_APPL(r5)
> > + std r8, HSTATE_TB_OFFSET(r13)
> > mftb r6 /* current host timebase */
> > add r8,r8,r6
> > mtspr SPRN_TBU40,r8 /* update upper 40 bits */
> > @@ -1907,6 +1908,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> > beq 17f
> > li r0, 0
> > std r0, VCORE_TB_OFFSET_APPL(r5)
> > + std r0, HSTATE_TB_OFFSET(r13)
> > mftb r6 /* current guest timebase */
> > subf r8,r8,r6
> > mtspr SPRN_TBU40,r8 /* update upper 40 bits */