[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5509D62B.7030507@imgtec.com>
Date: Wed, 18 Mar 2015 12:46:51 -0700
From: Leonid Yegoshin <Leonid.Yegoshin@...tec.com>
To: James Hogan <James.Hogan@...tec.com>,
"linux-mips@...ux-mips.org" <linux-mips@...ux-mips.org>,
"wangr@...ote.com" <wangr@...ote.com>,
"peterz@...radead.org" <peterz@...radead.org>,
Qais Yousef <Qais.Yousef@...tec.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ralf@...ux-mips.org" <ralf@...ux-mips.org>,
"davidlohr@...com" <davidlohr@...com>,
"chenhc@...ote.com" <chenhc@...ote.com>,
"manuel.lauss@...il.com" <manuel.lauss@...il.com>,
"mingo@...nel.org" <mingo@...nel.org>
Subject: Re: [PATCH] MIPS: MSA: misaligned support
On 03/18/2015 04:27 AM, James Hogan wrote:
>
> + .align 4
> doesn't this mean the first one & label might not be suitably aligned.
> Would it be better to put this before the ld_d (no need for it after
> $w31 case) and putting another .align 4 before the Lmsa_to and Lmsa_from
> labels (so the label itself is aligned)?
Good point! I will issue V2.
>
> +
> + case 2: /* word */
> + to->val32[0] = from->val32[1];
> + to->val32[1] = from->val32[0];
> + to->val32[2] = from->val32[3];
> + to->val32[3] = from->val32[2];
> FWIW since the FP/MSA patches that Paul submitted, there are also
> working endian agnostic accessors created with BUILD_FPR_ACCESS, which
> use the FPR_IDX macro (see http://patchwork.linux-mips.org/patch/9169/),
> which should work for 8bit and 16bit sizes too.
>
> I wonder if the compiler would unroll/optimise this sort of thing:
> for (i = 0; i < (FPU_REG_WIDTH / 8); ++i)
> to_val8[i] = from->val[FPR_IDX(8, i)];
>
> No worries if not.
There is a simple logic behind it - any code rolling in macro/subroutine
definitely
has sense if any of that is correct:
- code can't be observed by single look (more than half page or window)
- there is a chance to reuse some part of it
- some code vars/limits/bit/positions/etc can be changed in future
- some logical connection to macro/subroutine is desirable to take attn
during code maintenance.
None of it besides last one is true here. Logical connection to union
fpureg is done. Changing of MSA register size definitely causes a lot of
other changes in logic and many assumptions may be broken in multiple
places, including process signal conventions.
Rolling code in macro (FPR_IDX) and turning it into loop from pretty
short assignment actually INCREASES a code complexity for future
maintenance.
>
>> + break;
>> +
>> + case 3: /* doubleword, no conversion */
>> + break;
> don't you still need to copy the value though?
Good point! Never test this subroutine yet, HW team still should produce
BIG ENDIAN CPU+MSA variant.
I will issue V2.
>
>> + }
>> +}
>> +#endif
>> +#endif
>> +
>> static void emulate_load_store_insn(struct pt_regs *regs,
>> void __user *addr, unsigned int __user *pc)
>> {
>> @@ -434,6 +497,10 @@ static void emulate_load_store_insn(struct pt_regs *regs,
>> #ifdef CONFIG_EVA
>> mm_segment_t seg;
>> #endif
>> +#ifdef CONFIG_CPU_HAS_MSA
>> + union fpureg msadatabase[2], *msadata;
>> + unsigned int func, df, rs, wd;
>> +#endif
>> origpc = (unsigned long)pc;
>> orig31 = regs->regs[31];
>>
>> @@ -703,6 +770,82 @@ static void emulate_load_store_insn(struct pt_regs *regs,
>> break;
>> return;
>>
>> +#ifdef CONFIG_CPU_HAS_MSA
>> + case msa_op:
>> + if (cpu_has_mdmx)
>> + goto sigill;
>> +
>> + func = insn.msa_mi10_format.func;
>> + switch (func) {
>> + default:
>> + goto sigbus;
>> +
>> + case msa_ld_op:
>> + case msa_st_op:
>> + ;
>> + }
>> +
>> + if (!thread_msa_context_live())
>> + goto sigbus;
> Will this ever happen? (I can't see AdE handler enabling interrupts).
>
> If the MSA context genuinely isn't live (i.e. it can be considered
> UNPREDICTABLE), then surely a load operation should still succeed?
thread_msa_context_live() == check of TIF_MSA_CTX_LIVE == existence of
MSA context for thread.
It differs from MSA is owned by thread, it just says that thread has
already initialized MSA.
Unfortunate choice of function name, I believe.
This is a guard against bad selection of exception priorities in HW...
sometime it happens.
>
>> +
>> + df = insn.msa_mi10_format.df;
>> + rs = insn.msa_mi10_format.rs;
>> + wd = insn.msa_mi10_format.wd;
>> + addr = (unsigned long *)(regs->regs[rs] + (insn.msa_mi10_format.s10 * (1 << df)));
> "* (1 << df)"?
> why not just "<< df"?
>
>> + /* align a working space in stack... */
>> + msadata = (union fpureg *)(((unsigned long)msadatabase + 15) & ~(unsigned long)0xf);
> Maybe you could just use __aligned(16) on a single local union fpureg.
I am not sure that it works in stack. I don't trust toolchain here -
they even are not able to align a frame in stack to 16bytes.
>
>> + }
>> + } else {
>> + if (!access_ok(VERIFY_WRITE, addr, 16))
>> + goto sigbus;
>> + compute_return_epc(regs);
> forgot to preempt_disable()?
Yes.
>
>> + if (test_thread_flag(TIF_USEDMSA)) {
>> +#ifdef __BIG_ENDIAN
>> + msa_from_wd(wd, ¤t->thread.fpu.fpr[wd]);
>> + msa_convert(msadata, ¤t->thread.fpu.fpr[wd], df);
>> +#else
>> + msa_from_wd(wd, msadata);
>> +#endif
>> + preempt_enable();
>> + } else {
>> + preempt_enable();
>> +#ifdef __BIG_ENDIAN
>> + msa_convert(msadata, ¤t->thread.fpu.fpr[wd], df);
>> +#else
>> + *msadata = current->thread.fpu.fpr[wd];
> hmm, you could cheat and change this to the following?:
> msadata = ¤t->thread.fpu.fpr[wd];
Yes. But has it sense? It is just 2 doublewords is replaced by single
PTR assignment. However, if msadata is not changed it gives a compiler
some room for optimization.
- Leonid.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists