Open Source and information security mailing list archives
 
Message-ID: <alpine.DEB.2.21.2601141950110.6421@angie.orcam.me.uk>
Date: Wed, 14 Jan 2026 20:59:52 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: David Laight <david.laight.linux@...il.com>
cc: kernel test robot <lkp@...el.com>, oe-kbuild-all@...ts.linux.dev, 
    linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>, 
    Linux Memory Management List <linux-mm@...ck.org>, 
    Nicolas Pitre <npitre@...libre.com>, linux-mips@...r.kernel.org, 
    Thomas Bogendoerfer <tsbogend@...ha.franken.de>
Subject: Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'

On Wed, 14 Jan 2026, David Laight wrote:

> > 	dmul	$2,$5,$6	 # 9	[c=20 l=4]  muldi3_mul3_nohilo
> > 	dmuhu	$5,$5,$6	 # 10	[c=44 l=4]  umuldi3_highpart_r6
> > 	daddu	$7,$2,$7	 # 14	[c=4 l=4]  *adddi3/1
> > 	sltu	$2,$7,$2	 # 16	[c=4 l=4]  *sltu_didi
> > 	sd	$7,0($4)	 # 21	[c=4 l=4]  *movdi_64bit/4
> > 	jr	$31	 # 44	[c=0 l=4]  *simple_return
> > 	daddu	$2,$2,$5	 # 29	[c=4 l=4]  *adddi3/1
> > 
> > (hmm, I wonder why the cost for the high-part RTX is over twice that for 
> > the low-part one; this seems outright wrong, also taking the possibility 
> > of fusing into account).
> 
> They might be different: if the wide multiply is implemented with multiple
> narrow ones, then the high result bits don't need to be generated if only
> the low result bits are needed.
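
To make the point above concrete, here is a minimal C sketch. It assumes a multiplier that only has 32x32->64 partial products available; the function names are hypothetical and come from neither the kernel nor GCC. The low 64 bits of a 64x64 product need three partial products, and the two cross terms contribute only their low halves. The high 64 bits need all four partial products plus carry propagation out of the middle column.

```c
#include <stdint.h>

/* Low 64 bits from 32-bit pieces: a_hi * b_hi only affects bits 64 and up,
   so it is never computed, and the cross terms matter only below bit 32
   (anything above is shifted out, and wraparound is harmless mod 2^64). */
static uint64_t mul_lo_split(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
    uint64_t cross = (a_lo * b_hi + a_hi * b_lo) << 32;
    return a_lo * b_lo + cross;
}

/* High 64 bits: all four partial products are required. */
static uint64_t mul_hi(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow;
       its upper half is the carry into the high word. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```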

 Well, GCC represents the low part as a plain DImode multiplication in the 
`muldi3_mul3_nohilo' RTX, but the high part as a TImode multiplication 
combined with a shift and a truncation operation in the 
`umuldi3_highpart_r6' RTX, and then applies some generic cost figures to 
each complete expression.  The MIPS backend ought to provide the correct 
cost in both cases instead.
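
For reference, the quoted listing reads like compiler output for the C below. It is reconstructed from the register usage: $4 holds the pointer, $5 and $6 the multiplicands, $7 the addend, and the result is returned in $2. This is a sketch that assumes the GCC/Clang `unsigned __int128` extension. The function name is hypothetical, and the code is not taken from div64.c. Presumably it is the widening multiply here that GCC splits into the low-part (`muldi3_mul3_nohilo') and high-part (`umuldi3_highpart_r6') patterns shown in the listing.

```c
#include <stdint.h>

/* Hypothetical helper: stores the low 64 bits of a * b + c through lo and
   returns the high 64 bits.  The maximum value, (2^64-1)^2 + 2^64-1, is
   2^128 - 2^64, so the sum always fits in 128 bits. */
static uint64_t mul_add_64x64(uint64_t *lo, uint64_t a, uint64_t b, uint64_t c)
{
    unsigned __int128 r = (unsigned __int128)a * b + c;

    *lo = (uint64_t)r;            /* dmul, daddu, sd   */
    return (uint64_t)(r >> 64);   /* dmuhu plus carry (sltu) */
}
```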

 Given the technology involved with MIPS MDUs (multiply/divide units) I'd 
expect the same latency for both operations (DMULT/U used to produce both 
parts in one operation, but required a dedicated MDU accumulator register, 
which complicated both the pipeline and instruction scheduling in the 
compiler), and indeed, e.g., the figures for the MIPS I6500 CPU give a 
latency of 4 cycles for each of DMUL/U and DMUH/U.  That would be 16 in 
terms of GCC insn costs, as those are cycles multiplied by 4 so as to 
allow "fractional" costs in special cases.  And while using 20 instead is 
not too bad, the value of 44 is way off, as it's almost triple the actual 
cost.
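
The arithmetic above can be checked with a few lines of C. GCC's `COSTS_N_INSNS(N)` macro is defined as `(N) * 4`. A cost of 20 therefore corresponds to 5 cycles, and 44 corresponds to 11, which is 2.75 times the 16 implied by a 4-cycle latency. The helper functions are hypothetical, written only for this illustration.

```c
/* Same definition as GCC's COSTS_N_INSNS: one instruction/cycle = 4 units. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Convert a GCC insn cost back to (possibly fractional) cycles. */
static double cost_to_cycles(int cost)
{
    return cost / 4.0;
}

/* How far a reported cost is from the cost implied by a known latency. */
static double cost_ratio(int reported_cost, int latency_cycles)
{
    return (double)reported_cost / COSTS_N_INSNS(latency_cycles);
}
```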

 Incidentally, the repeat rate is 1 for all these instructions, so the 
multiplier is fully pipelined in the I6500 implementation.  No fusion is 
mentioned, though.
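
The idealized model below gives a rough sense of what a repeat rate of 1 buys. It assumes independent operations and ignores issue width, data dependencies and fusion; the function is hypothetical and serves only as an illustration. With the I6500 figures, a DMUL/DMUHU pair for the two halves of one product completes in 5 cycles. A hypothetical unpipelined unit, whose repeat rate equals its latency, would need 8.

```c
/* Idealized pipeline model: the first result is ready after `latency`
   cycles, and each further independent operation can start `repeat`
   cycles after the previous one. */
static unsigned pipelined_cycles(unsigned n, unsigned latency, unsigned repeat)
{
    if (n == 0)
        return 0;
    return latency + (n - 1) * repeat;
}
```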

> If those are gcc's costs, I suspect they may not match reality; after all,
> they usually only have to be 'good enough' or 'reasonable'.

 Well, they need to be good enough for the compiler not to come up with a 
worse alternative, such as repeated addition when one of the operands is 
an immediate.
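
A naive sketch can illustrate the immediate-operand case. It assumes one cycle per shift or add, executed back to back, which is a simplification. The sketch decomposes the constant into its set bits and compares the resulting instruction count against the multiply latency. GCC's actual constant-multiplication synthesis is more elaborate: it also uses subtraction and reuses partial results. The helper names here are hypothetical.

```c
#include <stdint.h>

/* Compute x * k using one shifted term per set bit of k, counting the
   shifts (nonzero amounts only) and adds a naive sequence would need. */
static uint64_t mul_by_shift_add(uint64_t x, uint64_t k, unsigned *ops)
{
    uint64_t acc = 0;
    unsigned n = 0;
    int first = 1;

    for (unsigned i = 0; i < 64; i++) {
        if (!(k & ((uint64_t)1 << i)))
            continue;
        uint64_t term = x << i;
        if (i)
            n++;                 /* shift */
        if (first) {
            acc = term;
            first = 0;
        } else {
            acc += term;
            n++;                 /* add */
        }
    }
    *ops = n;
    return acc;
}

/* Prefer the shift/add sequence only if it is strictly cheaper than a
   multiply costing mul_cycles, under the one-cycle-per-op assumption. */
static int prefer_shift_add(uint64_t k, unsigned mul_cycles)
{
    unsigned ops;

    mul_by_shift_add(1, k, &ops);
    return ops < mul_cycles;
}
```

With k = 0x15, the naive sequence takes 4 operations. That does not beat a 4-cycle multiply, but it looks cheaper against the 11 cycles implied by a cost of 44.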

  Maciej
