Message-ID: <alpine.DEB.2.21.2601141530510.6421@angie.orcam.me.uk>
Date: Wed, 14 Jan 2026 15:50:09 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: David Laight <david.laight.linux@...il.com>
cc: kernel test robot <lkp@...el.com>, oe-kbuild-all@...ts.linux.dev, 
    linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>, 
    Linux Memory Management List <linux-mm@...ck.org>, 
    Nicolas Pitre <npitre@...libre.com>, linux-mips@...r.kernel.org, 
    Thomas Bogendoerfer <tsbogend@...ha.franken.de>
Subject: Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'

On Wed, 14 Jan 2026, David Laight wrote:

> > > Looking at the git log for that file there is a comment that includes:
> > > 	"we wouldn't expect any calls to __multi3 to be generated from
> > > 	 kernel code".
> > > Not true....
> > > Not sure why the link didn't fail before though, something subtle must
> > > have changed.
> > > 
> > > I think the fix is just to remove the gcc version check.  
> > 
> >  Or rather fix the version check.  The GCC fix went in with GCC 10:
> 
> Does that mean the GCC 10 generates the multiply instructions and never calls
> __multi3?
> (Rather than just not using __multi3() for that specific example.)

 Of course it still does call `__multi3' for 128x128-bit multiplication.  
It doesn't for the widening 64x64-bit one though, which was a missed case 
for MIPS64r6 only, the operation having been supported by GCC ever since 
the MIPS III ISA.  I think we do want the link to fail in the 128x128-bit 
case.
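
 To make the distinction concrete, here is a minimal C sketch of the two 
cases.  The names `u128', `mul_wide' and `mul_full' are made up for 
illustration and are not taken from the kernel; the code assumes a 64-bit 
GCC or Clang target that provides `unsigned __int128'.

```c
#include <stdint.h>

/* Hypothetical shorthand, not a kernel type. */
typedef unsigned __int128 u128;

/*
 * Widening 64x64 -> 128-bit multiplication: both operands are
 * zero-extended 64-bit values, so only the low and the high half of
 * a single 64x64 product are needed.
 */
static inline u128 mul_wide(uint64_t a, uint64_t b)
{
	return (u128)a * b;
}

/*
 * Full 128x128 -> 128-bit multiplication: the upper halves of the
 * operands are unknown, so cross products are required.  This is
 * the case that falls back to a `__multi3' libcall.
 */
static inline u128 mul_full(u128 a, u128 b)
{
	return a * b;
}
```

 Only the second form is expected to leave an undefined reference to 
`__multi3' in a kernel link.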

> In this case gcc knows the high bits are all zero - so just needs the two
> instructions to generate the high and low parts.

 Distinct RTL insns are produced, so all the usual RTL optimisations 
apply (in addition to any tree optimisations already made):

mul_u64_u64_add_u64:
	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
	.mask	0x00000000,0
	.fmask	0x00000000,0
	.set	noreorder
	.set	nomacro
	dmul	$2,$5,$6	 # 9	[c=20 l=4]  muldi3_mul3_nohilo
	dmuhu	$5,$5,$6	 # 10	[c=44 l=4]  umuldi3_highpart_r6
	daddu	$7,$2,$7	 # 14	[c=4 l=4]  *adddi3/1
	sltu	$2,$7,$2	 # 16	[c=4 l=4]  *sltu_didi
	sd	$7,0($4)	 # 21	[c=4 l=4]  *movdi_64bit/4
	jr	$31	 # 44	[c=0 l=4]  *simple_return
	daddu	$2,$2,$5	 # 29	[c=4 l=4]  *adddi3/1

(hmm, I wonder why the cost for the high-part RTX is over twice that for 
the low-part one; this seems outright wrong, also taking the possibility 
of fusing into account).
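
 For reference, a C sketch of a function that would correspond to the 
listing above.  The signature is inferred from the register usage only 
($4 as the pointer stored through, $5 and $6 as the multiplicands, $7 as 
the addend, $2 as the return value), so treat it as an assumption rather 
than the exact source in div64.c; it also assumes `unsigned __int128' is 
available.

```c
#include <stdint.h>

/*
 * Compute a * b + c as a 128-bit quantity, store the low 64 bits
 * through p_lo and return the high 64 bits.  The sum cannot overflow
 * 128 bits: (2^64 - 1)^2 + (2^64 - 1) = 2^128 - 2^64.
 */
uint64_t mul_u64_u64_add_u64(uint64_t *p_lo, uint64_t a, uint64_t b,
			     uint64_t c)
{
	/* Widening multiply; the addend is zero-extended before adding. */
	unsigned __int128 prod = (unsigned __int128)a * b + c;

	*p_lo = (uint64_t)prod;		/* low part: dmul + daddu */
	return (uint64_t)(prod >> 64);	/* high part: dmuhu + carry */
}
```

 The carry out of the low-part addition is what the `sltu' insn recovers 
in the listing, before it is added to the `dmuhu' result in the delay 
slot.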

  Maciej
