Message-ID: <alpine.DEB.2.21.2601141530510.6421@angie.orcam.me.uk>
Date: Wed, 14 Jan 2026 15:50:09 +0000 (GMT)
From: "Maciej W. Rozycki" <macro@...am.me.uk>
To: David Laight <david.laight.linux@...il.com>
cc: kernel test robot <lkp@...el.com>, oe-kbuild-all@...ts.linux.dev,
linux-kernel@...r.kernel.org, Andrew Morton <akpm@...ux-foundation.org>,
Linux Memory Management List <linux-mm@...ck.org>,
Nicolas Pitre <npitre@...libre.com>, linux-mips@...r.kernel.org,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>
Subject: Re: mips64-linux-ld: div64.c:undefined reference to `__multi3'
On Wed, 14 Jan 2026, David Laight wrote:
> > > Looking at the git log for that file there is a comment that includes:
> > > "we wouldn't expect any calls to __multi3 to be generated from
> > > kernel code".
> > > Not true....
> > > Not sure why the link didn't fail before though, something subtle must
> > > have changed.
> > >
> > > I think the fix is just to remove the gcc version check.
> >
> > Or rather fix the version check. The GCC fix went in with GCC 10:
>
> Does that mean the GCC 10 generates the multiply instructions and never calls
> __multi3?
> (Rather than just not using __multi3() for that specific example.)
Of course it still calls `__multi3' for 128x128-bit multiplication.
It doesn't for the widening 64x64-bit one though, which was a missed case
for MIPS64r6 only, having been supported by GCC ever since the MIPS III
ISA. I think we do want the link to fail in the 128x128-bit case.
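To make the distinction concrete, here is a minimal C sketch of the two cases. The helper names `widening_mul_u64' and `full_mul_u128' are hypothetical, made up for illustration, and are not kernel functions. Per the description above, the first form is the one expected to be expanded inline with GCC 10 or later on MIPS64r6, while the second is the one that still results in a call to `__multi3'.

```c
#include <stdint.h>

/*
 * Hypothetical helper: widening 64x64 -> 128-bit multiply.  Both
 * operands are zero-extended from 64 bits, so the compiler knows
 * the high halves are zero and only needs a low-part and a
 * high-part multiply.
 */
static inline unsigned __int128 widening_mul_u64(uint64_t a, uint64_t b)
{
	return (unsigned __int128)a * b;
}

/*
 * Hypothetical helper: full 128x128 -> 128-bit multiply.  Nothing is
 * known about the high halves here, so this is the case that is
 * still expected to go through __multi3 on MIPS64.
 */
static inline unsigned __int128 full_mul_u128(unsigned __int128 a,
					      unsigned __int128 b)
{
	return a * b;
}
```

Since the kernel does not provide `__multi3' in this configuration, any use of the second form would show up as an undefined reference at link time, which is the failure mode argued for above.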
> In this case gcc knows the high bits are all zero - so just needs the two
> instructions to generate the high and low parts.
Distinct RTL insns are produced, so all the usual RTL optimisations
apply (in addition to any tree optimisations already made):
mul_u64_u64_add_u64:
	.frame	$sp,0,$31		# vars= 0, regs= 0/0, args= 0, gp= 0
	.mask	0x00000000,0
	.fmask	0x00000000,0
	.set	noreorder
	.set	nomacro
	dmul	$2,$5,$6	 # 9	[c=20 l=4]  muldi3_mul3_nohilo
	dmuhu	$5,$5,$6	 # 10	[c=44 l=4]  umuldi3_highpart_r6
	daddu	$7,$2,$7	 # 14	[c=4 l=4]  *adddi3/1
	sltu	$2,$7,$2	 # 16	[c=4 l=4]  *sltu_didi
	sd	$7,0($4)	 # 21	[c=4 l=4]  *movdi_64bit/4
	jr	$31		 # 44	[c=0 l=4]  *simple_return
	daddu	$2,$2,$5	 # 29	[c=4 l=4]  *adddi3/1
(hmm, I wonder why the cost for the high-part RTX is over twice that for
the low-part one; this seems outright wrong, also taking the possibility
of fusing into account).
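For reference, a C sketch of what the listing above computes, reconstructed from the assembly rather than copied from the kernel source. The name `mul_u64_u64_add_u64_sketch' and the parameter names are assumptions; the argument order is inferred from the register usage ($4 holds the pointer stored through, $5 and $6 the multiplicands, $7 the addend, and $2 the return value).

```c
#include <stdint.h>

/*
 * Sketch only, inferred from the assembly: compute a * b + c as a
 * 128-bit value, store the low 64 bits through p_lo and return the
 * high 64 bits.  The sum cannot overflow 128 bits, because
 * (2^64 - 1)^2 + (2^64 - 1) = 2^128 - 2^64.
 */
static inline uint64_t mul_u64_u64_add_u64_sketch(uint64_t *p_lo,
						   uint64_t a, uint64_t b,
						   uint64_t c)
{
	unsigned __int128 prod = (unsigned __int128)a * b + c;

	*p_lo = (uint64_t)prod;
	return (uint64_t)(prod >> 64);
}
```

In the listing the `dmul'/`dmuhu' pair yields the low and high halves of the product, `daddu' adds the addend to the low half, `sltu' recovers the carry from that addition, and the final `daddu' in the delay slot folds the carry into the high half.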
Maciej