Message-ID: <CANn89iJgRLPR_53vrd2zfYiU5ejcVWACtH6h_JPnvte6eSGOLg@mail.gmail.com>
Date: Fri, 26 May 2023 20:55:22 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
netdev <netdev@...r.kernel.org>
Subject: Re: x86 copy performance regression
On Fri, May 26, 2023 at 8:33 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Fri, May 26, 2023 at 10:51 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > Hmmm
> >
> > [ 25.532236] RIP: 0010:0xffffffffa5a85134
> > [ 25.536173] Code: Unable to access opcode bytes at 0xffffffffa5a8510a.
>
> This was the other reason I really didn't want to use alternatives on
> the conditional branch instructions. The relocations are really not
> very natural, and we have odd rules for those things. So I suspect our
> instruction rewriting simply gets this wrong, because that's such a
> nasty pattern.
>
> I really wanted my "just hardcode the instruction bytes" to work. Not
> only did it get me the small 2-byte conditional jump, it meant that
> there was no relocation on it. But objtool really hates not
> understanding what the alternatives code does.
>
> Which is fair enough, but it's frustrating here when it only results
> in more problems.
>
> Anyway, I guess *this* avoids all issues. It creates an extra jump to
> a jump for the case where the CPU doesn't have ERMS, but I guess we
> don't really care about those CPUs anyway.
>
> And it avoids all the "alternative instructions have relocations"
> issues. And it creates all small two-byte jumps, and the "rep movsb"
> fits exactly on that same 2 bytes too. Which I guess all argues for
> this being what I should have started with.
>
> This time it *really* works.
>
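As a side note on the size argument in the quoted text, here is a minimal Python sketch comparing the standard x86-64 encodings involved. It is not the patch itself: the displacement bytes are placeholders, and the helper name `same_size` is made up for illustration. It only shows why a short jump and "rep movsb" can replace each other byte for byte, while the near (rel32) jump forms cannot.

```python
# Standard x86-64 opcode bytes; displacement values are placeholders.
REP_MOVSB = bytes([0xF3, 0xA4])                 # rep movsb
JMP_REL8 = bytes([0xEB, 0x10])                  # jmp short (8-bit displacement)
JAE_REL8 = bytes([0x73, 0x10])                  # jae short (8-bit displacement)
JMP_REL32 = bytes([0xE9, 0, 0, 0, 0])           # jmp near (32-bit displacement)
JAE_REL32 = bytes([0x0F, 0x83, 0, 0, 0, 0])     # jae near (32-bit displacement)


def same_size(original: bytes, replacement: bytes) -> bool:
    """True if the replacement occupies exactly the original's bytes (no padding)."""
    return len(original) == len(replacement)


if __name__ == "__main__":
    for name, insn in [("rep movsb", REP_MOVSB), ("jmp rel8", JMP_REL8),
                       ("jae rel8", JAE_REL8), ("jmp rel32", JMP_REL32),
                       ("jae rel32", JAE_REL32)]:
        print(f"{name:10s} {len(insn)} bytes: {insn.hex(' ')}")
    print("jmp rel8 <-> rep movsb same size:", same_size(JMP_REL8, REP_MOVSB))
```
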
Indeed, this version of the patch works and fixes the issue for me, thanks a lot!
New numbers look similar to the 6.3 ones.

Tested-by: Eric Dumazet <edumazet@...gle.com>

 Performance counter stats for 'taskset 02 ./tcp_mmap -H 2002:a05:6608:297::':

          2,833.29 msec task-clock        #    0.970 CPUs utilized
             1,065      context-switches  #  375.888 /sec
                 1      cpu-migrations    #    0.353 /sec
               128      page-faults       #   45.177 /sec
    10,297,389,329      cycles            #    3.634 GHz
     7,213,189,594      instructions      #    0.70  insn per cycle
     1,220,821,121      branches          #  430.884 M/sec
        10,430,907      branch-misses     #    0.85% of all branches

       2.921180547 seconds time elapsed

       0.005304000 seconds user
       2.478561000 seconds sys
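
For anyone cross-checking the numbers, the derived column in the output above follows from the raw counters. The sketch below recomputes it in Python; the function name `derive` and the dictionary keys are illustrative choices, not anything perf exposes.

```python
def derive(task_clock_ms, elapsed_s, cycles, instructions, branches, branch_misses):
    """Recompute the derived figures perf stat prints after the '#'."""
    task_clock_s = task_clock_ms / 1000.0
    return {
        "cpus_utilized": task_clock_s / elapsed_s,
        "ghz": cycles / task_clock_s / 1e9,
        "insn_per_cycle": instructions / cycles,
        "branch_miss_pct": 100.0 * branch_misses / branches,
    }


# Raw counters copied from the run above.
stats = derive(
    task_clock_ms=2833.29,
    elapsed_s=2.921180547,
    cycles=10_297_389_329,
    instructions=7_213_189_594,
    branches=1_220_821_121,
    branch_misses=10_430_907,
)

if __name__ == "__main__":
    for key, value in stats.items():
        print(f"{key:16s} {value:.3f}")
```

Rounded to the precision perf uses, these come out to 0.970 CPUs utilized, 3.634 GHz, 0.70 insn per cycle and 0.85% branch misses, matching the report.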