Message-ID: <8ebcd033-818c-93da-9b86-8cd5e81f9590@kernel.org>
Date: Sun, 10 Nov 2019 09:21:15 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Brian Gerst <brgerst@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Stephen Hemminger <stephen@...workplumber.org>,
Willy Tarreau <w@....eu>, Juergen Gross <jgross@...e.com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [patch 5/9] x86/ioport: Reduce ioperm impact for sane usage further

On 11/7/19 1:44 PM, Linus Torvalds wrote:
> On Thu, Nov 7, 2019 at 1:00 PM Brian Gerst <brgerst@...il.com> wrote:
>>
>> There wouldn't have to be a flush on every task switch.
>
> No. But we'd have to flush on any switch that currently does that memcpy.
>
> And my point is that a tlb flush (even the single-page case) is likely
> more expensive than the memcpy.
>
>> Going a step further, we could track which task is mapped to the
>> current cpu like proposed above, and only flush when a different task
>> needs the IO bitmap, or when the bitmap is being freed on task exit.
>
> Well, that's exactly my "track the last task" optimization for copying
> the thing.
>
> IOW, it's the same optimization as avoiding the memcpy.
>
> Which I think is likely very effective, but also makes it fairly
> pointless to then try to be clever..
>
> So the basic issue remains that playing VM games has almost
> universally been slower and more complex than simply not playing VM
> games. TLB flushes - even invlpg - tend to be pretty slow.
>
With my TLB-handling-code-writing-and-reviewing hat on, NAK to any VM
games here.

Honestly, I almost think we should unconditionally copy the whole 8K for
the case where the ioperm() syscall has been used (i.e. not emulating
iopl()). The benefit simply does not justify the risk of getting it
wrong. I'm okay, but barely, with optimizing the end of the copied
range. Optimizing the start of the copied range is pushing it. Playing
MMU tricks and getting all the per-task-ioperm and invalidation right is
way beyond reasonable.
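
To make the trade-off concrete, here is a rough userspace sketch in C of
the two copy strategies discussed above. It is not the kernel's code:
struct task_io, struct cpu_io, max_byte, prev_max and every function
name are hypothetical, invented for illustration. The only facts assumed
are that the bitmap covers 65536 ports (8K) and that a set bit denies
access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IO_BITMAP_BYTES 8192u   /* 65536 ports / 8 ports per byte */

/* Hypothetical per-task state; a set bit denies access to that port. */
struct task_io {
    uint8_t bitmap[IO_BITMAP_BYTES];
    size_t  max_byte;           /* one past the last byte with an allowed port */
};

/* Hypothetical per-CPU copy standing in for what the hardware consults. */
struct cpu_io {
    uint8_t bitmap[IO_BITMAP_BYTES];
    size_t  prev_max;           /* bytes that may hold the previous task's bits */
};

void io_init(struct task_io *t)
{
    memset(t->bitmap, 0xff, sizeof(t->bitmap));   /* deny everything */
    t->max_byte = 0;
}

void cpu_init(struct cpu_io *c)
{
    memset(c->bitmap, 0xff, sizeof(c->bitmap));
    c->prev_max = 0;
}

void io_allow(struct task_io *t, unsigned port)
{
    assert(port < IO_BITMAP_BYTES * 8);
    t->bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
    if (port / 8 + 1 > t->max_byte)
        t->max_byte = port / 8 + 1;
}

int port_allowed(const struct cpu_io *c, unsigned port)
{
    return !(c->bitmap[port / 8] & (1u << (port % 8)));
}

/* Strategy 1: always copy the whole 8K. Nothing to get wrong. */
size_t switch_full(struct cpu_io *c, const struct task_io *next)
{
    memcpy(c->bitmap, next->bitmap, IO_BITMAP_BYTES);
    c->prev_max = IO_BITMAP_BYTES;
    return IO_BITMAP_BYTES;     /* bytes written */
}

/* Strategy 2: optimize only the end of the copied range. */
size_t switch_trimmed(struct cpu_io *c, const struct task_io *next)
{
    size_t n = next->max_byte;
    size_t old = c->prev_max;

    memcpy(c->bitmap, next->bitmap, n);
    /*
     * The previous task may have left allowed bits beyond n. Omitting
     * this memset would silently hand its ports to the next task.
     */
    if (old > n)
        memset(c->bitmap + n, 0xff, old - n);
    c->prev_max = n;
    return n > old ? n : old;   /* bytes written */
}
```

Even this small version needs extra bookkeeping (prev_max) and a
cleanup step whose omission would be a permission leak, which is the
kind of risk being weighed against the saved copy. Trimming the start
of the range as well would add a second bound to track in the same way.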

Even the time spent discussing how to optimize a case that has literally
one known user that none of us can bring ourselves to care about much
seems wasteful.