Message-ID: <CAMzpN2hCrcQg_u5sp7WWGjOBv13+ZWtSAecp6bWpT6rsTyo+-Q@mail.gmail.com>
Date: Thu, 7 Nov 2019 21:12:38 -0500
From: Brian Gerst <brgerst@...il.com>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
Stephen Hemminger <stephen@...workplumber.org>,
Willy Tarreau <w@....eu>, Juergen Gross <jgross@...e.com>,
Sean Christopherson <sean.j.christopherson@...el.com>
Subject: Re: [patch 5/9] x86/ioport: Reduce ioperm impact for sane usage further
On Thu, Nov 7, 2019 at 8:12 PM H. Peter Anvin <hpa@...or.com> wrote:
>
> On 2019-11-07 13:44, Linus Torvalds wrote:
> > On Thu, Nov 7, 2019 at 1:00 PM Brian Gerst <brgerst@...il.com> wrote:
> >>
> >> There wouldn't have to be a flush on every task switch.
> >
> > No. But we'd have to flush on any switch that currently does that memcpy.
> >
> > And my point is that a tlb flush (even the single-page case) is likely
> > more expensive than the memcpy.
> >
> >> Going a step further, we could track which task is mapped to the
> >> current cpu like proposed above, and only flush when a different task
> >> needs the IO bitmap, or when the bitmap is being freed on task exit.
> >
> > Well, that's exactly my "track the last task" optimization for copying
> > the thing.
> >
> > IOW, it's the same optimization as avoiding the memcpy.
> >
> > Which I think is likely very effective, but also makes it fairly
> > pointless to then try to be clever..
> >
> > So the basic issue remains that playing VM games has almost
> > universally been slower and more complex than simply not playing VM
> > games. TLB flushes - even invlpg - tend to be pretty slow.
> >
> > Of course, we probably end up invalidating the TLBs anyway, so maybe
> > in this case we don't care. The ioperm bitmap is _technically_
> > per-thread, though, so it should be flushed even if the VM isn't
> > flushed...
> >
>
> One option, probably a lot saner (if we care at all, after all, copying 8K
> really isn't that much, but it might have some impact on real-time processes,
> which is one of the rather few use cases for direct I/O) would be to keep the
> bitmask in a pre-formatted TSS (ioperm being per thread, so no concerns about
> the TSS being in use on another processor), and copy the TSS fields (88 bytes)
> over if and only if the thread has been migrated to a different CPU, then
> switch the TSS rather than switching For the common case (no ioperms) we use
> the standard per-cpu TSS.
>
> That being said, I don't actually know that copying 88 bytes + LTR is any
> cheaper than copying 8K.
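I don't think that can work. The TSS has to be at a fixed address in
the cpu_entry_area so that it stays mapped while running in user mode.
With page table isolation (the Meltdown mitigation), the cpu_entry_area
is one of the few kernel mappings left in the user-mode page tables, so
a per-thread TSS located anywhere else would not be visible there.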
--
Brian Gerst
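
For reference, the "track the last task" optimization discussed in the
quoted text can be modelled in a few lines of user-space C. This is only
a sketch under stated assumptions, not the kernel's implementation: the
names struct task, struct cpu_state, switch_io_bitmap() and
forget_io_bitmap() are hypothetical, and the per-CPU buffer merely stands
in for the I/O bitmap area of the per-CPU TSS. The idea is to skip the 8K
memcpy whenever the incoming task's bitmap is already the one installed
on this CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IO_BITMAP_BYTES 8192	/* 65536 ports, one bit per port */

/* Hypothetical per-task state; a NULL bitmap means ioperm() was never used. */
struct task {
	uint8_t *io_bitmap;
	uint64_t bitmap_seq;	/* bumped whenever the task's bitmap changes */
};

/* Hypothetical per-CPU state standing in for the TSS bitmap area. */
struct cpu_state {
	uint8_t tss_bitmap[IO_BITMAP_BYTES];
	const struct task *last_owner;	/* whose bitmap is currently installed */
	uint64_t last_seq;		/* sequence number at install time */
	unsigned long copies;		/* instrumentation for this sketch only */
};

/* Called on context switch to 'next'; copies only when the cache misses. */
static void switch_io_bitmap(struct cpu_state *cpu, const struct task *next)
{
	assert(cpu && next);

	/*
	 * Common case: no I/O bitmap. Real code would also have to make
	 * the installed bitmap ineffective for this task; that step is
	 * outside the scope of this model.
	 */
	if (!next->io_bitmap)
		return;

	/* Same task, unchanged bitmap: the installed copy is still valid. */
	if (cpu->last_owner == next && cpu->last_seq == next->bitmap_seq)
		return;

	memcpy(cpu->tss_bitmap, next->io_bitmap, IO_BITMAP_BYTES);
	cpu->last_owner = next;
	cpu->last_seq = next->bitmap_seq;
	cpu->copies++;
}

/* Called when a task exits or frees its bitmap, so a stale pointer never matches. */
static void forget_io_bitmap(struct cpu_state *cpu, const struct task *t)
{
	if (cpu->last_owner == t)
		cpu->last_owner = NULL;
}
```

The sequence number covers the case where the same task modifies its
bitmap between two switches, and forget_io_bitmap() corresponds to the
"bitmap is being freed on task exit" case mentioned in the quoted text.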