[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <45ca8086-b7ce-1561-c7d1-3580872d2ba8@intel.com>
Date: Mon, 2 Dec 2019 16:31:52 +0800
From: Rong Chen <rong.a.chen@...el.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, mceier@...il.com,
Davidlohr Bueso <dave@...olabs.net>,
Davidlohr Bueso <dbueso@...e.de>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Borislav Petkov <bp@...en8.de>,
LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
"Kenneth R. Crudup" <kenny@...ix.com>
Subject: Re: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search
Hi Ingo,
The patch fixes the regression reported by 0day-CI.
"[LKP] [x86/mm/pat] 8d04a5f97a: phoronix-test-suite.glmark2.0.score
-23.7% regression":
https://lkml.kernel.org/r/20191127005312.GD20422@shao2-debian
Best Regards,
Rong Chen
On 12/1/19 10:49 PM, Ingo Molnar wrote:
> * Ingo Molnar <mingo@...nel.org> wrote:
>
>> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>>> But the final difference is a real difference where it used to be WC,
>>> and is now UC-:
>>>
>>> -write-combining @ 0x2000000000-0x2100000000
>>> -write-combining @ 0x2000000000-0x2100000000
>>> +uncached-minus @ 0x2000000000-0x2100000000
>>> +uncached-minus @ 0x2000000000-0x2100000000
>>>
>>> which certainly could easily explain the huge performance degradation.
>> It's not an unconditional regression, as both Boris and me tried to
>> reproduce it on different systems that do ioremap_wc() as well and didn't
>> measure a slowdown, but something about the memory layout probably
>> triggers the tree management bug.
> Ok, I think I found at least one bug in the new PAT code, the conversion
> of memtype_check_conflict() is buggy I think:
>
> 8d04a5f97a5f: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
>
> dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
> found_type = match->type;
>
> - node = rb_next(&match->rb);
> - while (node) {
> - match = rb_entry(node, struct memtype, rb);
> -
> - if (match->start >= end) /* Checked all possible matches */
> - goto success;
> -
> - if (is_node_overlap(match, start, end) &&
> - match->type != found_type) {
> + match = memtype_interval_iter_next(match, start, end);
> + while (match) {
> + if (match->type != found_type)
> goto failure;
> - }
>
> - node = rb_next(&match->rb);
> + match = memtype_interval_iter_next(match, start, end);
> }
>
>
> Note how the '>= end' condition to end the interval check, got converted
> into:
>
> + match = memtype_interval_iter_next(match, start, end);
>
> This is subtly off by one, because the interval trees interfaces require
> closed interval parameters:
>
> include/linux/interval_tree_generic.h
>
> /* \
> * Iterate over intervals intersecting [start;last] \
> * \
> * Note that a node's interval intersects [start;last] iff: \
> * Cond1: ITSTART(node) <= last \
> * and \
> * Cond2: start <= ITLAST(node) \
> */ \
>
> ...
>
> if (ITSTART(node) <= last) { /* Cond1 */ \
> if (start <= ITLAST(node)) /* Cond2 */ \
> return node; /* node is leftmost match */ \
>
> [start;last] is a closed interval (note that '<= last' check) - while the
> PAT 'end' parameter is 1 byte beyond the end of the range, because
> ioremap() and the other mapping APIs usually use the [start,end)
> half-open interval, derived from 'size'.
>
> This is what ioremap() does for example:
>
> /*
> * Mappings have to be page-aligned
> */
> offset = phys_addr & ~PAGE_MASK;
> phys_addr &= PHYSICAL_PAGE_MASK;
> size = PAGE_ALIGN(last_addr+1) - phys_addr;
>
> retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
> pcm, &new_pcm);
>
>
> phys_addr+size will be on a page boundary, after the last byte of the
> mapped interval.
>
> So the correct parameter to use in the interval tree searches is not
> 'end' but 'end-1'.
>
> This could have relevance if conflicting PAT ranges are exactly adjacent,
> for example a future WC region is followed immediately by an already
> mapped UC- region - in this case memtype_check_conflict() would
> incorrectly deny the WC memtype region and downgrade the memtype to UC-.
>
> BTW., rather annoyingly this downgrading is done silently in
> memtype_check_insert():
>
> int memtype_check_insert(struct memtype *new,
> enum page_cache_mode *ret_type)
> {
> int err = 0;
>
> err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
> if (err)
> return err;
>
> if (ret_type)
> new->type = *ret_type;
>
> memtype_interval_insert(new, &memtype_rbroot);
> return 0;
> }
>
>
> So on such a conflict we'd just silently get UC- in *ret_type, and write
> it into the new region, never the wiser ...
>
> So assuming that the patch below fixes the primary bug the diagnostics
> side of ioremap() cache attribute downgrades would be another thing to
> fix.
>
> Anyway, I checked all the interval-tree iterations, and most of them are
> off by one - but I think the one related to memtype_check_conflict() is
> the one causing this particular performance regression.
>
> The only correct interval-tree searches were these two:
>
> arch/x86/mm/pat_interval.c: match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
> arch/x86/mm/pat_interval.c: match = memtype_interval_iter_next(match, 0, ULONG_MAX);
>
> The ULONG_MAX was hiding the off-by-one in plain sight. :-)
>
> So it would be nice if everyone who is seeing this bug could test the
> patch below against Linus's latest tree - does it fix the regression?
>
> If not then please send the before/after dump of
> /sys/kernel/debug/x86/pat_memtype_list - and even if it works please send
> the dumps so we can double check it all.
>
> Note that the bug was benign in the sense of implementing a too strict
> cache attribute conflict policy and downgrading cache attributes - so
> AFAICS the worst outcome of this bug would be a performance regression.
>
> Patch is only lightly tested, so take care. (Patch is emphatically not
> signed off yet, because I spent most of the day on this and I don't yet
> trust my fix - all of the affected sites need to be reviewed more
> carefully.)
>
> Thanks,
>
> Ingo
>
>
> ====================>
> From: Ingo Molnar <mingo@...nel.org>
> Date: Sun, 1 Dec 2019 15:25:50 +0100
> Subject: [PATCH] x86/pat: Fix off-by-one bugs in interval tree search
>
> NOT-Signed-off-by: Ingo Molnar <mingo@...nel.org>
> ---
> arch/x86/mm/pat_interval.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/mm/pat_interval.c b/arch/x86/mm/pat_interval.c
> index 47a1bf30748f..6855362eaf21 100644
> --- a/arch/x86/mm/pat_interval.c
> +++ b/arch/x86/mm/pat_interval.c
> @@ -56,7 +56,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
> {
> struct memtype *match;
>
> - match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> + match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
> while (match != NULL && match->start < end) {
> if ((match_type == MEMTYPE_EXACT_MATCH) &&
> (match->start == start) && (match->end == end))
> @@ -66,7 +66,7 @@ static struct memtype *memtype_match(u64 start, u64 end, int match_type)
> (match->start < start) && (match->end == end))
> return match;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> }
>
> return NULL; /* Returns NULL if there is no match */
> @@ -79,7 +79,7 @@ static int memtype_check_conflict(u64 start, u64 end,
> struct memtype *match;
> enum page_cache_mode found_type = reqtype;
>
> - match = memtype_interval_iter_first(&memtype_rbroot, start, end);
> + match = memtype_interval_iter_first(&memtype_rbroot, start, end-1);
> if (match == NULL)
> goto success;
>
> @@ -89,12 +89,12 @@ static int memtype_check_conflict(u64 start, u64 end,
> dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
> found_type = match->type;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> while (match) {
> if (match->type != found_type)
> goto failure;
>
> - match = memtype_interval_iter_next(match, start, end);
> + match = memtype_interval_iter_next(match, start, end-1);
> }
> success:
> if (newtype)
> @@ -160,7 +160,7 @@ struct memtype *memtype_erase(u64 start, u64 end)
> struct memtype *memtype_lookup(u64 addr)
> {
> return memtype_interval_iter_first(&memtype_rbroot, addr,
> - addr + PAGE_SIZE);
> + addr + PAGE_SIZE-1);
> }
>
> #if defined(CONFIG_DEBUG_FS)
Powered by blists - more mailing lists