lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190417211549.GA17200@agluck-desk>
Date:   Wed, 17 Apr 2019 14:15:49 -0700
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Cong Wang <xiyou.wangcong@...il.com>
Cc:     Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>, linux-edac@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 1/2] ras: fix an off-by-one error in __find_elem()

On Tue, Apr 16, 2019 at 07:37:41PM -0700, Cong Wang wrote:
> On Tue, Apr 16, 2019 at 7:31 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > Yes it is, I have a stacktrace in production which clearly shows
> > del_elem.isra.1+0x34/0x40, unlike the one I triggered via fake
> > PFN's. I can show you if you want, it is on 4.14, so very unlikely
> > it is interesting to anyone here.
> 
> Correct myself:
> 
> The one triggered via fake PFN's also shows
> del_elem.constprop.1+0x39/0x40.
> 
> Sorry for the mistake, I kinda read into another crash...

Ok. Now I follow what you are saying. I'll try to rephrase
here.

1) Initial setup is that we filled the array. Set the threshold
to a low value, and then repeatedly added a new pfn (with a
value higher than any in the array).

2) First time:
We find the array is full, so delete one item. Add the new pfn
to the end (in slot MAX_ELEM-1)

3) Next time. Array is full. Delete one item (assume not the one
for our target pfn). This moved our target to slot MAX_ELEM-2.

4) Next time. Array isn't full. We get a hit on our pfn. Bump
the count. Find that it exceeded the threshold, so ask to
offline it. We delete this from the array. It is a tail
deletion so we just decrement ca->n.

5) Final time. Array isn't full. When we look for our pfn in
the array it isn't in the array[0,ca->n) range. So we ought
to return ENOKEY. But we don't. When we fall out of the
search loop:

136         while (min < max) {
137                 int tmp = (max + min) >> 1;
138
139                 this_pfn = PFN(ca->array[tmp]);
140
141                 if (this_pfn < pfn)
142                         min = tmp + 1;
143                 else if (this_pfn > pfn)
144                         max = tmp;
145                 else {
146                         min = tmp;
147                         break;
148                 }
149         }

We exit because "min" is not less than "max". We assign
"*to" with the index of the available slot.

151         if (to)
152                 *to = min;

This line is just wrong. "min" is still within the allocated
bounds of the array, but the value is "ca->n", i.e. the first
unused slot. The "garbage" value we pick up from here is what
was last assigned to the slot when it was valid. Because we have
been adding the same pfn over and over, the value we pick up is
the pfn that we "deleted" in step "4" above. But remember we
deleted it just by adjusting ca->n, not by overwriting this
entry.

154         this_pfn = PFN(ca->array[min]);


So now this test succeeds, when it should fail. We return
an index of "ca->n".
156         if (this_pfn == pfn)
157                 return min;
158
159         return -ENOKEY;

Things go downhill fast from here. We update the count for
this entry. Find it is above the threshold, so offline the
page and try to delete.

226 static void del_elem(struct ce_array *ca, int idx)
227 {
228         /* Save us a function call when deleting the last element. */

idx is equal to ca->n. So this is "if (-1)" ... so we decide
we need to memmove some entries.  The array indices "idx" and "idx+1"
are not out of bounds for the allocated size, but they aren't what we
want. The real problem is the count passed to memmove.

229         if (ca->n - (idx + 1))
230                 memmove((void *)&ca->array[idx],
231                         (void *)&ca->array[idx + 1],
232                         (ca->n - (idx + 1)) * sizeof(u64));
233
234         ca->n--;
235 }


So I'm now liking this version of the patch by Cong:


diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 2d9ec378a8bc..b7cb0b9b1346 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -204,10 +204,12 @@ static int __find_elem(struct ce_array *ca, u64 pfn, unsigned int *to)
 	if (to)
 		*to = min;
 
-	this_pfn = PFN(ca->array[min]);
+	if (min < ca->n) {
+		this_pfn = PFN(ca->array[min]);
 
-	if (this_pfn == pfn)
-		return min;
+		if (this_pfn == pfn)
+			return min;
+	}
 
 	return -ENOKEY;
 }

Reviewed-by: Tony Luck <tony.luck@...el.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ