[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.21.1712052010280.15244@tbecker-rhat>
Date: Tue, 5 Dec 2017 21:03:02 -0200 (-02)
From: Thiago Rafael Becker <thiago.becker@...il.com>
To: Matthew Wilcox <willy@...radead.org>
cc: NeilBrown <neilb@...e.com>,
Thiago Rafael Becker <thiago.becker@...il.com>,
bfields@...ldses.org, linux-nfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3, V2] kernel: Move groups_sort to the caller of
set_groups.
On Tue, 5 Dec 2017, Matthew Wilcox wrote:
> On Tue, Dec 05, 2017 at 07:11:00AM +1100, NeilBrown wrote:
>> As we don't seem to be pursuing this possibility is probably isn't very
>> important, but I'd like to point out that the original fix isn't a true
>> fix.
>> It just sorts a shared group_info early. This does not stop corruption.
>> Every time a thread calls set_groups() on that group_info it will be
>> sorted again.
>> The sort algorithm used is the heap sort, and a heap sort always moves
>> elements in the array around - it does not leave a sorted array
>> untouched (unlike e.g. the quick sort which doesn't move anything in a
>> sorted array).
>> So it is still possible for two calls to groups_sort() to race.
>> We *need* to move groups_sort() out of set_groups().
Hum, makes sense. I've applied it to the most recent Fedora kernel (that
uses heapsort) and I didn't see the problem again. I should run a few more
repetitions to be sure.
> It must be relatively common to sort an already-sorted array. I wonder
> if something like this patch would be worthwhile?
>
> I have deliberately broken this patch so it can't be applied. I haven't
> tested it, and for all I know, I got the sign of cmp_func wrong.
>
> diff --git a/lib/sort.c b/lib/sort.c
> index d6b7a202b0b6..2b527fde6dad 100644
> --- a/lib/sort.c
> +++ b/lib/sort.c
> @@ -75,7 +75,14 @@ void sort(void *base, size_t num, size_t size,
> swap_func = generic_swap;
> }
>
> - /* heapify */
> + /* Do not sort an already-sorted array */
> + for (c = 0; c < (n - size); c += size) {
> + if (cmp_func(base + c, base + c + size) < 0)
> + goto heapify;
> + }
> + return;
> +
> +heapify:
> for ( ; i >= 0; i -= size) {
> for (r = i; r * 2 + size < n; r = c) {
> c = r * 2 + size;
The bug happens when two threads enter sort_groups for the same
group info in parallel, and one thread starts overwriting values
that another thread may already have "heapified" or sorted.
Thread A Thread B
Enter groups_sort
Enter groups_sort
.
.
.
Return from groups_sort
.
.
.
Return from groups_sort
Wouldn't this patch just make both threads see the structure as unsorted
and sort them?
Thanks,
trbecker
Powered by blists - more mailing lists