[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <95fa5a69-b029-08b7-4c2e-26661bcae052@gmx.com>
Date: Thu, 4 Nov 2021 19:40:01 +0800
From: Qu Wenruo <quwenruo.btrfs@....com>
To: dsterba@...e.cz, Linus Torvalds <torvalds@...ux-foundation.org>,
David Sterba <dsterba@...e.com>, Qu Wenruo <wqu@...e.com>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] Btrfs updates for 5.16
On 2021/11/2 22:50, David Sterba wrote:
> On Mon, Nov 01, 2021 at 01:03:25PM -0700, Linus Torvalds wrote:
>> On Mon, Nov 1, 2021 at 9:46 AM David Sterba <dsterba@...e.com> wrote:
>>>
>>> There's a merge conflict due to the last minute 5.15 changes (kmap
>>> reverts) and the conflict is not trivial.
>>
>> You don't say.
>>
>> I ended up just re-doing that resolution entirely, and as I did so, I
>> think I found a bug in the original revert that caused the conflict in
>> the first place.
>>
>> And since that revert made it into 5.15, I felt like I had to fix that
>> bug first - and separately - so that the fix can be backported to
>> stable.
>>
>> I then re-did my merge on top of that hopefully fixed state, and maybe
>> it's correct.
>>
>> Or maybe I messed up entirely.
>>
>> I did end up comparing it to your other branch too, but that was
>> equally as messy, apart from the "ok, I can mindlessly just take your
>> side".
>>
>> And it was fairly different from what I had done in my merge
>> resolution, so who knows.
>
> Looks good to me, thanks for catching the bug.
>
>> ANYWAY. What I'm trying to say is that you should look very very
>> carefully at commits
>>
>> 2cf3f8133bda ("btrfs: fix lzo_decompress_bio() kmap leakage")
>> 037c50bfbeb3 ("Merge tag 'for-5.16-tag' of git://git.../linux")
>>
>> because I marked that first one for stable, and the second is
>> obviously my entirely untested merge.
>>
>> It makes sense to me, but apart from "it builds", I've not actually
>> tested any of it. This is all purely from looking at the code and
>> trying to figure out what the RightThing(tm) is.
>>
>> Obviously the kmap thing tends to only be noticeable on 32-bit
>> platforms, and that lzo_decompress_bio() bug also needs just the
>> proper filesystem settings to trigger in the first place.
>>
>> Again - please take a careful look. Both at my merge and at that
>> alleged kmap fix.
>
> I checked the commits individually and in the source files, all the
> kmaps seem to be correctly paired with kunmaps. We'll do more tests too.
> I'm sorry that it turned out to be such mess.
>
OK, something is going weird now.
As an extra step to make sure there is no leakage, I ran fstests with
"compress=lzo" mount option, but the result can't be more ugly.
For the master branch (which includes the fix), it has a very strange
"leakage" behavior, at least in my x86 32bit VM.
The used memory reported from free would just suddenly sky rocket during
generic/027.
(From regular 100~200M to 800M and more).
Easily causing OOM for my 2G RAM VM.
I originally think it's btrfs LZO, but nope.
On v5.15 tag with the fix from Linus, generic/027 runs fine as expected
with lzo compression.
BTW, zlib/zstd compression runs fine.
I guess there is something changed in MM layer causing the problem.
Btrfs LZO has tons of quick kmap()/kunmap() pairs, unlike what we did in
zlib/zstd, thus I guess that may be a triggering factor.
Any clue?
Thanks,
Qu
Powered by blists - more mailing lists