Message-ID: <5243A78F.3050800@lougher.demon.co.uk>
Date:	Thu, 26 Sep 2013 04:18:39 +0100
From:	Phillip Lougher <phillip@...gher.demon.co.uk>
To:	Linux Kernel Development <linux-kernel@...r.kernel.org>
CC:	"minchan.kim" <minchan.kim@...il.com>,
	chintu kumar <chintu.kr78@...il.com>, ch0.han@....com,
	'Gunho Lee' <gunho.lee@....com>
Subject: Re: squashfs enhance [RFC 3/5] squashfs: remove cache for normal
 data page


Minchan Kim <minchan@...nel.org> wrote:
>
>Hello chintu,
>
>
>On Wed, Sep 25, 2013 at 5:13 PM, chintu kumar <chintu.kr78@...il.com> wrote:
>> Hi Minchan,
>>
>> Thanks for the response!
>>
>> I checked add_to_page_cache_lru and you are right: for existing pages in the
>> page cache it won't add the page you allocated, so my questions are answered,
>> thank you.  However, my concern is still about parallel calls to
>> squashfs_readpage for the same inode.
>>
>> I'll try to rephrase,
>>
>> Actually I was concerned about the allocation on every squashfs_readpage.  In
>> the original code, suppose that none of the pages were present in the page
>> cache and we are doing random reads on the file.  The cache->entry would've
>> been filled once for all the requested pages within that block.
>>
>> However, with your patch we always do I/O, and as you said
>> add_to_page_cache_lru would take care of adding that page to the inode's
>> mapping, so we need not worry about that [correct?].  Now suppose that you've
>> also read the pages for which a different squashfs_readpage has been called;
>> since you won't be able to add those pages to the page cache, that other
>> squashfs_readpage will do I/O again just to fill the page you weren't able
>> to fill.
>>
>> timeline
>> |
>> |
>> ---->  Process 1 (VFS initiates read for page 2)
>> |        initiates read for pages 1-5
>> |        attempts to add the pages read to the page cache,
>> |        and fails for page 3, since the VFS already initiated
>> |        the read for page 3
>> |
>> ---->  Process 2 (VFS initiates read for page 3)
>>          initiates read for pages 1-5 again
>>          attempts to add the pages read to the page cache,
>>          and fails for every page; page 3 isn't added since you
>>          got it from the VFS in squashfs_readpage
>>
>>
>> So does it mean that the cost of the memcpy in the original code, from
>> cache->entry to the page cache page, IS MORE than the __page_cache_alloc +
>> I/O that you do in your patch?
>
>If the race happens on the same CPU, the buffer cache will mitigate the
>problem, but if it happens on different CPUs we will do unnecessary I/O.
>But I think it's not a severe regression, because the old code's cache
>coverage is too limited, and it even gets stuck under concurrent I/O due
>to there being only *a* (single) cache.
>
>One solution is to add locked pages to the page cache right before requesting
>the I/O; if one of the pages in the compressed block is already found in the
>page cache, it means another stream is going on, so we can wait for its I/O
>and copy to finish and just return.  Does it make sense?
>

No, it doesn't; there's an obvious logical mistake there.

I'm not convinced about the benefits of this patch.  The
use of the "unnecessary" cache was there to prevent this race
condition from happening.  The first racing process enters
readpage, and grabs the cache entry signifying it is doing a decompress
for that and the other pages.  A second racing process enters
readpage, sees a decompress is in progress, and waits until it is finished.
The first process then fills in all the pages it can grab (bar the
page locked by the second process), and the second process fills in its
page from the cache entry.  Nice and simple, and no unnecessary extra
I/O or decompression takes place.
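
Roughly, that path looks like the sketch below (simplified and approximate
rather than the verbatim code in fs/squashfs/file.c; the block/index
arithmetic and the actual copy out of the cache entry are elided):

/*
 * Simplified sketch of the existing cache-based path (approximate, not
 * the verbatim code in fs/squashfs/file.c).  Needs the squashfs-private
 * headers for struct squashfs_cache_entry.
 */
#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/highmem.h>
#include "squashfs_fs_sb.h"
#include "squashfs.h"

static int readpage_via_cache(struct page *page, u64 block, int bsize,
			      pgoff_t start_index, int pages)
{
	struct address_space *mapping = page->mapping;
	struct inode *inode = mapping->host;
	int i;
	/*
	 * The first racing reader creates the cache entry and performs the
	 * decompress; later readers find the entry pending and sleep inside
	 * squashfs_get_datablock() until it has been filled, so the block
	 * is read and decompressed only once.
	 */
	struct squashfs_cache_entry *entry =
		squashfs_get_datablock(inode->i_sb, block, bsize);

	if (entry->error)
		goto out;	/* the real code also errors the VFS page here */

	for (i = 0; i < pages; i++) {
		struct page *push_page;

		push_page = (start_index + i == page->index) ? page :
			grab_cache_page_nowait(mapping, start_index + i);
		if (!push_page)
			/*
			 * Locked by another racing reader: that reader copies
			 * its own page from this same cache entry, so no
			 * extra I/O or decompression is needed.
			 */
			continue;

		/* ... copy this page's worth of data from the entry ... */
		flush_dcache_page(push_page);
		SetPageUptodate(push_page);
		unlock_page(push_page);
		if (push_page != page)
			page_cache_release(push_page);
	}
out:
	squashfs_cache_put(entry);
	return 0;
}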

It is your job to prove that the removal of memcpy outweighs the
introduction of this regression.  Waving your hands and just
dismissing this inconvenient issue isn't good enough.
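
To make the regression concrete, the patched path behaves roughly like the
sketch below when two readers race on the same compressed block (illustrative
only; the names and structure are not the patch verbatim).  Every insertion
that loses the add_to_page_cache_lru() race throws away data that was just
read and decompressed:

/*
 * Illustrative sketch, not the patch verbatim.  Each racing reader
 * decompresses the whole block into newly allocated pages and then tries
 * to insert them into the page cache; an insertion that loses the race
 * fails, the page is dropped, and that reader's I/O and decompression
 * for the page was wasted.
 */
#include <linux/pagemap.h>
#include <linux/highmem.h>
#include <linux/gfp.h>

static void push_decompressed_pages(struct address_space *mapping,
				    struct page *vfs_page,
				    pgoff_t start_index, int pages,
				    gfp_t gfp_mask)
{
	int i;

	for (i = 0; i < pages; i++) {
		struct page *push_page;

		if (vfs_page->index == start_index + i) {
			push_page = vfs_page;	/* locked page handed in by the VFS */
		} else {
			push_page = __page_cache_alloc(gfp_mask);
			if (!push_page)
				continue;
			if (add_to_page_cache_lru(push_page, mapping,
						  start_index + i, gfp_mask)) {
				/*
				 * Another reader got this index into the page
				 * cache first: the data just decompressed for
				 * it is simply thrown away.
				 */
				page_cache_release(push_page);
				continue;
			}
		}

		/* ... copy the decompressed data into push_page ... */
		flush_dcache_page(push_page);
		SetPageUptodate(push_page);
		unlock_page(push_page);
		if (push_page != vfs_page)
			page_cache_release(push_page);
	}
}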

I have a couple of other reservations about this patch, as it
introduces two additional regressions.  This will be in my review.

As I said in my reply to your private message on Monday (where you
asked me to review the patches), I reviewed them on the weekend.  As yet, I
have not had time to pull my observations into something suitable for
the kernel mailing list.  That will be done ASAP, but, due to full-time
work commitments, I may not have any time until the weekend again.

On the subject of review, I notice this email contains patch review and
discussion off-list.  That is not good kernel etiquette: patch discussion
should be open, public and in full view, so let's keep it that way.
Also, as maintainer, I should have been CC'd on this email, rather than
accidentally discovering it on the kernel mailing list.

Phillip


>>
>>
>> Regards,
>> Chintu
>>
>>
>>
>> On Wed, Sep 25, 2013 at 5:52 AM, Minchan Kim <minchan@...nel.org> wrote:
>>>
>>> Hello,
>>>
>>> On Tue, Sep 24, 2013 at 6:07 PM, chintu kumar <chintu.kr78@...il.com>
>>> wrote:
>>> > Hi Minchan,
>>> >
>>> >
>>> > In the below snippet
>>> >
>>> > +	/* alloc buffer pages */
>>> > +	for (i = 0; i < pages; i++) {
>>> > +		if (page->index == start_index + i)
>>> > +			push_page = page;
>>> > +		else
>>> > +			push_page = __page_cache_alloc(gfp_mask);
>>> >
>>> >
>>> > You are adding the allocated page to the inode mapping regardless of
>>> > whether it was already there or not.
>>> >
>>> > ......
>>> > ......
>>> >
>>> > +		if (add_to_page_cache_lru(push_page, mapping,
>>> > +				start_index + i, gfp_mask)) {
>>> >  			page_cache_release(push_page);
>>> >
>>> >
>>> >
>>> >
>>> > 1) My question is: what happens to the previously mapped page, in case
>>> > there already was a mapping to that page and it was uptodate?
>>>
>>> add_to_page_cache_lru would fail and the page will be freed by
>>> page_cache_release.
>>>
>>> >
>>> > 2.a) Another case occurs for multiple simultaneous squashfs_readpage
>>> > calls.  In those cases the page passed in via read_pages should be the
>>> > one that needs to be filled and unlocked; however, since only a single
>>> > squashfs_readpage actually fills data while filling the cache entry, the
>>> > rest of the processes waiting for the entry to be filled wouldn't get the
>>> > page they sent in.  [Correct?]
>>> >
>>> > 2.b) Wouldn't that result in a non-updated page being returned to the
>>> > process, which would result in EIO?
>>> >
>>> > 2.c) But on subsequent calls it would get the freshly allocated page
>>> > which you added above to the page cache.  [Correct?]
>>>
>>> I couldn't understand your questions.  Could you elaborate a bit?
>>> Anyway, it seems you worry about races between several sources of readahead.
>>> The thing is add_to_page_cache_lru, which will prevent the race and free the pages.
>>>
>>> Thanks for the review!
>>>
>>> >
>>> >
>>> >
>>> > Regards,
>>> > Chintu
>>> >
>>> >
>>>
>>>
