Date:	Wed, 19 Feb 2014 10:38:30 +0900
From:	Roman Peniaev <r.peniaev@...il.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Alexander Viro <viro@...iv.linux.org.uk>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH 1/1] fs/mpage.c: forgotten WRITE_SYNC in case of data
 integrity write

(My previous email was rejected by vger.kernel.org because Google's web
client sent it as HTML. I am resending the same message in plain text.)

> What do REQ_SYNC and REQ_NOIDLE actually *do*?

Yep, this REQ_SYNC is very confusing to me.
First of all, according to the sources of old-school block-buffer
filesystems (e.g. ext2), an fsync call can produce this call stack:

     __filemap_fdatawrite_range(mapping, 0, LLONG_MAX, sync_mode)
      do_writepages(mapping, &wbc)
         mapping->a_ops->writepages(mapping, wbc)
         (ext2_writepages)
            mpage_writepages(mapping, wbc, ext2_get_block)
              write_cache_pages(mapping, wbc, __mpage_writepage, &mpd)
                __mpage_writepage(page, wbc, data)
                  mpage_bio_submit(WRITE, bio)   <<< why WRITE and not WRITE_SYNC in case of WB_SYNC_ALL?
                    <or, in case of non-contiguous buffers>
                  mapping->a_ops->writepage(page, wbc)
                  (ext2_writepage)
                    block_write_full_page(page, ext2_get_block, wbc)
                      block_write_full_page_endio(page, get_block, wbc,
                                                  end_buffer_async_write)
                        __block_write_full_page(inode, page, get_block, wbc,
                                                handler)
                          submit_bh(WRITE_SYNC)

So it turns out that, for the same dirty range, some bios can be
submitted with REQ_WRITE|REQ_SYNC|REQ_NOIDLE and others with only
REQ_WRITE.
(See the comment in __mpage_writepage:
 * If all blocks are found to be contiguous then the page can go into the
 * BIO.  Otherwise fall back to the mapping's writepage().
)
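
To make the mismatch concrete, here is a rough userspace model of the
two paths during an fsync (WB_SYNC_ALL) before the patch. The flag
values are placeholders, not the kernel's real bit positions, and
page_submit_flags() / range_has_mixed_flags() are hypothetical names
used only for this illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values for illustration; NOT the kernel's actual ones. */
#define REQ_WRITE  (1u << 0)
#define REQ_SYNC   (1u << 1)
#define REQ_NOIDLE (1u << 2)

#define WRITE      REQ_WRITE
#define WRITE_SYNC (REQ_WRITE | REQ_SYNC | REQ_NOIDLE)

/*
 * Hypothetical model of how one page is submitted under WB_SYNC_ALL
 * before the patch: contiguous blocks go into the mpage BIO with plain
 * WRITE, everything else falls back to ->writepage() and ends up in
 * submit_bh(WRITE_SYNC).
 */
unsigned int page_submit_flags(bool blocks_contiguous)
{
    return blocks_contiguous ? WRITE : WRITE_SYNC;
}

/* Returns true if pages of one dirty range get different flags. */
bool range_has_mixed_flags(const bool *contiguous, size_t npages)
{
    for (size_t i = 1; i < npages; i++) {
        if (page_submit_flags(contiguous[i]) !=
            page_submit_flags(contiguous[0]))
            return true;
    }
    return false;
}
```

With a range where only some pages have contiguous blocks, this model
reports mixed flags, which is the inconsistency I am pointing at.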

Also, it seems to me that all over the kernel WRITE_SYNC means:
1. try to get the block on disk faster
2. if I have to do a flush, mark my bio with WRITE_SYNC and wait for the result

My patch is an attempt to unify this behavior for the fsync case.
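
The selection the patch makes can be sketched as one small helper. This
is only an illustration, not part of the patch: mpage_write_op() is a
hypothetical name, struct writeback_control is cut down to the single
field used here, and the flag values are placeholders:

```c
/* Placeholder bit values for illustration; NOT the kernel's actual ones. */
#define REQ_WRITE  (1u << 0)
#define REQ_SYNC   (1u << 1)
#define REQ_NOIDLE (1u << 2)

#define WRITE      REQ_WRITE
#define WRITE_SYNC (REQ_WRITE | REQ_SYNC | REQ_NOIDLE)

enum writeback_sync_modes {
    WB_SYNC_NONE,   /* background writeback, do not wait */
    WB_SYNC_ALL,    /* data integrity write, e.g. fsync */
};

/* Stand-in for the kernel structure; only sync_mode is modeled. */
struct writeback_control {
    enum writeback_sync_modes sync_mode;
};

/*
 * Hypothetical helper: pick the write op the same way the patch does
 * in __mpage_writepage(), mpage_writepages() and mpage_writepage().
 */
unsigned int mpage_write_op(const struct writeback_control *wbc)
{
    return wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE;
}
```

Something like this would also avoid repeating the same ternary in three
places, but I kept the patch itself minimal.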

--
Roman


On Wed, Feb 19, 2014 at 8:59 AM, Andrew Morton
<akpm@...ux-foundation.org> wrote:
> On Sun, 16 Feb 2014 11:54:28 +0900 Roman Pen <r.peniaev@...il.com> wrote:
>
>> In case of wbc->sync_mode == WB_SYNC_ALL we need to do data integrity write,
>> thus mark request as WRITE_SYNC.
>
> gargh, the documentation for this stuff is useless.
>
> What do REQ_SYNC and REQ_NOIDLE actually *do*?
>
> For mpage writes, REQ_NOIDLE appears to be incorrect - we very much
> expect that there will be more writes and that they will be contiguous
> with this one.  But we won't be waiting on this write before submitting
> more writes, so perhaps REQ_NOIDLE is at least harmless.
>
> I dunno about REQ_SYNC - it requires delving into the bowels of CFQ
> and we shouldn't need to do that.
>
> Jens.  Help.  How is a poor kernel reader supposed to work this out?
>
>> --- a/fs/mpage.c
>> +++ b/fs/mpage.c
>> @@ -462,6 +462,7 @@ static int __mpage_writepage(struct page *page, struct writeback_control *wbc,
>>       struct buffer_head map_bh;
>>       loff_t i_size = i_size_read(inode);
>>       int ret = 0;
>> +     int wr = (wbc->sync_mode == WB_SYNC_ALL ?  WRITE_SYNC : WRITE);
>>
>>       if (page_has_buffers(page)) {
>>               struct buffer_head *head = page_buffers(page);
>> @@ -570,7 +571,7 @@ page_is_mapped:
>>        * This page will go to BIO.  Do we need to send this BIO off first?
>>        */
>>       if (bio && mpd->last_block_in_bio != blocks[0] - 1)
>> -             bio = mpage_bio_submit(WRITE, bio);
>> +             bio = mpage_bio_submit(wr, bio);
>>
>>  alloc_new:
>>       if (bio == NULL) {
>> @@ -587,7 +588,7 @@ alloc_new:
>>        */
>>       length = first_unmapped << blkbits;
>>       if (bio_add_page(bio, page, length, 0) < length) {
>> -             bio = mpage_bio_submit(WRITE, bio);
>> +             bio = mpage_bio_submit(wr, bio);
>>               goto alloc_new;
>>       }
>>
>> @@ -620,7 +621,7 @@ alloc_new:
>>       set_page_writeback(page);
>>       unlock_page(page);
>>       if (boundary || (first_unmapped != blocks_per_page)) {
>> -             bio = mpage_bio_submit(WRITE, bio);
>> +             bio = mpage_bio_submit(wr, bio);
>>               if (boundary_block) {
>>                       write_boundary_block(boundary_bdev,
>>                                       boundary_block, 1 << blkbits);
>> @@ -632,7 +633,7 @@ alloc_new:
>>
>>  confused:
>>       if (bio)
>> -             bio = mpage_bio_submit(WRITE, bio);
>> +             bio = mpage_bio_submit(wr, bio);
>>
>>       if (mpd->use_writepage) {
>>               ret = mapping->a_ops->writepage(page, wbc);
>> @@ -688,8 +689,11 @@ mpage_writepages(struct address_space *mapping,
>>               };
>>
>>               ret = write_cache_pages(mapping, wbc, __mpage_writepage, &mpd);
>> -             if (mpd.bio)
>> -                     mpage_bio_submit(WRITE, mpd.bio);
>> +             if (mpd.bio) {
>> +                     int wr = (wbc->sync_mode == WB_SYNC_ALL ?
>> +                               WRITE_SYNC : WRITE);
>> +                     mpage_bio_submit(wr, mpd.bio);
>> +             }
>>       }
>>       blk_finish_plug(&plug);
>>       return ret;
>> @@ -706,8 +710,11 @@ int mpage_writepage(struct page *page, get_block_t get_block,
>>               .use_writepage = 0,
>>       };
>>       int ret = __mpage_writepage(page, wbc, &mpd);
>> -     if (mpd.bio)
>> -             mpage_bio_submit(WRITE, mpd.bio);
>> +     if (mpd.bio) {
>> +             int wr = (wbc->sync_mode == WB_SYNC_ALL ?
>> +                       WRITE_SYNC : WRITE);
>> +             mpage_bio_submit(wr, mpd.bio);
>> +     }
>>       return ret;
>>  }
>>  EXPORT_SYMBOL(mpage_writepage);
>