lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <d5faac47-8a8c-90ff-877d-b793b715ac4d@virtuozzo.com>
Date:   Fri, 9 Aug 2019 08:39:17 +0000
From:   Pavel Tikhomirov <ptikhomirov@...tuozzo.com>
To:     "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     Christoph Hellwig <hch@....de>, Jens Axboe <axboe@...nel.dk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dennis Zhou <dennis@...nel.org>,
        Josef Bacik <josef@...icpanda.com>, Tejun Heo <tj@...nel.org>,
        Huang Ying <ying.huang@...el.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Omar Sandoval <osandov@...com>,
        Pavel Tikhomirov <ptikhomirov@...tuozzo.com>
Subject: mm/swap: possible problem introduced when replacing REQ_NOIDLE with
 REQ_IDLE

Hi, all.

Then porting patches from mainstream I've found some strange code:

 > commit a2b809672ee6fcb4d5756ea815725b3dbaea654e
 > Author: Christoph Hellwig <hch@....de>
 > Date:   Tue Nov 1 07:40:09 2016 -0600
 >
 >     block: replace REQ_NOIDLE with REQ_IDLE
 >
 >     Noidle should be the default for writes as seen by all the compounds
 >     definitions in fs.h using it.  In fact only direct I/O really should
 >     be using NODILE, so turn the whole flag around to get the defaults
 >     right, which will make our life much easier especially onces the
 >     WRITE_* defines go away.
 >
 >     This assumes all the existing "raw" users of REQ_SYNC for writes
 >     want noidle behavior, which seems to be spot on from a quick audit.
 >
 >     Signed-off-by: Christoph Hellwig <hch@....de>
 >     Signed-off-by: Jens Axboe <axboe@...com>
 >
 > diff --git a/include/linux/fs.h b/include/linux/fs.h
 > index ccedccb28ec8..46a74209917f 100644
 > --- a/include/linux/fs.h
 > +++ b/include/linux/fs.h
 > @@ -197,11 +197,11 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, 
loff_t offset,
 >  #define WRITE                  REQ_OP_WRITE
 >
 >  #define READ_SYNC              0
 > -#define WRITE_SYNC             (REQ_SYNC | REQ_NOIDLE)
 > -#define WRITE_ODIRECT          REQ_SYNC
 > -#define WRITE_FLUSH            (REQ_NOIDLE | REQ_PREFLUSH)
 > -#define WRITE_FUA              (REQ_NOIDLE | REQ_FUA)
 > -#define WRITE_FLUSH_FUA                (REQ_NOIDLE | REQ_PREFLUSH | 
REQ_FUA)
 > +#define WRITE_SYNC             REQ_SYNC
 > +#define WRITE_ODIRECT          (REQ_SYNC | REQ_IDLE)
 > +#define WRITE_FLUSH            REQ_PREFLUSH
 > +#define WRITE_FUA              REQ_FUA
 > +#define WRITE_FLUSH_FUA                (REQ_PREFLUSH | REQ_FUA)
 >
 >  /*
 >   * Attribute flags.  These should be or-ed together to figure out what

The above commit changes the meaning of the REQ_SYNC flag, before the 
patch it was equal to WRITE_ODIRECT and after the patch it is equal to 
WRITE_SYNC. And thus I think it became treated differently (I see only 
one place left in wbt_should_throttle.).

But in __swap_writepage() both before and after the mentioned patch we 
still pass a single REQ_SYNC without any REQ_IDLE/REQ_UNIDLE:

 > [snorch@...rch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e^ mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);
 > [snorch@...rch linux]$ git blame 
a2b809672ee6fcb4d5756ea815725b3dbaea654e mm/page_io.c | grep -a5 REQ_SYNC
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 319) 
         unlock_page(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 320) 
         ret = -ENOMEM;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 321) 
         goto out;
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 322)   }
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 323) 
if (wbc->sync_mode == WB_SYNC_ALL)
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 324) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 325) 
else
 > ba13e83ec334c (Jens Axboe            2016-08-01 09:38:44 -0600 326) 
         bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
 > f8891e5e1f93a (Christoph Lameter     2006-06-30 01:55:45 -0700 327) 
count_vm_event(PSWPOUT);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 328) 
set_page_writeback(page);
 > ^1da177e4c3f4 (Linus Torvalds        2005-04-16 15:20:36 -0700 329) 
unlock_page(page);

It looks like we've changed the way how we handle swap page writes from 
"odirect" way to "regular" sync write way, these can be wrong. This may 
also affect deprecated cfq io-scheduler on older kernels.

Thanks in advance for any advice on what to do with these, may be I miss 
something.

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ