Message-ID: <4C4DE14F.9050208@vlnb.net>
Date:	Mon, 26 Jul 2010 23:26:07 +0400
From:	Vladislav Bolkhovitin <vst@...b.net>
To:	Gennadiy Nerubayev <parakie@...il.com>
CC:	James Bottomley <James.Bottomley@...e.de>,
	Christof Schmitt <christof.schmitt@...ibm.com>,
	Boaz Harrosh <bharrosh@...asas.com>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, Chris Mason <chris.mason@...cle.com>
Subject: Re: Wrong DIF guard tag on ext2 write

Gennadiy Nerubayev, on 07/26/2010 09:00 PM wrote:
> On Mon, Jul 26, 2010 at 8:22 AM, Vladislav Bolkhovitin <vst@...b.net> wrote:
>> Gennadiy Nerubayev, on 07/24/2010 12:51 AM wrote:
>>>>>>
>>>>>> The real-life problem can be seen in an active-active DRBD setup. In
>>>>>> this configuration 2 nodes act as a single SCST-powered SCSI device
>>>>>> and both run DRBD to keep their backstorage in sync. The initiator
>>>>>> uses them as a single multipath device in an active-active
>>>>>> round-robin load-balancing configuration, i.e. it sends requests to
>>>>>> both nodes in parallel, and DRBD takes care of replicating the
>>>>>> requests to the other node.
>>>>>>
>>>>>> The problem is that sometimes DRBD complains about concurrent local
>>>>>> writes, like:
>>>>>>
>>>>>> kernel: drbd0: scsi_tgt0[12503] Concurrent local write detected!
>>>>>> [DISCARD L] new: 144072784s +8192; pending: 144072784s +8192
>>>>>>
>>>>>> This message means that DRBD detected that both nodes received
>>>>>> overlapping writes on the same block(s) and DRBD can't figure out which
>>>>>> one
>>>>>> to store. This is possible only if the initiator sent the second write
>>>>>> request before the first one completed.
>>>>>>
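For reference, the check behind that message is essentially a
sector-range overlap test between a new write and the writes still
pending. Below is a minimal sketch, not DRBD's actual code; the struct
and its fields are made up, and it assumes the "+8192" is a size in
bytes over 512-byte sectors:

#include <stdbool.h>
#include <stdio.h>

struct pending_write {
	unsigned long long sector;	/* start sector */
	unsigned int size;		/* length in bytes (assumed) */
};

/* Two writes conflict if their sector ranges intersect. */
static bool overlaps(const struct pending_write *a,
		     const struct pending_write *b)
{
	unsigned long long a_end = a->sector + a->size / 512;
	unsigned long long b_end = b->sector + b->size / 512;

	return a->sector < b_end && b->sector < a_end;
}

int main(void)
{
	struct pending_write pending  = { 144072784ULL, 8192 };
	struct pending_write incoming = { 144072784ULL, 8192 };

	if (overlaps(&pending, &incoming))
		printf("Concurrent local write detected! new: %llus +%u; "
		       "pending: %llus +%u\n",
		       incoming.sector, incoming.size,
		       pending.sector, pending.size);
	return 0;
}
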
>>>>>> The topic of this discussion could well explain the cause of that. But,
>>>>>> unfortunately, the people who reported it forgot to note which OS they
>>>>>> were running on the initiator, so I can't say for sure it was Linux.
>>>>>
>>>>> Sorry for the late chime-in, but here's some more information of
>>>>> potential interest, as I've previously asked about this on the drbd
>>>>> mailing list:
>>>>>
>>>>> 1. It only happens when using blockio mode in IET or SCST. Fileio,
>>>>> nv_cache, and write_through do not generate the warnings.
>>>>
>>>> Some explanations for those who are not familiar with the terminology:
>>>>
>>>>   - "Fileio" means the Linux IO stack on the target receives IO via
>>>> vfs_readv()/vfs_writev()
>>>>
>>>>   - "NV_CACHE" means all the cache synchronization requests
>>>> (SYNCHRONIZE_CACHE, FUA) from the initiator are ignored
>>>>
>>>>   - "WRITE_THROUGH" means write through, i.e. the corresponding backend
>>>> file for the device is opened with the O_SYNC flag.
>>>>
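To illustrate WRITE_THROUGH in userspace terms (the target itself does
the equivalent from kernel space via vfs_writev()): opening the backing
file with O_SYNC makes every write reach stable storage before it
completes. A minimal sketch with a made-up path:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[512] = "example block";
	/* O_SYNC: each write returns only after the data (and the
	 * metadata needed to retrieve it) has been committed to the
	 * medium. */
	int fd = open("/path/to/backing_file", O_WRONLY | O_SYNC);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		perror("pwrite");
	close(fd);
	return 0;
}
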
>>>>> 2. It happens on active/passive drbd clusters (on the active node,
>>>>> obviously), NOT active/active. In fact, I've found that doing round
>>>>> robin on active/active is a Bad Idea (tm) even with a clustered
>>>>> filesystem, at least until the target software is able to synchronize
>>>>> command state between the nodes.
>>>>> 3. Linux and ESX initiators can generate the warning, but I've so far
>>>>> only been able to reliably reproduce it using a Windows initiator and
>>>>> sqlio or iometer benchmarks. I'll be trying again using iometer when I
>>>>> have the time.
>>>>> 4. It only happens using a random write io workload (any block size),
>>>>> with initiator threads>1, OR initiator queue depth>1. The higher
>>>>> either of those is, the more spammy the warnings become.
>>>>> 5. The transport does not matter (reproduced with iSCSI and SRP)
>>>>> 6. If DRBD is disconnected (primary/unknown), the warnings are not
>>>>> generated. As soon as it's reconnected (primary/secondary), the
>>>>> warnings will reappear.
>>>>
>>>> It would be great if you could prove or disprove our suspicion that Linux
>>>> can produce several write requests for the same blocks simultaneously. To
>>>> be sure we need:
>>>>
>>>> 1. The initiator is Linux. Windows and ESX are not needed for this
>>>> particular case.
>>>>
>>>> 2. If you are able to reproduce it, we will need a full description of
>>>> which application was used on the initiator to generate the load and in
>>>> which mode.
>>>>
>>>> The target and DRBD configuration doesn't matter; you can use any.
>>>
>>> I just tried, and this particular DRBD warning is not reproducible
>>> with IO (iometer) coming from a Linux initiator (2.6.30.10). The same
>>> iometer parameters were used as on Windows, and both the base device
>>> and a filesystem (ext3) were tested; both were negative. I'll try a
>>> few more tests, but it seems that this is a non-issue with a Linux
>>> initiator.
>>
>> OK, but to be completely sure, can you also check with load generators
>> other than IOmeter, please? IOmeter on Linux is a lot less effective than
>> on Windows, because it uses sync IO, while we need a big multi-IO load to
>> trigger the problem we are discussing, if it exists. Plus, to catch it we
>> need an FS on the initiator side, not raw devices. So, something like fio
>> over files on an FS, or dbench, should be more appropriate. Please don't
>> use direct IO, to avoid the bug Dave Chinner pointed out to us.
>
> I tried both fio and dbench, with the same results. With fio in
> particular, I think I used pretty much every possible combination of
> engines, directio, and sync settings, with 8 threads, a queue depth of
> 32, and a random write workload.
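
For anyone who wants to repeat this on the initiator, the kind of load
meant above is several threads doing buffered (non-direct) random
writes to a file on the FS. A minimal C sketch along those lines (not
fio itself; the path, file size and counts are made up, and the file
must already exist at that size). With buffered writes the multi-IO
stream seen by the target comes mostly from page-cache writeback
rather than from the application itself:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NTHREADS  8
#define BLOCK     4096
#define FILE_SIZE (1024LL * 1024 * 1024)	/* 1 GiB test file */
#define WRITES    100000

static const char *path = "/mnt/test/loadfile";	/* hypothetical mount */

static void *writer(void *arg)
{
	unsigned int seed = (unsigned int)(long)arg;
	char buf[BLOCK];
	int fd = open(path, O_WRONLY);	/* buffered: no O_DIRECT/O_SYNC */

	if (fd < 0) {
		perror("open");
		return NULL;
	}
	memset(buf, 0xab, sizeof(buf));
	for (long i = 0; i < WRITES; i++) {
		long blocks = FILE_SIZE / BLOCK;
		off_t off = (off_t)(rand_r(&seed) % blocks) * BLOCK;

		if (pwrite(fd, buf, BLOCK, off) != BLOCK)
			perror("pwrite");
	}
	close(fd);
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];

	for (long i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, writer, (void *)i);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

Build with something like: gcc -O2 -pthread loadgen.c -o loadgen
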
>
>> Also, you mentioned above that Linux can generate the warning. Can you
>> recall on which configuration (kernel version, load application and its
>> settings) you have seen it?
>
> Sorry, after double checking, it's only ESX and Windows that generate
> them. The majority of the ESX virtuals in question are Windows, though
> I can see some indications of ESX servers that have Linux-only
> virtuals generating one here and there. It's somewhat difficult to
> tell historically, and I probably would not be able to determine what
> those virtuals were running at the time.

OK, I see. A negative result is also a result. Now we know that Linux 
(in contrast to VMware and Windows) works well in this area.

Thank you!
Vlad
