summaryrefslogtreecommitdiff
path: root/drivers/md/Makefile (unfollow)
Commit message (Collapse)Author
2017-04-26ANDROID: AVB error handler to invalidate vbmeta partition.David Zeuthen
If androidboot.vbmeta.device is set and points to a device with vbmeta magic, this header will be overwritten upon an irrecoverable dm-verity error. The side-effect of this is that the slot will fail to verify on next reboot, effectively triggering the boot loader to fallback to another slot. This work both if the vbmeta struct is at the start of a partition or if there's an AVB footer at the end. This code is based on drivers/md/dm-verity-chromeos.c from ChromiumOS. Example: [ 0.000000] Kernel command line: rootfstype=ext4 init=/init console=ttyS0,115200 androidboot.console=ttyS0 androidboot.hardware=uefi_x86_64 enforcing=0 androidboot.selinux=permissive androidboot.debuggable=1 buildvariant=eng dm="1 vroot none ro 1,0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks" root=0xfd00 androidboot.vbmeta.device=PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 androidboot.slot_suffix=_a androidboot.vbmeta.device_state=unlocked androidboot.vbmeta.hash_alg=sha256 androidboot.vbmeta.size=3200 androidboot.vbmeta.digest=14fe41c2b3696c31b7ad5eae7877d7d188995e1ab122c604aaaf4785850b91f7 skip_initramfs [...] [ 0.612802] device-mapper: verity-avb: AVB error handler initialized with vbmeta device: PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 [...] [ 1.213804] device-mapper: init: attempting early device configuration. [ 1.214752] device-mapper: init: adding target '0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks' [ 1.217643] device-mapper: init: dm-0 is ready [ 1.226694] device-mapper: verity: 8:6: data block 0 is corrupted [ 1.227666] device-mapper: verity-avb: AVB error handler called for PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 [ 1.234308] device-mapper: verity-avb: invalidate_vbmeta: found vbmeta partition [ 1.235848] device-mapper: verity-avb: invalidate_vbmeta: completed. [...] Bug: 31622239 Test: Manually tested (other arch). Change-Id: Idf6be32d6a3d28e15de9302aa26ad6a516d663aa Signed-off-by: David Zeuthen <zeuthen@google.com> Git-commit: 8d6f006d608c3b03652fb919e496945f2d4d4f1d Git-repo: https://android.googlesource.com/kernel/common/ [runminw@codeaurora.org: resolve trivial merge conflicts] Signed-off-by: Runmin Wang <runminw@codeaurora.org>
2017-04-25ANDROID: AVB error handler to invalidate vbmeta partition.David Zeuthen
If androidboot.vbmeta.device is set and points to a device with vbmeta magic, this header will be overwritten upon an irrecoverable dm-verity error. The side-effect of this is that the slot will fail to verify on next reboot, effectively triggering the boot loader to fallback to another slot. This work both if the vbmeta struct is at the start of a partition or if there's an AVB footer at the end. This code is based on drivers/md/dm-verity-chromeos.c from ChromiumOS. Example: [ 0.000000] Kernel command line: rootfstype=ext4 init=/init console=ttyS0,115200 androidboot.console=ttyS0 androidboot.hardware=uefi_x86_64 enforcing=0 androidboot.selinux=permissive androidboot.debuggable=1 buildvariant=eng dm="1 vroot none ro 1,0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks" root=0xfd00 androidboot.vbmeta.device=PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 androidboot.slot_suffix=_a androidboot.vbmeta.device_state=unlocked androidboot.vbmeta.hash_alg=sha256 androidboot.vbmeta.size=3200 androidboot.vbmeta.digest=14fe41c2b3696c31b7ad5eae7877d7d188995e1ab122c604aaaf4785850b91f7 skip_initramfs [...] [ 0.612802] device-mapper: verity-avb: AVB error handler initialized with vbmeta device: PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 [...] [ 1.213804] device-mapper: init: attempting early device configuration. [ 1.214752] device-mapper: init: adding target '0 2080496 verity 1 PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b PARTUUID=6779df46-78f6-4c69-bf53-59bb1fbf126b 4096 4096 260062 260062 sha1 4f76354c86e430e27426d584a726f2fbffecae32 7e4085342d634065269631ac9a199e1a43f4632c 1 ignore_zero_blocks' [ 1.217643] device-mapper: init: dm-0 is ready [ 1.226694] device-mapper: verity: 8:6: data block 0 is corrupted [ 1.227666] device-mapper: verity-avb: AVB error handler called for PARTUUID=b865935d-38fb-4c4e-b8b4-70dc67321552 [ 1.234308] device-mapper: verity-avb: invalidate_vbmeta: found vbmeta partition [ 1.235848] device-mapper: verity-avb: invalidate_vbmeta: completed. [...] Bug: 31622239 Test: Manually tested (other arch). Change-Id: Idf6be32d6a3d28e15de9302aa26ad6a516d663aa Signed-off-by: David Zeuthen <zeuthen@google.com>
2016-10-28ANDROID: dm: Add android verity targetBadhri Jagan Sridharan
This device-mapper target is virtually a VERITY target. This target is setup by reading the metadata contents piggybacked to the actual data blocks in the block device. The signature of the metadata contents are verified against the key included in the system keyring. Upon success, the underlying verity target is setup. BUG: 27175947 Change-Id: I7e99644a0960ac8279f02c0158ed20999510ea97 Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Git-commit: 36d01a590d035804dc100e105facf4049cf6fe43 Git-repo: https://android.googlesource.com/kernel/common Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2016-09-14ANDROID: dm: android-verity: Allow android-verity to be compiled as an ↵Badhri Jagan Sridharan
independent module Exports the device mapper callbacks of linear and dm-verity-target methods. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Change-Id: I0358be0615c431dce3cc78575aaac4ccfe3aacd7
2016-08-25ANDROID: dm: android-verity: Allow android-verity to be compiled as an ↵Badhri Jagan Sridharan
independent module Exports the device mapper callbacks of linear and dm-verity-target methods. Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com> Change-Id: I0358be0615c431dce3cc78575aaac4ccfe3aacd7
2016-08-18ANDROID: dm: Add android verity targetBadhri Jagan Sridharan
This device-mapper target is virtually a VERITY target. This target is setup by reading the metadata contents piggybacked to the actual data blocks in the block device. The signature of the metadata contents are verified against the key included in the system keyring. Upon success, the underlying verity target is setup. BUG: 27175947 Change-Id: I7e99644a0960ac8279f02c0158ed20999510ea97 Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
2016-08-10ANDROID: dm: Add android verity targetBadhri Jagan Sridharan
This device-mapper target is virtually a VERITY target. This target is setup by reading the metadata contents piggybacked to the actual data blocks in the block device. The signature of the metadata contents are verified against the key included in the system keyring. Upon success, the underlying verity target is setup. BUG: 27175947 Change-Id: I7e99644a0960ac8279f02c0158ed20999510ea97 Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
2016-04-07UPSTREAM: dm verity: add support for forward error correctionSami Tolvanen
Add support for correcting corrupted blocks using Reed-Solomon. This code uses RS(255, N) interleaved across data and hash blocks. Each error-correcting block covers N bytes evenly distributed across the combined total data, so that each byte is a maximum distance away from the others. This makes it possible to recover from several consecutive corrupted blocks with relatively small space overhead. In addition, using verity hashes to locate erasures nearly doubles the effectiveness of error correction. Being able to detect corrupted blocks also improves performance, because only corrupted blocks need to corrected. For a 2 GiB partition, RS(255, 253) (two parity bytes for each 253-byte block) can correct up to 16 MiB of consecutive corrupted blocks if erasures can be located, and 8 MiB if they cannot, with 16 MiB space overhead. Change-Id: Ife4f8889f7fbf0974bf3ed4be6d3322ae9b4cb0e Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit a739ff3f543afbb4a041c16cd0182c8e8d366e70)
2016-04-07UPSTREAM: dm verity: move dm-verity.c to dm-verity-target.cSami Tolvanen
Prepare for extending dm-verity with an optional object. Follows the naming convention used by other DM targets (e.g. dm-cache and dm-era). Change-Id: If6d2f27b290adf14fa77f3745fdc13aaa417c8dc Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit 03045cbafa2d663ad8d0a583ac219d202d824344)
2016-03-31UPSTREAM: dm verity: add support for forward error correctionSami Tolvanen
Add support for correcting corrupted blocks using Reed-Solomon. This code uses RS(255, N) interleaved across data and hash blocks. Each error-correcting block covers N bytes evenly distributed across the combined total data, so that each byte is a maximum distance away from the others. This makes it possible to recover from several consecutive corrupted blocks with relatively small space overhead. In addition, using verity hashes to locate erasures nearly doubles the effectiveness of error correction. Being able to detect corrupted blocks also improves performance, because only corrupted blocks need to corrected. For a 2 GiB partition, RS(255, 253) (two parity bytes for each 253-byte block) can correct up to 16 MiB of consecutive corrupted blocks if erasures can be located, and 8 MiB if they cannot, with 16 MiB space overhead. Change-Id: Ife4f8889f7fbf0974bf3ed4be6d3322ae9b4cb0e Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit a739ff3f543afbb4a041c16cd0182c8e8d366e70)
2016-03-31UPSTREAM: dm verity: move dm-verity.c to dm-verity-target.cSami Tolvanen
Prepare for extending dm-verity with an optional object. Follows the naming convention used by other DM targets (e.g. dm-cache and dm-era). Change-Id: If6d2f27b290adf14fa77f3745fdc13aaa417c8dc Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> (cherry picked from commit 03045cbafa2d663ad8d0a583ac219d202d824344)
2016-03-23dm: add snapshot of dm-req-cryptGilad Broner
This snapshot is taken as of msm-3.18 commit: 5684450d70 ("Promotion of kernel.lnx.3.18-151201"). dm-req-crypt is necessary for full disk encryption. Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2015-10-24raid5: add basic stripe logShaohua Li
This introduces a simple log for raid5. Data/parity writing to raid array first writes to the log, then write to raid array disks. If crash happens, we can recovery data from the log. This can speed up raid resync and fix write hole issue. The log structure is pretty simple. Data/meta data is stored in block unit, which is 4k generally. It has only one type of meta data block. The meta data block can track 3 types of data, stripe data, stripe parity and flush block. MD superblock will point to the last valid meta data block. Each meta data block has checksum/seq number, so recovery can scan the log correctly. We store a checksum of stripe data/parity to the metadata block, so meta data and stripe data/parity can be written to log disk together. otherwise, meta data write must wait till stripe data/parity is finished. For stripe data, meta data block will record stripe data sector and size. Currently the size is always 4k. This meta data record can be made simpler if we just fix write hole (eg, we can record data of a stripe's different disks together), but this format can be extended to support caching in the future, which must record data address/size. For stripe parity, meta data block will record stripe sector. It's size should be 4k (for raid5) or 8k (for raid6). We always store p parity first. This format should work for caching too. flush block indicates a stripe is in raid array disks. Fixing write hole doesn't need this type of meta data, it's for caching extension. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: NeilBrown <neilb@suse.com>
2015-06-11dm cache: add stochastic-multi-queue (smq) policyJoe Thornber
The stochastic-multi-queue (smq) policy addresses some of the problems with the current multiqueue (mq) policy. Memory usage ------------ The mq policy uses a lot of memory; 88 bytes per cache block on a 64 bit machine. SMQ uses 28bit indexes to implement it's data structures rather than pointers. It avoids storing an explicit hit count for each block. It has a 'hotspot' queue rather than a pre cache which uses a quarter of the entries (each hotspot block covers a larger area than a single cache block). All these mean smq uses ~25bytes per cache block. Still a lot of memory, but a substantial improvement nontheless. Level balancing --------------- MQ places entries in different levels of the multiqueue structures based on their hit count (~ln(hit count)). This means the bottom levels generally have the most entries, and the top ones have very few. Having unbalanced levels like this reduces the efficacy of the multiqueue. SMQ does not maintain a hit count, instead it swaps hit entries with the least recently used entry from the level above. The over all ordering being a side effect of this stochastic process. With this scheme we can decide how many entries occupy each multiqueue level, resulting in better promotion/demotion decisions. Adaptability ------------ The MQ policy maintains a hit count for each cache block. For a different block to get promoted to the cache it's hit count has to exceed the lowest currently in the cache. This means it can take a long time for the cache to adapt between varying IO patterns. Periodically degrading the hit counts could help with this, but I haven't found a nice general solution. SMQ doesn't maintain hit counts, so a lot of this problem just goes away. In addition it tracks performance of the hotspot queue, which is used to decide which blocks to promote. If the hotspot queue is performing badly then it starts moving entries more quickly between levels. This lets it adapt to new IO patterns very quickly. Performance ----------- In my tests SMQ shows substantially better performance than MQ. Once this matures a bit more I'm sure it'll become the default policy. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-04-15dm: add log writes targetJosef Bacik
Introduce a new target that is meant for file system developers to test file system integrity at particular points in the life of a file system. We capture all write requests and associated data and log them to a separate device for later replay. There is a userspace utility to do this replay. The idea behind this is to give file system developers a tool to verify that the file system is always consistent. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Zach Brown <zab@zabbo.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-23Create a separate module for clustering supportGoldwyn Rodrigues
Tagged as EXPERIMENTAL for now. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2014-03-27dm: add era targetJoe Thornber
dm-era is a target that behaves similar to the linear target. In addition it keeps track of which blocks were written within a user defined period of time called an 'era'. Each era target instance maintains the current era as a monotonically increasing 32-bit counter. Use cases include tracking changed blocks for backup software, and partially invalidating the contents of a cache to restore cache coherency after rolling back a vendor snapshot. dm-era is primarily expected to be paired with the dm-cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-01-14dm sysfs: fix a module unload raceMikulas Patocka
This reverts commit be35f48610 ("dm: wait until embedded kobject is released before destroying a device") and provides an improved fix. The kobject release code that calls the completion must be placed in a non-module file, otherwise there is a module unload race (if the process calling dm_kobject_release is preempted and the DM module unloaded after the completion is triggered, but before dm_kobject_release returns). To fix this race, this patch moves the completion code to dm-builtin.c which is always compiled directly into the kernel if BLK_DEV_DM is selected. The patch introduces a new dm_kobject_holder structure, its purpose is to keep the completion and kobject in one place, so that it can be accessed from non-module code without the need to export the layout of struct mapped_device to that code. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-09-05dm: add statistics supportMikulas Patocka
Support the collection of I/O statistics on user-defined regions of a DM device. If no regions are defined no statistics are collected so there isn't any performance impact. Only bio-based DM devices are currently supported. Each user-defined region specifies a starting sector, length and step. Individual statistics will be collected for each step-sized area within the range specified. The I/O statistics counters for each step-sized area of a region are in the same format as /sys/block/*/stat or /proc/diskstats but extra counters (12 and 13) are provided: total time spent reading and writing in milliseconds. All these counters may be accessed by sending the @stats_print message to the appropriate DM device via dmsetup. The creation of DM statistics will allocate memory via kmalloc or fallback to using vmalloc space. At most, 1/4 of the overall system memory may be allocated by DM statistics. The admin can see how much memory is used by reading /sys/module/dm_mod/parameters/stats_current_allocated_bytes See Documentation/device-mapper/statistics.txt for more details. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-07-10dm: add switch targetJim Ramsay
dm-switch is a new target that maps IO to underlying block devices efficiently when there is a large number of fixed-sized address regions but there is no simple pattern to allow for a compact mapping representation such as dm-stripe. Though we have developed this target for a specific storage device, Dell EqualLogic, we have made an effort to keep it as general purpose as possible in the hope that others may benefit. Originally developed by Jim Ramsay. Simplified by Mikulas Patocka. Signed-off-by: Jim Ramsay <jim_ramsay@dell.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-23bcache: A block layer cacheKent Overstreet
Does writethrough and writeback caching, handles unclean shutdown, and has a bunch of other nifty features motivated by real world usage. See the wiki at http://bcache.evilpiepirate.org for more. Signed-off-by: Kent Overstreet <koverstreet@google.com>
2013-03-01dm cache: add cleaner policyHeinz Mauelshagen
A simple cache policy that writes back all data to the origin. This is used to decommission a dm cache by emptying it. Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01dm cache: add mq policyJoe Thornber
A cache policy that uses a multiqueue ordered by recent hit count to select which blocks should be promoted and demoted. This is meant to be a general purpose policy. It prioritises reads over writes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01dm: add cache targetJoe Thornber
Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-10-12dm thin: move bio_prison code to separate moduleMike Snitzer
The bio prison code will be useful to other future DM targets so move it to a separate module. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-28dm: add verity targetMikulas Patocka
This device-mapper target creates a read-only device that transparently validates the data on one underlying device against a pre-generated tree of cryptographic checksums stored on a second device. Two checksum device formats are supported: version 0 which is already shipping in Chromium OS and version 1 which incorporates some improvements. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Elly Jones <ellyjones@chromium.org> Cc: Milan Broz <mbroz@redhat.com> Cc: Olof Johansson <olofj@chromium.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31dm: add thin provisioning targetJoe Thornber
Initial EXPERIMENTAL implementation of device-mapper thin provisioning with snapshot support. The 'thin' target is used to create instances of the virtual devices that are hosted in the 'thin-pool' target. The thin-pool target provides data sharing among devices. This sharing is made possible using the persistent-data library in the previous patch. The main highlight of this implementation, compared to the previous implementation of snapshots, is that it allows many virtual devices to be stored on the same data volume, simplifying administration and allowing sharing of data between volumes (thus reducing disk usage). Another big feature is support for arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots ...). The previous implementation of snapshots did this by chaining together lookup tables, and so performance was O(depth). This new implementation uses a single data structure so we don't get this degradation with depth. For further information and examples of how to use this, please read Documentation/device-mapper/thin-provisioning.txt Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-10-31dm: add bufioMikulas Patocka
The dm-bufio interface allows you to do cached I/O on devices, holding recently-read blocks in memory and performing delayed writes. We don't use buffer cache or page cache already present in the kernel, because: * we need to handle block sizes larger than a page * we can't allocate memory to perform reads or we'd have deadlocks Currently, when a cache is required, we limit its size to a fraction of available memory. Usage can be viewed and changed in /sys/module/dm_bufio/parameters/ . The first user is thin provisioning, but more dm users are planned. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-03-24dm: add flakey targetJosef Bacik
This target is the same as the linear target except that it returns I/O errors periodically. It's been found useful in simulating failing devices for testing purposes. I needed a dm target to do some failure testing on btrfs's raid code, and Mike pointed me at this. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-01-13dm: raid456 basic supportNeilBrown
This patch is the skeleton for the DM target that will be the bridge from DM to MD (initially RAID456 and later RAID1). It provides a way to use device-mapper interfaces to the MD RAID456 drivers. As with all device-mapper targets, the nominal public interfaces are the constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO and STATUSTYPE_TABLE). The CTR table looks like the following: 1: <s> <l> raid \ 2: <raid_type> <#raid_params> <raid_params> \ 3: <#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN> Line 1 contains the standard first three arguments to any device-mapper target - the start, length, and target type fields. The target type in this case is "raid". Line 2 contains the arguments that define the particular raid type/personality/level, the required arguments for that raid type, and any optional arguments. Possible raid types include: raid4, raid5_la, raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc. (again, raid1 is planned for the future.) The list of required and optional parameters is the same for all the current raid types. The required parameters are positional, while the optional parameters are given as key/value pairs. The possible parameters are as follows: <chunk_size> Chunk size in sectors. [[no]sync] Force/Prevent RAID initialization [rebuild <idx>] Rebuild the drive indicated by the index [daemon_sleep <ms>] Time between bitmap daemon work to clear bits [min_recovery_rate <kB/sec/disk>] Throttle RAID initialization [max_recovery_rate <kB/sec/disk>] Throttle RAID initialization [max_write_behind <value>] See '-write-behind=' (man mdadm) [stripe_cache <sectors>] Stripe cache size for higher RAIDs Line 3 contains the list of devices that compose the array in metadata/data device pairs. If the metadata is stored separately, a '-' is given for the metadata device position. If a drive has failed or is missing at creation time, a '-' can be given for both the metadata and data drives for a given position. Examples: # RAID4 - 4 data drives, 1 parity # No metadata devices specified to hold superblock/bitmap info # Chunk size of 1MiB # (Lines separated for easy reading) 0 1960893648 raid \ raid4 1 2048 \ 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 # RAID4 - 4 data drives, 1 parity (no metadata devices) # Chunk size of 1MiB, force RAID initialization, # min recovery rate at 20 kiB/sec/disk 0 1960893648 raid \ raid4 4 2048 min_recovery_rate 20 sync\ 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 Performing a 'dmsetup table' should display the CTR table used to construct the mapping (with possible reordering of optional parameters). Performing a 'dmsetup status' will yield information on the state and health of the array. The output is as follows: 1: <s> <l> raid \ 2: <raid_type> <#devices> <1 health char for each dev> <resync_ratio> Line 1 is standard DM output. Line 2 is best shown by example: 0 1960893648 raid raid4 5 AAAAA 2/490221568 Here we can see the RAID type is raid4, there are 5 devices - all of which are 'A'live, and the array is 2/490221568 complete with recovery. Cc: linux-raid@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-10-29md: Factor out RAID6 algorithms into lib/David Woodhouse
We'll want to use these in btrfs too. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-10-16md: drivers/md/unroll.pl replaced with awk analogVladimir Dronnikov
drivers/md/unroll.pl replaced by awk script to drop build-time dependency on perl Signed-off-by: Vladimir Dronnikov <dronnikov@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2009-06-22dm raid1: add userspace logJonthan Brassow
This patch contains a device-mapper mirror log module that forwards requests to userspace for processing. The structures used for communication between kernel and userspace are located in include/linux/dm-log-userspace.h. Due to the frequency, diversity, and 2-way communication nature of the exchanges between kernel and userspace, 'connector' was chosen as the interface for communication. The first log implementations written in userspace - "clustered-disk" and "clustered-core" - support clustered shared storage. A userspace daemon (in the LVM2 source code repository) uses openAIS/corosync to process requests in an ordered fashion with the rest of the nodes in the cluster so as to prevent log state corruption. Other implementations with no association to LVM or openAIS/corosync, are certainly possible. (Imagine if two machines are writing to the same region of a mirror. They would both mark the region dirty, but you need a cluster-aware entity that can handle properly marking the region clean when they are done. Otherwise, you might clear the region when the first machine is done, not the second.) Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22dm mpath: add service time load balancerKiyoshi Ueda
This patch adds a service time oriented dynamic load balancer, dm-service-time, which selects the path with the shortest estimated service time for the incoming I/O. The service time is estimated by dividing the in-flight I/O size by a performance value of each path. The performance value can be given as a table argument at the table loading time. If no performance value is given, all paths are considered equal. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-06-22dm mpath: add queue length load balancerKiyoshi Ueda
This patch adds a dynamic load balancer, dm-queue-length, which balances the number of in-flight I/Os across the paths. The code is based on the patch posted by Stefan Bader: https://www.redhat.com/archives/dm-devel/2005-October/msg00050.html Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-03-31md/raid6: move raid6 data processing to raid6_pq.koDan Williams
Move the raid6 data processing routines into a standalone module (raid6_pq) to prepare them to be called from async_tx wrappers and other non-md drivers/modules. This precludes a circular dependency of raid456 needing the async modules for data processing while those modules in turn depend on raid456 for the base level synchronous raid6 routines. To support this move: 1/ The exportable definitions in raid6.h move to include/linux/raid/pq.h 2/ The raid6_call, recovery calls, and table symbols are exported 3/ Extra #ifdef __KERNEL__ statements to enable the userspace raid6test to compile Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
2009-03-31cleanup drivers/md/MakefileChristoph Hellwig
Use the -y variables instead of the old -objs so we can easily add conditional objects to the modules. Also always use += to add subobjects to avoid problems when placing additional objects in some place in the file. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-06dm snapshot: split out exception store implementationsAlasdair G Kergon
Move the existing snapshot exception store implementations out into separate files. Later patches will place these behind a new interface in preparation for alternative implementations. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm: add name and uuid to sysfsMilan Broz
Implement simple read-only sysfs entry for device-mapper block device. This patch adds a simple sysfs directory named "dm" under block device properties and implements - name attribute (string containing mapped device name) - uuid attribute (string containing UUID, or empty string if not set) The kobject is embedded in mapped_device struct, so no additional memory allocation is needed for initializing sysfs entry. During the processing of sysfs attribute we need to lock mapped device which is done by a new function dm_get_from_kobj, which returns the md associated with kobject and increases the usage count. Each 'show attribute' function is responsible for its own locking. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-10-21dm raid1: separate region_hash interface part1Heinz Mauelshagen
Separate the region hash code from raid1 so it can be shared by forthcoming targets. Use BUG_ON() for failed async dm_io() calls. Signed-off-by: Heinz Mauelshagen <hjm@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-06-05[SCSI] scsi_dh: Remove hardware handler infrastructure from dmChandra Seetharaman
This patch just removes infrastructure that provided support for hardware handlers in the dm layer as it is not needed anymore. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-06-05[SCSI] scsi_dh: Remove hardware handlers from dmChandra Seetharaman
This patch removes the 3 hardware handlers that currently exist under dm as the functionality is moved to SCSI layer in the earlier patches. [jejb: removed more makefile hunks and rejection fixes] Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-04-25dm: move include filesAlasdair G Kergon
Publish the dm-io, dm-log and dm-kcopyd headers in include/linux. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2008-04-25dm log: move dirty region log code into separate moduleHeinz Mauelshagen
Move the dirty region log code into a separate module so other targets can share the code. Signed-off-by: Heinz Mauelshagen <hjm@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2007-10-20dm: add uevent to coreMike Anderson
This patch adds a uevent skeleton to device-mapper. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2007-10-20dm mpath: add hp handlerDave Wysochanski
This patch adds the most basic dm-multipath hardware support for the HP active/passive arrays. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2007-07-13xor: make 'xor_blocks' a library routine for use with async_txDan Williams
The async_tx api tries to use a dma engine for an operation, but will fall back to an optimized software routine otherwise. Xor support is implemented using the raid5 xor routines. For organizational purposes this routine is moved to a common area. The following fixes are also made: * rename xor_block => xor_blocks, suggested by Adrian Bunk * ensure that xor.o initializes before md.o in the built-in case * checkpatch.pl fixes * mark calibrate_xor_blocks __init, Adrian Bunk Cc: Adrian Bunk <bunk@stusta.de> Cc: NeilBrown <neilb@suse.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2007-07-12dm mpath: rdacChandra Seetharaman
This patch supports LSI/Engenio devices in RDAC mode. Like dm-emc it requires userspace support. In your multipath.conf file you must have: path_checker rdac hardware_handler "1 rdac" prio_callout "/sbin/mpath_prio_tpc /dev/%n" And you also then must have a updated multipath tools release which has rdac support. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09dm: delay targetHeinz Mauelshagen
New device-mapper target that can delay I/O (for testing). Reads can be separated from writes, redirected to different underlying devices and delayed by differing amounts of time. Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-06-26[PATCH] md: merge raid5 and raid6 codeNeilBrown
There is a lot of commonality between raid5.c and raid6main.c. This patches merges both into one module called raid456. This saves a lot of code, and paves the way for online raid5->raid6 migrations. There is still duplication, e.g. between handle_stripe5 and handle_stripe6. This will probably be cleaned up later. Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>