summaryrefslogtreecommitdiff
path: root/fs/f2fs/data.c (follow)
Commit message (Collapse)AuthorAge
* f2fs: updates on v4.16-rc1Jaegeuk Kim2018-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull f2fs updates from Jaegeuk Kim: "In this round, we've followed up to support some generic features such as cgroup, block reservation, linking fscrypt_ops, delivering write_hints, and some ioctls. And, we could fix some corner cases in terms of power-cut recovery and subtle deadlocks. Enhancements: - bitmap operations to handle NAT blocks - readahead to improve readdir speed - switch to use fscrypt_* - apply write hints for direct IO - add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid - modify b_avail and b_free to consider root reserved blocks - support cgroup writeback - support FIEMAP_FLAG_XATTR for fibmap - add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents - add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks - support inode creation time Bug fixs: - sysfile-based quota operations - memory footprint accounting - allow to write data on partial preallocation case - fix deadlock case on fallocate - fix to handle fill_super errors - fix missing inode updates of fsync'ed file - recover renamed file which was fsycn'ed before - drop inmemory pages in corner error case - keep last_disk_size correctly - recover missing i_inline flags during roll-forward Various clean-up patches were added as well" Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y: 5f9b3abb911f f2fs: support inode creation time 9fb0de175172 f2fs: rebuild sit page from sit info in mem 1062a0c01829 f2fs: stop issuing discard if fs is readonly fa043fae9030 f2fs: clean up duplicated assignment in init_discard_policy b007190234d6 f2fs: use GFP_F2FS_ZERO for cleanup 35b11839a1ae f2fs: allow to recover node blocks given updated checkpoint e56500860be0 f2fs: recover some i_inline flags 64aa9569a1bf f2fs: correct removexattr behavior for null valued extended attribute 70b3a923daff f2fs: drop page cache after fs shutdown 8069a0e983d9 f2fs: stop gc/discard thread after fs shutdown bb924f777717 f2fs: hanlde error case in f2fs_ioc_shutdown 700b53f21ee8 f2fs: split need_inplace_update f31d52811c1f f2fs: fix to update last_disk_size correctly eeb0118b8340 f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup c1b74c967092 f2fs: clean up error path of fill_super d5efd57e013b f2fs: avoid hungtask when GC encrypted block if io_bits is set c4027d08430b f2fs: allow quota to use reserved blocks 18d267c273a9 f2fs: fix to drop all inmem pages correctly 4dca47531eb0 f2fs: speed up defragment on sparse file 999f806a7c9e f2fs: support F2FS_IOC_PRECACHE_EXTENTS 84960fca96c4 f2fs: add an ioctl to disable GC for specific file 292c8e1cfd4d f2fs: prevent newly created inode from being dirtied incorrectly 58b1f5b0fcf1 f2fs: support FIEMAP_FLAG_XATTR 6afa9a94d09b f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock 10f4a4140b61 f2fs: check node page again in write end io b203c58dfd55 f2fs: fix to caclulate required free section correctly d49132d45cb0 f2fs: handle newly created page when revoking inmem pages 2ce6b9d8167e f2fs: add resgid and resuid to reserve root blocks f53dcf6799ab f2fs: implement cgroup writeback support 1338f376d5a3 f2fs: remove unused pend_list_tag d4f19f6266ab f2fs: avoid high cpu usage in discard thread b78e9302e2e3 f2fs: make local functions static 62438ba87b79 f2fs: add reserved blocks for root user 06a366757ff7 f2fs: check segment type in __f2fs_replace_block 4c6bc4be375a f2fs: update inode info to inode page for new file 591b33638733 f2fs: show precise # of blocks that user/root can use b242d7edc537 f2fs: clean up unneeded declaration 87b8168e9ef0 f2fs: continue to do direct IO if we only preallocate partial blocks 2b4d859bd9d8 f2fs: enable quota at remount from r to w 54bf13a0adcd f2fs: skip stop_checkpoint for user data writes 25ef3006ba23 f2fs: fix missing error number for xattr operation cff2c7fe417b f2fs: recover directory operations by fsync e2bb618a0a6b f2fs: return error during fill_super 8a2c11d8658d f2fs: fix an error case of missing update inode page cd38d5ada5a4 f2fs: fix potential hangtask in f2fs_trace_pid e81cafbeba4b f2fs: no need return value in restore summary process 04d44000d633 f2fs: use unlikely for release case 925d0933d8f0 f2fs: don't return value in truncate_data_blocks_range f7986c416d1b f2fs: clean up f2fs_map_blocks e4f5e26cdadf f2fs: clean up hash codes 1f994d47080c f2fs: fix error handling in fill_super e7db649b5fb1 f2fs: spread f2fs_k{m,z}alloc 5d4e487b9929 f2fs: inject fault to kvmalloc 8b33886c37cd f2fs: inject fault to kzalloc d94680798786 f2fs: remove a redundant conditional expression 3bc01114a338 f2fs: apply write hints to select the type of segment for direct write c80f01959114 f2fs: switch to fscrypt_prepare_setattr() bb8b850365ff f2fs: switch to fscrypt_prepare_lookup() 9ab470eaf8a8 f2fs: switch to fscrypt_prepare_rename() aeaac517a12d f2fs: switch to fscrypt_prepare_link() 101c6a96ad1c f2fs: switch to fscrypt_file_open() 6d025237a1f8 f2fs: remove repeated f2fs_bug_on b01e03d724de f2fs: remove an excess variable e1f9be2f7c82 f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem e5c7c8601030 f2fs: remove unused parameter f130dbb98a68 f2fs: still write data if preallocate only partial blocks 47ee9b259811 f2fs: introduce sysfs readdir_ra to readahead inode block in readdir 55e2f89181ce f2fs: fix concurrent problem for updating free bitmap e1398f6554b4 f2fs: remove unneeded memory footprint accounting 2d69561135f2 f2fs: no need to read nat block if nat_block_bitmap is set 4dd2d0733809 f2fs: reserve nid resource for quota sysfile Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
* f2fs: updates on 4.15-rc1Jaegeuk Kim2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull f2fs updates from Jaegeuk Kim: "In this round, we introduce sysfile-based quota support which is required for Android by default. In addition, we allow that users are able to reserve some blocks in runtime to mitigate performance drops in low free space. Enhancements: - assign proper data segments according to write_hints given by user - issue cache_flush on dirty devices only among multiple devices - exploit cp_error flag and add more faults to enhance fault injection test - conduct more readaheads during f2fs_readdir - add a range for discard commands Bug fixes: - fix zero stat->st_blocks when inline_data is set - drop crypto key and free stale memory pointer while evict_inode is failing - fix some corner cases in free space and segment management - fix wrong last_disk_size This series includes lots of clean-ups and code enhancement in terms of xattr operations, discard/flush command control. In addition, it adds versatile debugfs entries to monitor f2fs status" Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y: 56a07b070510 f2fs: deny accessing encryption policy if encryption is off c394842e26e5 f2fs: inject fault in inc_valid_node_count 926292251022 f2fs: fix to clear FI_NO_PREALLOC e6cfc5de2d05 f2fs: expose quota information in debugfs c4cd2efe835b f2fs: separate nat entry mem alloc from nat_tree_lock 48c72b4c8c50 f2fs: validate before set/clear free nat bitmap baf9275a4bbd f2fs: avoid opened loop codes in __add_ino_entry 47af6c72d944 f2fs: apply write hints to select the type of segments for buffered write ac9819160586 f2fs: introduce scan_curseg_cache for cleanup ca28e9670e80 f2fs: optimize the way of traversing free_nid_bitmap 460688b59e8b f2fs: keep scanning until enough free nids are acquired 0186182c0c4d f2fs: trace checkpoint reason in fsync() 5d4b6efcfd09 f2fs: keep isize once block is reserved cross EOF 3c8f767e1374 f2fs: avoid race in between GC and block exchange 4423778adf0e f2fs: save a multiplication for last_nid calculation 3e3b40557525 f2fs: fix summary info corruption 44889e487981 f2fs: remove dead code in update_meta_page 55c7b9595bb9 f2fs: remove unneeded semicolon 8b92814117d5 f2fs: don't bother with inode->i_version 42c7c71824fc f2fs: check curseg space before foreground GC c5470498e59b f2fs: use rw_semaphore to protect SIT cache 82750d346ab7 f2fs: support quota sys files 26dfec49b25a f2fs: add quota_ino feature infra ddb8e2ae9811 f2fs: optimize __update_nat_bits f46ae958c701 f2fs: modify for accurate fggc node io stat c713fdb5a23c Revert "f2fs: handle dirty segments inside refresh_sit_entry" 873ec505cb07 f2fs: add a function to move nid ae66786296b4 f2fs: export SSR allocation threshold 90c28a18d2a4 f2fs: give correct trimmed blocks in fstrim 5612922fb0ac f2fs: support bio allocation error injection 583b7a274c27 f2fs: support get_page error injection 09a073cc8c56 f2fs: add missing sysfs description e945474a9c1b f2fs: support soft block reservation b7b2e629b6f6 f2fs: handle error case when adding xattr entry 7368e30495c5 f2fs: support flexible inline xattr size ada4061e191b f2fs: show current cp state 5b8ff1301a61 f2fs: add missing quota_initialize 46d4a691f035 f2fs: show # of dirty segments via sysfs fc13f9d7ce1e f2fs: stop all the operations by cp_error flag 91bea0c391b3 f2fs: remove several redundant assignments 807486c79534 f2fs: avoid using timespec 03b1cb0bb4a2 f2fs: fix to correct no_fggc_candidate 5c15033ceaea Revert "f2fs: return wrong error number on f2fs_quota_write" 5f5f59322240 f2fs: remove obsolete pointer for truncate_xattr_node 032a6906825a f2fs: retry ENOMEM for quota_read|write 171b638fc49b f2fs: limit # of inmemory pages 83ed7a615f0a f2fs: update ctx->pos correctly when hitting hole in directory 4d6e68be2534 f2fs: relocate readahead codes in readdir() c8be47b54018 f2fs: allow readdir() to be interrupted 2b903fe94cd0 f2fs: trace f2fs_readdir bb0db666d4bc f2fs: trace f2fs_lookup 40d6250f046a f2fs: skip searching non-exist range in truncate_hole 8e84f379df61 f2fs: expose some sectors to user in inline data or dentry case cb98f70dea02 f2fs: avoid stale fi->gdirty_list pointer 5562a3c53963 f2fs/crypto: drop crypto key at evict_inode only 85853e7e38d7 f2fs: fix to avoid race when accessing last_disk_size 0c47a892d555 f2fs: Fix bool initialization/comparison 68e801abc520 f2fs: give up CP_TRIMMED_FLAG if it drops discards df74eacb2075 f2fs: trace f2fs_remove_discard bd502c6e3e7a f2fs: reduce cmd_lock coverage in __issue_discard_cmd a34ab5ca4f94 f2fs: split discard policy 1e65afd14d32 f2fs: wrap discard policy 684447dad138 f2fs: support issuing/waiting discard in range 27eaad09380f f2fs: fix to flush multiple device in checkpoint 08bb9d68d51b f2fs: enhance multiple device flush 9c2526ac2ecb f2fs: fix to show ino management cache size correctly 814b463d262f f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush f555b0a117d3 f2fs: obsolete ALLOC_NID_LIST list 75d3164ae128 f2fs: convert inline data for direct I/O & FI_NO_PREALLOC 4de0ceb6b7ef f2fs: allow readpages with NULL file pointer 322a45d17212 f2fs: show flush list status in sysfs 6d625a93b4a8 f2fs: introduce read_xattr_block 8ea6e1c327c5 f2fs: introduce read_inline_xattr dbce11e9ee5b Revert "f2fs: reuse nids more aggressively" 131bc9f6b7f9 Revert "f2fs: node segment is prior to data segment selected victim" Change-Id: I93b9cd867b859a667a448b39299ff44a2b841b8c Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
* f2fs: catch up to v4.14-rc1Jaegeuk Kim2017-10-03
| | | | | | | | | | | | | | | This is cherry-picked from upstrea-f2fs-stable-linux-4.4.y. Changes include: commit c7fd9e2b4a6876 ("f2fs: hurry up to issue discard after io interruption") commit 603dde39653d6d ("f2fs: fix to show correct discard_granularity in sysfs") ... commit 565f0225f95f15 ("f2fs: factor out discard command info into discard_cmd_control") commit c4cc29d19eaf01 ("f2fs: remove batched discard in f2fs_trim_fs") Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5 Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
* f2fs: introduce FI_ATOMIC_COMMITChao Yu2017-10-03
| | | | | | | | | | | | | commit 5fe457430e554a2f5188f13c1a2e36ad845640c5 upstream. This patch introduces a new flag to indicate inode status of doing atomic write committing, so that, we can keep atomic write status for inode during atomic committing, then we can skip GCing pages of atomic write inode, that avoids random GCed datas being mixed with current transaction, so isolation of transaction can be kept. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: clean up with list_{first, last}_entryChao Yu2017-10-03
| | | | | | | commit 939afa943c5290a3b92f01612a792af17bc98115 upstream. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support IO alignment for DATA and NODE writesJaegeuk Kim2017-10-03
| | | | | | | | | | | | | | | | | commit 0a595ebaaa6b53a2226d3fee2a2fd616ea5ba378 upstream. This patch implements IO alignment by filling dummy blocks in DATA and NODE write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs, we can eliminate underlying dummy page problem which FTL conducts in order to close MLC or TLC partial written pages. Note that, - it requires "-o mode=lfs". - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256. - read IO is still 4KB. - do checkpoint at fsync, if dummy NODE page was written. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add submit_bio tracepointJaegeuk Kim2017-10-03
| | | | | | | | | commit 554b5125f5cfca6653461fd52bad24d4ef35ec29 upstream. This patch adds final submit_bio() tracepoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add a case of no need to read a page in write beginYunlei He2017-10-03
| | | | | | | | | | | commit 746e2403927efbd7c7f2e796314e3cfb3cfabaa4 upstream. If the range we write cover the whole valid data in the last page, we do not need to read it. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: resolve op and op_flags confilctsJaegeuk Kim2017-10-03
| | | | | | | | commit 70fd76140a6cb63262bd47b68d57b42e889c10ee upstream. This patch backported ("block,fs: use REQ_* flags directly") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: return AOP_WRITEPAGE_ACTIVATE for writepageChao Yu2017-09-25
| | | | | | | | | | commit 0002b61bdaac732bcff364a18f5bd57c95def0a5 upstream. We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't wait writeback for datas during checkpointChao Yu2017-09-25
| | | | | | | | | | | | | | | | | | commit 36951b38d13ac7cce9fcf89e0e01c22ed0d05688 upstream. Normally, while committing checkpoint, we will wait on all pages to be writebacked no matter the page is data or metadata, so in scenario where there are lots of data IO being submitted with metadata, we may suffer long latency for waiting writeback during checkpoint. Indeed, we only care about persistence for pages with metadata, but not pages with data, as file system consistent are only related to metadate, so in order to avoid encountering long latency in above scenario, let's recognize and reference metadata in submitted IOs, wait writeback only for metadatas. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix redundant block allocationJaegeuk Kim2017-09-25
| | | | | | | | | | | | | | | | | | commit c040ff9d69fd1d782fe577ba9e35c1f5798158ae upstream. In direct_IO path of f2fs_file_write_iter(), 1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO) -> allocate LBA X 2. f2fs_direct_IO() -> return 0; Then, f2fs_write_data_page() will allocate another LBA X+1. This makes EIO triggered by HM-SMR. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use err for f2fs_preallocate_blocksJaegeuk Kim2017-09-25
| | | | | | | | commit a7de608691f766cd148971a71d4f13aa1692d4c8 upstream. This patch has no functional change. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support multiple devicesJaegeuk Kim2017-09-25
| | | | | | | | | | | | | | commit 3c62be17d4f562f43fe1d03b48194399caa35aa5 upstream. This patch implements multiple devices support for f2fs. Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big volume under one f2fs instance. Internal block management is very simple, but we will modify block allocation and background GC policy to boost IO speed by exploiting them accoording to each device speed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: allow dio read for LFS modeJaegeuk Kim2017-09-25
| | | | | | | | | commit e57e9ae5b179a6b243c42bf6d9549d1595c27089 upstream. We can allow dio reads for LFS mode, while doing buffered writes for dio writes. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: revert segment allocation for direct IOJaegeuk Kim2017-09-25
| | | | | | | | | | | | commit 6ae1be13e85f4c42c8ca371fda50ae39eebbfd96 upstream. Now we don't need to be too much careful about storage alignment for dio, since its speed becomes quite fast and we'd better avoid any misalignment first. Revert: 38aa0889b250 (f2fs: align direct_io'ed data to section) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: Use generic zoned block device terminologyDamien Le Moal2017-09-25
| | | | | | | | | | | | | | commit 0bfd7a091c19132489a0f977b8dbf9f6b5ae0a1c upstream. SMR stands for "Shingled Magnetic Recording" which makes sense only for hard disk drives (spinning rust). The ZBC/ZAC standards enable management of SMR disks, but solid state drives may also support those standards. So rename the HMSMR feature to BLKZONED to avoid a HDD centric terminology. For the same reason, rename f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: hide a maybe-uninitialized warningArnd Bergmann2017-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | | commit 230436b3ef3fd7d4a1da19edf5e87bb2d74e0fc2 upstream. gcc is unsure about the use of last_ofs_in_node, which might happen without a prior initialization: fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’: fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) { As pointed out by Chao Yu, the code is actually correct as 'prealloc' is only set if the last_ofs_in_node has been set, the two always get updated together. This initializes last_ofs_in_node to dn.ofs_in_node for each new dnode at the start of the 'next_block' loop, which at that point is a correct initialization as well. I assume that compilers that correctly track the contents of the variables and do not warn about the condition also figure out that they can eliminate the extra assignment here. Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use BIO_MAX_PAGES for bio allocationJaegeuk Kim2017-09-25
| | | | | | | | commit 664ba972df9b96942191db3068274cc1db899774 upstream. We don't need to allocate bio partially in order to maximize sequential writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: be aware of extent beyond EOF in fiemapChao Yu2017-09-25
| | | | | | | | | | | commit 58736fa60f6ae659ac72da8b1580c308b47e8edd upstream. f2fs can support fallocating blocks beyond file size without changing the size, but ->fiemap of f2fs was restricted and can't detect these extents fallocated past EOF, now relieve the restriction. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't miss any f2fs_balance_fs casesChao Yu2017-09-25
| | | | | | | | | | commit 6f2d8ed654bfa391854df4de854953f772a16a9d upstream. In f2fs_map_blocks, let f2fs_balance_fs detects node page modification with dn.node_changed to avoid miss some corner cases. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: give a chance to detach from dirty listChao Yu2017-09-25
| | | | | | | | | | | commit 933439c8f3474e329709b715b43b0b8168bbecf8 upstream. If there is no dirty pages in inode, we should give a chance to detach the inode from global dirty list, otherwise it needs to call another unnecessary .writepages for detaching. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of ↵Jaegeuk Kim2017-09-25
| | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* ANDROID: Refactor fs readpage/write tracepoints.Mohan Srinivasan2017-02-10
| | | | | | | | | | | Refactor the fs readpage/write tracepoints to move the inode->path lookup outside the tracepoint code, and pass a pointer to the path into the tracepoint code instead. This is necessary because the tracepoint code runs non-preemptible. Thanks to Trilok Soni for catching this in 4.4. Change-Id: I7486c5947918d155a30c61d6b9cd5027cf8fbe15 Signed-off-by: Mohan Srinivasan <srmohan@google.com>
* ANDROID: fs: FS tracepoints to track IO.Mohan Srinivasan2016-09-20
| | | | | | | | | Adds tracepoints in ext4/f2fs/mpage to track readpages/buffered write()s. This allows us to track files that are being read/written to PIDs. Change-Id: I26bd36f933108927d6903da04d8cb42fd9c3ef3d Signed-off-by: Mohan Srinivasan <srmohan@google.com>
* f2fs: support fiemap for inline_dataJaegeuk Kim2015-10-20
| | | | | | | There is a FIEMAP_EXTENT_INLINE_DATA, pointed out by Marc. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: flush dirty data for bmapJaegeuk Kim2015-10-20
| | | | | | | | Users expect bmap will give allocated block addresses. Let's play likewise ext4. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs crypto: fix racing of accessing encrypted page amongChao Yu2015-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | different competitors Since we use different page cache (normally inode's page cache for R/W and meta inode's page cache for GC) to cache the same physical block which is belong to an encrypted inode. Writeback of these two page cache should be exclusive, but now we didn't handle writeback state well, so there may be potential racing problem: a) kworker: f2fs_gc: - f2fs_write_data_pages - f2fs_write_data_page - do_write_data_page - write_data_page - f2fs_submit_page_mbio (page#1 in inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) - gc_data_segment - move_encrypted_block - pagecache_get_page (page#2 in meta inode's page cache was cached with the invalid datas of physical block located in new blkaddr) - f2fs_submit_page_mbio (page#1 was submitted, later, page#2 with invalid data will be submitted) b) f2fs_gc: - gc_data_segment - move_encrypted_block - f2fs_submit_page_mbio (page#1 in meta inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) user thread: - f2fs_write_begin - f2fs_submit_page_bio (we submit the request to block layer to update page#2 in inode's page cache with physical block located in new blkaddr, so here we may read gabbage data from new blkaddr since GC hasn't writebacked the page#1 yet) This patch fixes above potential racing problem for encrypted inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add a tracepoint for f2fs_read_data_pagesChao Yu2015-10-12
| | | | | | | | This patch adds a tracepoint for f2fs_read_data_pages to trace when pages are readahead by VFS. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: set GFP_NOFS for grab_cache_pageJaegeuk Kim2015-10-12
| | | | | | | | | | | | For normal inodes, their pages are allocated with __GFP_FS, which can cause filesystem calls when reclaiming memory. This can incur a dead lock condition accordingly. So, this patch addresses this problem by introducing f2fs_grab_cache_page(.., bool for_write), which calls grab_cache_page_write_begin() with AOP_FLAG_NOFS. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Revert "f2fs: do not skip dentry block writes"Jaegeuk Kim2015-10-12
| | | | | | | | | The periodic checkpoint can resolve the previous issue. So, now we can use this again to improve the reported performance regression: https://lkml.org/lkml/2015/10/8/20 This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.
* f2fs: do not skip dentry block writesJaegeuk Kim2015-10-09
| | | | | | | | | | | | | | Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory pressure and the number of dirty pages is pretty small. But, we didn't skip for normal data writes, which gives us not much big impact on overall performance. Moreover, by skipping some data writes, kworker falls into infinite loop to try to write blocks, when many dir inodes have only one dentry block. So, this patch removes skipping data writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use correct flag in f2fs_map_blocks()Chao Yu2015-10-09
| | | | | | | | | We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc8865 ("f2fs: fix incorrect mapping for bmap"), but forget to use this flag in the right place, fix it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to handle io error in ->direct_IOChao Yu2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is a oops reported as following message when testing generic/019 of xfstest: ------------[ cut here ]------------ kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882! invalid opcode: 0000 [#1] SMP Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_def CPU: 2 PID: 25441 Comm: fio Tainted: G O 4.3.0-rc1+ #6 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000 RIP: 0010:[<ffffffffa0784981>] [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP: 0018:ffff8803fd61f918 EFLAGS: 00010246 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001 FS: 00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0 Stack: 000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b 0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000 ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358 Call Trace: [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs] [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs] [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs] [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs] [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160 [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0 [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200 [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20 [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs] [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs] [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290 [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50 [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0 [<ffffffff8120b913>] do_io_submit+0x373/0x630 [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0 [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20 [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb RIP [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs] RSP <ffff8803fd61f918> ---[ end trace 2e577d7f711ddb86 ]--- The reason is that: in the test of generic/019, we will trigger a manmade IO error in block layer through debugfs, after that, prefree segment will no longer be freed, because we always skip doing gc or checkpoint when there occurs an IO error. Meanwhile fio with aio engine generated a large number of direct IOs, which continue allocating spaces in free segment until we run out of them, eventually, results in panic in new_curseg as no more free segment was found. So, this patch changes to return EIO in direct_IO for this condition. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: reorganize f2fs_map_blocksChao Yu2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In this patch, we try to reorganize f2fs_map_blocks to make block mapping flow more clear by using following structure: /* check status of mapping */ if (unmapped) { /* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */ if (create) { /* write path, handle dio write case here */ alloc_and_map; } else { /* * handle read cases from all call paths: * 1. generic read; * 2. dio read; * 3. fiemap; * 4. bmap */ } } /* map buffer_header */ Besides, this patch handles the missing case correctly for dio write: When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will not allocate blocks correctly for preallocated blocks, but returning with an unmapped buffer head, which will result in failure of dio write. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix overflow of size calculationChao Yu2015-10-09
| | | | | | | | | | | | | | | | We have potential overflow issue when calculating size of object, when we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only 32-bits space in 32-bit architecture, left shifting will incur overflow, i.e: pgoff_t index = 0xFFFFFFFF; loff_t size = index << PAGE_CACHE_SHIFT; size: 0xFFFFF000 So we should cast index with 64-bits type to avoid this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Merge tag 'for-f2fs-4.3' of ↵Linus Torvalds2015-09-03
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The major work includes fixing and enhancing the existing extent_cache feature, which has been well settling down so far and now it becomes a default mount option accordingly. Also, this version newly registers a f2fs memory shrinker to reclaim several objects consumed by a couple of data structures in order to avoid memory pressures. Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which triggers a cleaning job explicitly by users. Most of the other patches are to fix bugs occurred in the corner cases across the whole code area" * tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits) f2fs: upset segment_info repair f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent f2fs: update extent tree in batches f2fs: fix to release inode correctly f2fs: handle f2fs_truncate error correctly f2fs: avoid unneeded initializing when converting inline dentry f2fs: atomically set inode->i_flags f2fs: fix wrong pointer access during try_to_free_nids f2fs: use __GFP_NOFAIL to avoid infinite loop f2fs: lookup neighbor extent nodes for merging later f2fs: split __insert_extent_tree_ret for readability f2fs: kill dead code in __insert_extent_tree f2fs: adjust showing of extent cache stat f2fs: add largest/cached stat in extent cache f2fs: fix incorrect mapping for bmap f2fs: add annotation for space utilization of regular/inline dentry f2fs: fix to update cached_en of extent tree properly f2fs: fix typo f2fs: check the node block address of newly allocated nid f2fs: go out for insert_inode_locked failure ...
| * f2fs: fix incorrect mapping for bmapChao Yu2015-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test step is like below: 1. touch file 2. truncate -s $((1024*1024)) file 3. fallocate -o 0 -l $((1024*1024)) file 4. fibmap.f2fs file Our result of fibmap.f2fs showed below is not correct: file_pos start_blk end_blk blks 0 -937166132 -937166132 1 4096 -937166132 -937166132 1 8192 -937166132 -937166132 1 12288 -937166132 -937166132 1 16384 -937166132 -937166132 1 20480 -937166132 -937166132 1 ... 1040384 -937166132 -937166132 1 1044480 -937166132 -937166132 1 This is because f2fs_map_blocks will return with no error when meeting a hole or preallocated block, the caller __get_data_block will map the uninitialized variable value to bh->b_blocknr. Unfortunately generic_block_bmap will neither check the return value of get_data() nor check mapping info of buffer_head, result in returning the random block address. After fixing the issue, our result shows correctly: file_pos start_blk end_blk blks 0 0 0 256 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: handle failed bio allocationJaegeuk Kim2015-08-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the same time. So, we can't guarantee that bio is allocated all the time. " * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be * able to allocate a bio. This is due to the mempool guarantees. To make this * work, callers must never allocate more than 1 bio at a time from this pool. * Callers that need to allocate more than 1 bio must always submit the * previously allocated bio for IO before attempting to allocate a new one. * Failure to do so can cause deadlocks under memory pressure. " Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: remove inmem radix treeChao Yu2015-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we use radix tree to index all registered page entries for atomic file, but now we only use radix tree to see whether current page is indexed or not, since the other user of radix tree is gone in commit 042b7816aaeb ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try to use one more efficient way: Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private value to indicate page indexing status. By using this way, we can save memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: report EINVAL for unalignment direct IOChao Yu2015-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in detail is as fallow: dio04 <<<test_start>>> tag=dio04 stime=1432278894 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TFAIL : diotest4.c:129: write allows odd count.returns 1: Success diotest4 4 TFAIL : diotest4.c:183: Odd count of read and write diotest4 5 TPASS : Read beyond the file size ...... the result of ext4 with same environment: dio04 <<<test_start>>> tag=dio04 stime=1432259643 cmdline="diotest4" contacts="" analysis=exit <<<test_output>>> diotest4 1 TPASS : Negative Offset diotest4 2 TPASS : removed diotest4 3 TPASS : Odd count of read and write diotest4 4 TPASS : Read beyond the file size ...... The reason is that when triggering DIO in f2fs, we will return zero value in ->direct_IO if writer's buffer offset, file offset and transfer size is not alignment to block size of filesystem, resulting in falling back into buffered write instead of returning -EINVAL. This patch fixes that problem by returning correct error number for above case, and removing the judgement condition in check_direct_IO to make sure the verification will be enabled for direct reader too. Besides, Jaegeuk Kim pointed out that there is expectional cases we should always make direct-io falling back into buffered write, such as dio in encrypted file. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Chao Yu make small change and add detail description in commit message] Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: use extent cache to optimize f2fs_reserve_blockFan Li2015-08-05
| | | | | | | | | | | | | | | | | | | | | | In some cases, we only need the block address when we call f2fs_reserve_block, other fields of struct dnode_of_data aren't necessary. We can try extent cache first for such cases in order to speed up the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix to release inode page correctlyChao Yu2015-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In following call path, we will pass a locked and referenced ipage pointer to get_new_data_page: - init_inode_metadata - make_empty_dir - get_new_data_page There are two exit paths in get_new_data_page when error occurs: 1) grab_cache_page fails, ipage will not be released; 2) f2fs_reserve_block fails, ipage will be released in callee. So, it's not consistent for error handling in get_new_data_page. For f2fs_reserve_block, it's not very easy to change the rule of error handling, since it's already complicated. Here we deside to choose an easy way to fix this issue: If any error occur in get_new_data_page, we will ensure releasing ipage in this function. The same issue is in f2fs_convert_inline_dir, fix that too. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: change the timing of f2fs_wait_on_page_writebackFan Li2015-08-05
| | | | | | | | | | | | | | | | | | | | | | some backing devices need pages to be stable during writeback. It doesn't matter if the page is completely overwritten or already uptodate, it needs to wait before write. Signed-off-by: Fan li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: skip writing in ->writepages when no dirty pages existChao Yu2015-08-05
| | | | | | | | | | | | | | | | | | When flushing comes from background, if there is no dirty page in the mapping of inode, we'd better to skip seeking dirty page from mapping for writebacking. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: optimize f2fs_write_cache_pagesTiezhu Yang2015-08-05
| | | | | | | | | | | | | | | | | | | | | | | | The if statement "goto continue_unlock" is exactly the same when each if condition is true that is depended on the value of both "step" and "is_cold_data(page)" are 0 or 1. That means when the value of "step" equals to "is_cold_data(page)", the if condition is true and the if statement "goto continue_unlock" appears only once, so it can be optimized to reduce the duplicated code. Signed-off-by: Tiezhu Yang <kernelpatch@126.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: callers take care of the page from bio errorJaegeuk Kim2015-08-05
| | | | | | | | | | | | | | This patch changes for a caller to handle the page after its bio gets an error. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: expose f2fs_write_cache_pagesChao Yu2015-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are gced dirty pages and normal dirty pages in the mapping of one inode, we might writeback them alternately with discontinuous block address, resulting in low performance. This patch introduces f2fs_write_cache_pages with codes copied from write_cache_pages in mm/page-writeback.c. In this function, we refactor flow with two steps: 1) writeback all cold type pages. 2) writeback all non-cold type pages. By using this method, f2fs will writeback dirty pages with the same temperature in bunch mode, it makes writeouted block being with more continuous address, so they can be merged as much as possible in f2fs bio cache, and also it will reduce the chance of submiting small IO from block layer. Test environment: 8g nokia sd card (very old sd card, but it shows better effect when testing with this patch, and with a 32g kingston sd card, I didn't see much more improvement). Test step: 1. touch testfile; 2. truncate -s 512K testfile; 3. write all pages with odd index; 4. trigger gc by ioctl; 5. write all pages with even index; 6. time fsync testfile. before: real 0m0.402s user 0m0.000s sys 0m0.000s after: real 0m0.143s user 0m0.004s sys 0m0.004s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: maintain extent cache in separated fileChao Yu2015-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves extent cache related code from data.c into extent_cache.c since extent cache is independent feature, and its codes are not relate to others in data.c, it's better for us to maintain them in separated place. There is no functionality change, but several small coding style fixes including: * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting; * rename misspelled word 'untill' to 'until'; * remove unneeded 'return' in the end of f2fs_destroy_extent_tree(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: don't try to split extents shorter than F2FS_MIN_EXTENT_LENFan Li2015-08-04
| | | | | | | | | | | | | | | | | | | | Since only parts of extents longer than F2FS_MIN_EXTENT_LEN will be kept in extent cache after split, extents already shorter than F2FS_MIN_EXTENT_LEN don't need to try split at all. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>