path: root/fs/f2fs/node.c (follow)
Commit message (Author, Age)
* f2fs: updates on v4.16-rc1 (Jaegeuk Kim, 2018-02-22)
| Pull f2fs updates from Jaegeuk Kim:
|  "In this round, we've followed up to support some generic features
|   such as cgroup, block reservation, linking fscrypt_ops, delivering
|   write_hints, and some ioctls. And, we could fix some corner cases in
|   terms of power-cut recovery and subtle deadlocks.
|
|   Enhancements:
|    - bitmap operations to handle NAT blocks
|    - readahead to improve readdir speed
|    - switch to use fscrypt_*
|    - apply write hints for direct IO
|    - add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for
|      root/uid/gid
|    - modify b_avail and b_free to consider root reserved blocks
|    - support cgroup writeback
|    - support FIEMAP_FLAG_XATTR for fibmap
|    - add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
|    - add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
|    - support inode creation time
|
|   Bug fixes:
|    - sysfile-based quota operations
|    - memory footprint accounting
|    - allow to write data on partial preallocation case
|    - fix deadlock case on fallocate
|    - fix to handle fill_super errors
|    - fix missing inode updates of fsync'ed file
|    - recover renamed file which was fsync'ed before
|    - drop inmemory pages in corner error case
|    - keep last_disk_size correctly
|    - recover missing i_inline flags during roll-forward
|
|   Various clean-up patches were added as well"
|
| Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
| 5f9b3abb911f f2fs: support inode creation time
| 9fb0de175172 f2fs: rebuild sit page from sit info in mem
| 1062a0c01829 f2fs: stop issuing discard if fs is readonly
| fa043fae9030 f2fs: clean up duplicated assignment in init_discard_policy
| b007190234d6 f2fs: use GFP_F2FS_ZERO for cleanup
| 35b11839a1ae f2fs: allow to recover node blocks given updated checkpoint
| e56500860be0 f2fs: recover some i_inline flags
| 64aa9569a1bf f2fs: correct removexattr behavior for null valued extended attribute
| 70b3a923daff f2fs: drop page cache after fs shutdown
| 8069a0e983d9 f2fs: stop gc/discard thread after fs shutdown
| bb924f777717 f2fs: hanlde error case in f2fs_ioc_shutdown
| 700b53f21ee8 f2fs: split need_inplace_update
| f31d52811c1f f2fs: fix to update last_disk_size correctly
| eeb0118b8340 f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
| c1b74c967092 f2fs: clean up error path of fill_super
| d5efd57e013b f2fs: avoid hungtask when GC encrypted block if io_bits is set
| c4027d08430b f2fs: allow quota to use reserved blocks
| 18d267c273a9 f2fs: fix to drop all inmem pages correctly
| 4dca47531eb0 f2fs: speed up defragment on sparse file
| 999f806a7c9e f2fs: support F2FS_IOC_PRECACHE_EXTENTS
| 84960fca96c4 f2fs: add an ioctl to disable GC for specific file
| 292c8e1cfd4d f2fs: prevent newly created inode from being dirtied incorrectly
| 58b1f5b0fcf1 f2fs: support FIEMAP_FLAG_XATTR
| 6afa9a94d09b f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
| 10f4a4140b61 f2fs: check node page again in write end io
| b203c58dfd55 f2fs: fix to caclulate required free section correctly
| d49132d45cb0 f2fs: handle newly created page when revoking inmem pages
| 2ce6b9d8167e f2fs: add resgid and resuid to reserve root blocks
| f53dcf6799ab f2fs: implement cgroup writeback support
| 1338f376d5a3 f2fs: remove unused pend_list_tag
| d4f19f6266ab f2fs: avoid high cpu usage in discard thread
| b78e9302e2e3 f2fs: make local functions static
| 62438ba87b79 f2fs: add reserved blocks for root user
| 06a366757ff7 f2fs: check segment type in __f2fs_replace_block
| 4c6bc4be375a f2fs: update inode info to inode page for new file
| 591b33638733 f2fs: show precise # of blocks that user/root can use
| b242d7edc537 f2fs: clean up unneeded declaration
| 87b8168e9ef0 f2fs: continue to do direct IO if we only preallocate partial blocks
| 2b4d859bd9d8 f2fs: enable quota at remount from r to w
| 54bf13a0adcd f2fs: skip stop_checkpoint for user data writes
| 25ef3006ba23 f2fs: fix missing error number for xattr operation
| cff2c7fe417b f2fs: recover directory operations by fsync
| e2bb618a0a6b f2fs: return error during fill_super
| 8a2c11d8658d f2fs: fix an error case of missing update inode page
| cd38d5ada5a4 f2fs: fix potential hangtask in f2fs_trace_pid
| e81cafbeba4b f2fs: no need return value in restore summary process
| 04d44000d633 f2fs: use unlikely for release case
| 925d0933d8f0 f2fs: don't return value in truncate_data_blocks_range
| f7986c416d1b f2fs: clean up f2fs_map_blocks
| e4f5e26cdadf f2fs: clean up hash codes
| 1f994d47080c f2fs: fix error handling in fill_super
| e7db649b5fb1 f2fs: spread f2fs_k{m,z}alloc
| 5d4e487b9929 f2fs: inject fault to kvmalloc
| 8b33886c37cd f2fs: inject fault to kzalloc
| d94680798786 f2fs: remove a redundant conditional expression
| 3bc01114a338 f2fs: apply write hints to select the type of segment for direct write
| c80f01959114 f2fs: switch to fscrypt_prepare_setattr()
| bb8b850365ff f2fs: switch to fscrypt_prepare_lookup()
| 9ab470eaf8a8 f2fs: switch to fscrypt_prepare_rename()
| aeaac517a12d f2fs: switch to fscrypt_prepare_link()
| 101c6a96ad1c f2fs: switch to fscrypt_file_open()
| 6d025237a1f8 f2fs: remove repeated f2fs_bug_on
| b01e03d724de f2fs: remove an excess variable
| e1f9be2f7c82 f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
| e5c7c8601030 f2fs: remove unused parameter
| f130dbb98a68 f2fs: still write data if preallocate only partial blocks
| 47ee9b259811 f2fs: introduce sysfs readdir_ra to readahead inode block in readdir
| 55e2f89181ce f2fs: fix concurrent problem for updating free bitmap
| e1398f6554b4 f2fs: remove unneeded memory footprint accounting
| 2d69561135f2 f2fs: no need to read nat block if nat_block_bitmap is set
| 4dd2d0733809 f2fs: reserve nid resource for quota sysfile
|
| Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
* f2fs: updates on 4.15-rc1 (Jaegeuk Kim, 2017-11-27)
| Pull f2fs updates from Jaegeuk Kim:
|  "In this round, we introduce sysfile-based quota support which is
|   required for Android by default. In addition, we allow that users are
|   able to reserve some blocks in runtime to mitigate performance drops
|   in low free space.
|
|   Enhancements:
|    - assign proper data segments according to write_hints given by user
|    - issue cache_flush on dirty devices only among multiple devices
|    - exploit cp_error flag and add more faults to enhance fault
|      injection test
|    - conduct more readaheads during f2fs_readdir
|    - add a range for discard commands
|
|   Bug fixes:
|    - fix zero stat->st_blocks when inline_data is set
|    - drop crypto key and free stale memory pointer while evict_inode is
|      failing
|    - fix some corner cases in free space and segment management
|    - fix wrong last_disk_size
|
|   This series includes lots of clean-ups and code enhancement in terms
|   of xattr operations, discard/flush command control. In addition, it
|   adds versatile debugfs entries to monitor f2fs status"
|
| Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y:
| 56a07b070510 f2fs: deny accessing encryption policy if encryption is off
| c394842e26e5 f2fs: inject fault in inc_valid_node_count
| 926292251022 f2fs: fix to clear FI_NO_PREALLOC
| e6cfc5de2d05 f2fs: expose quota information in debugfs
| c4cd2efe835b f2fs: separate nat entry mem alloc from nat_tree_lock
| 48c72b4c8c50 f2fs: validate before set/clear free nat bitmap
| baf9275a4bbd f2fs: avoid opened loop codes in __add_ino_entry
| 47af6c72d944 f2fs: apply write hints to select the type of segments for buffered write
| ac9819160586 f2fs: introduce scan_curseg_cache for cleanup
| ca28e9670e80 f2fs: optimize the way of traversing free_nid_bitmap
| 460688b59e8b f2fs: keep scanning until enough free nids are acquired
| 0186182c0c4d f2fs: trace checkpoint reason in fsync()
| 5d4b6efcfd09 f2fs: keep isize once block is reserved cross EOF
| 3c8f767e1374 f2fs: avoid race in between GC and block exchange
| 4423778adf0e f2fs: save a multiplication for last_nid calculation
| 3e3b40557525 f2fs: fix summary info corruption
| 44889e487981 f2fs: remove dead code in update_meta_page
| 55c7b9595bb9 f2fs: remove unneeded semicolon
| 8b92814117d5 f2fs: don't bother with inode->i_version
| 42c7c71824fc f2fs: check curseg space before foreground GC
| c5470498e59b f2fs: use rw_semaphore to protect SIT cache
| 82750d346ab7 f2fs: support quota sys files
| 26dfec49b25a f2fs: add quota_ino feature infra
| ddb8e2ae9811 f2fs: optimize __update_nat_bits
| f46ae958c701 f2fs: modify for accurate fggc node io stat
| c713fdb5a23c Revert "f2fs: handle dirty segments inside refresh_sit_entry"
| 873ec505cb07 f2fs: add a function to move nid
| ae66786296b4 f2fs: export SSR allocation threshold
| 90c28a18d2a4 f2fs: give correct trimmed blocks in fstrim
| 5612922fb0ac f2fs: support bio allocation error injection
| 583b7a274c27 f2fs: support get_page error injection
| 09a073cc8c56 f2fs: add missing sysfs description
| e945474a9c1b f2fs: support soft block reservation
| b7b2e629b6f6 f2fs: handle error case when adding xattr entry
| 7368e30495c5 f2fs: support flexible inline xattr size
| ada4061e191b f2fs: show current cp state
| 5b8ff1301a61 f2fs: add missing quota_initialize
| 46d4a691f035 f2fs: show # of dirty segments via sysfs
| fc13f9d7ce1e f2fs: stop all the operations by cp_error flag
| 91bea0c391b3 f2fs: remove several redundant assignments
| 807486c79534 f2fs: avoid using timespec
| 03b1cb0bb4a2 f2fs: fix to correct no_fggc_candidate
| 5c15033ceaea Revert "f2fs: return wrong error number on f2fs_quota_write"
| 5f5f59322240 f2fs: remove obsolete pointer for truncate_xattr_node
| 032a6906825a f2fs: retry ENOMEM for quota_read|write
| 171b638fc49b f2fs: limit # of inmemory pages
| 83ed7a615f0a f2fs: update ctx->pos correctly when hitting hole in directory
| 4d6e68be2534 f2fs: relocate readahead codes in readdir()
| c8be47b54018 f2fs: allow readdir() to be interrupted
| 2b903fe94cd0 f2fs: trace f2fs_readdir
| bb0db666d4bc f2fs: trace f2fs_lookup
| 40d6250f046a f2fs: skip searching non-exist range in truncate_hole
| 8e84f379df61 f2fs: expose some sectors to user in inline data or dentry case
| cb98f70dea02 f2fs: avoid stale fi->gdirty_list pointer
| 5562a3c53963 f2fs/crypto: drop crypto key at evict_inode only
| 85853e7e38d7 f2fs: fix to avoid race when accessing last_disk_size
| 0c47a892d555 f2fs: Fix bool initialization/comparison
| 68e801abc520 f2fs: give up CP_TRIMMED_FLAG if it drops discards
| df74eacb2075 f2fs: trace f2fs_remove_discard
| bd502c6e3e7a f2fs: reduce cmd_lock coverage in __issue_discard_cmd
| a34ab5ca4f94 f2fs: split discard policy
| 1e65afd14d32 f2fs: wrap discard policy
| 684447dad138 f2fs: support issuing/waiting discard in range
| 27eaad09380f f2fs: fix to flush multiple device in checkpoint
| 08bb9d68d51b f2fs: enhance multiple device flush
| 9c2526ac2ecb f2fs: fix to show ino management cache size correctly
| 814b463d262f f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
| f555b0a117d3 f2fs: obsolete ALLOC_NID_LIST list
| 75d3164ae128 f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
| 4de0ceb6b7ef f2fs: allow readpages with NULL file pointer
| 322a45d17212 f2fs: show flush list status in sysfs
| 6d625a93b4a8 f2fs: introduce read_xattr_block
| 8ea6e1c327c5 f2fs: introduce read_inline_xattr
| dbce11e9ee5b Revert "f2fs: reuse nids more aggressively"
| 131bc9f6b7f9 Revert "f2fs: node segment is prior to data segment selected victim"
|
| Change-Id: I93b9cd867b859a667a448b39299ff44a2b841b8c
| Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
* f2fs: catch up to v4.14-rc1 (Jaegeuk Kim, 2017-10-03)
| This is cherry-picked from upstream-f2fs-stable-linux-4.4.y.
|
| Changes include:
|
| commit c7fd9e2b4a6876 ("f2fs: hurry up to issue discard after io interruption")
| commit 603dde39653d6d ("f2fs: fix to show correct discard_granularity in sysfs")
| ...
| commit 565f0225f95f15 ("f2fs: factor out discard command info into discard_cmd_control")
| commit c4cc29d19eaf01 ("f2fs: remove batched discard in f2fs_trim_fs")
|
| Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5
| Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
* f2fs: check in-memory nat version bitmap (Chao Yu, 2017-10-03)
| commit 599a09b2c1ac222e6aad0c22515d1ccde7c3b702 upstream.
|
| This patch adds a mirror of the nat version bitmap and uses it to
| detect in-memory bitmap corruption, which may be caused by
| bit-transition of cache or by memory overflow.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
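The mirror check described in this commit can be sketched in userspace C. This is only an illustration of the idea, not the kernel code: `struct nat_bitmaps`, `NAT_BITMAP_BYTES`, `nat_bitmap_flip` and `nat_bitmap_is_consistent` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAT_BITMAP_BYTES 8 /* hypothetical size, for illustration only */

/* Hypothetical pair: the working version bitmap plus its mirror copy. */
struct nat_bitmaps {
    unsigned char bitmap[NAT_BITMAP_BYTES];
    unsigned char mirror[NAT_BITMAP_BYTES];
};

/* Flip one version bit in both copies; legitimate updates keep them equal. */
static void nat_bitmap_flip(struct nat_bitmaps *b, size_t bit)
{
    assert(bit < NAT_BITMAP_BYTES * 8);
    b->bitmap[bit / 8] ^= (unsigned char)(1u << (bit % 8));
    b->mirror[bit / 8] ^= (unsigned char)(1u << (bit % 8));
}

/* Returns true while the working bitmap still matches its mirror. */
static bool nat_bitmap_is_consistent(const struct nat_bitmaps *b)
{
    return memcmp(b->bitmap, b->mirror, NAT_BITMAP_BYTES) == 0;
}
```

Any stray write that touches only one of the two copies makes the comparison fail, which is the corruption signal the commit message refers to.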
* f2fs: don't cache nat entry if out of memory (Chao Yu, 2017-10-03)
| commit 5c9e418436f3445d7cc4f3ba2964f231a4b33f17 upstream.
|
| If we run out of memory in cache_nat_entry, it is better not to loop
| on memory allocation just to cache a nat entry. In a low-memory
| scenario, I expect this to avoid unneeded latency on the read path of
| node blocks.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: resolve op and op_flags confilcts (Jaegeuk Kim, 2017-10-03)
| commit 70fd76140a6cb63262bd47b68d57b42e889c10ee upstream.
|
| This patch backports ("block,fs: use REQ_* flags directly").
|
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to account total free nid correctly (Chao Yu, 2017-09-25)
| commit 04d47e673863c637a2b44ad34a558aeb5d0a727e upstream.
|
| Thread A
| - f2fs_create
| - f2fs_new_inode
| - f2fs_lock_op
| - alloc_nid
|   alloc last nid
| - f2fs_unlock_op
|
| Thread B
| - f2fs_create
| - f2fs_new_inode
| - f2fs_lock_op
| - alloc_nid
|   as node count still not be increased, we will loop in alloc_nid
|
| Thread C
| - f2fs_write_node_pages
| - f2fs_balance_fs_bg
| - f2fs_sync_fs
| - write_checkpoint
| - block_operations
| - f2fs_lock_all
| - f2fs_lock_op
|
| While creating a new inode, we do not allocate and account the nid
| atomically, so when there are almost no free nids left, we may loop
| forever as in the stack above.
|
| To avoid that, reuse nm_i::available_nids for accounting free nids and
| make nid allocation and counting atomic during node creation.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
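The fix above comes down to checking and decrementing the free-nid counter inside the same critical section that hands out the nid. A minimal userspace sketch, assuming a simple spinlock built on C11 atomics as a stand-in for the kernel lock; `struct nid_manager`, `next_nid` and `alloc_nid_sketch` are hypothetical names, only `available_nids` comes from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the nid manager state. */
struct nid_manager {
    atomic_flag lock;            /* stand-in for the kernel lock */
    unsigned int available_nids; /* free nids not yet handed out */
    unsigned int next_nid;       /* hypothetical: next nid to hand out */
};

static void nm_lock(struct nid_manager *nm)
{
    while (atomic_flag_test_and_set(&nm->lock))
        ; /* spin until the flag is released */
}

static void nm_unlock(struct nid_manager *nm)
{
    atomic_flag_clear(&nm->lock);
}

/*
 * Allocate a nid and account for it in one critical section.
 * Returns false right away when nothing is left, instead of looping.
 */
static bool alloc_nid_sketch(struct nid_manager *nm, unsigned int *nid)
{
    bool ok = false;

    assert(nm != NULL && nid != NULL);
    nm_lock(nm);
    if (nm->available_nids > 0) {
        nm->available_nids--;   /* accounting happens with the allocation */
        *nid = nm->next_nid++;
        ok = true;
    }
    nm_unlock(nm);
    return ok;
}
```

Because the counter drops before the lock is released, a second allocator sees the exhausted state immediately and can fail instead of spinning while holding the operation lock.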
* f2fs: fix an infinite loop when flush nodes in cp (Yunlei He, 2017-09-25)
| commit d40a43af0a57a017eba9ad2679183791587ceb6a upstream.
|
| Thread A
| - write_checkpoint
| - block_operations
| - blk_start_plug
| - sync_node_pages
|
| Thread B
| - f2fs_do_sync_file
| - fsync_node_pages
| - f2fs_wait_on_page_writeback
|
| Thread A waits for the global F2FS_DIRTY_NODES count to drop to zero.
| It starts a plug list, and some requests have been added to this list.
| Thread B locks one dirty node page and waits for this page to be
| written back. But this page is already in the plug list of thread A
| with the PG_writeback flag set. Thread A keeps running and its plug
| list has no chance to finish, so it looks like a deadlock between the
| cp and fsync paths.
|
| This patch adds a wait on page writeback before setting the node page
| dirty to avoid this problem.
|
| Signed-off-by: Yunlei He <heyunlei@huawei.com>
| Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
| Reviewed-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use BIO_MAX_PAGES for bio allocation (Jaegeuk Kim, 2017-09-25)
| commit 664ba972df9b96942191db3068274cc1db899774 upstream.
|
| We don't need to allocate bio partially in order to maximize sequential
| writes.
|
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: declare static function for __build_free_nids (Jaegeuk Kim, 2017-09-25)
| commit 3e7b5bbbef7f5eb8a19aa61b611c704bf8230937 upstream.
|
| This patch avoids a build warning.
|
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't interrupt free nids building during nid allocation (Chao Yu, 2017-09-25)
| commit 3a2ad5672bb36ee9c07bab97dadc8b0f70d391f4 upstream.
|
| Let build_free_nids support sync/async methods. In the nid allocation
| flow we use the synchronous method, so that we can avoid looping in
| alloc_nid when free memory is low. In unblock_operations and
| f2fs_balance_fs_bg we use the asynchronous method, where a low-memory
| condition can interrupt us.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: clean up free nid list operations (Jaegeuk Kim, 2017-09-25)
| commit eb0aa4b80784b8551bd5be577024e067bc83ef94 upstream.
|
| This patch cleans up to use consistent free nid list ops.
|
| Reviewed-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: split free nid list (Chao Yu, 2017-09-25)
| commit b8559dc242d1d47dcf99660a4d6afded727e0cc0 upstream.
|
| During free nid allocation, in order to do preallocation, we tag a
| free nid entry as allocated but still leave it in the free nid list.
| Other allocators that want to grab free nids then need to traverse the
| free nid list to look one up. This becomes overhead when many threads
| allocate free nids intensively.
|
| This patch splits the free nid list into two lists,
| {free,alloc}_nid_list, to keep free nids and preallocated free nids
| separately. After that, the traversal latency is gone. It also splits
| nid_cnt for separate statistics.
|
| Additionally, introduce __insert_nid_to_list and
| __remove_nid_from_list for cleanup.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| [Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches]
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
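The two-list layout can be sketched with plain singly linked lists. This is a simplified userspace illustration under assumed names, not the kernel implementation: `struct nid_lists`, `insert_nid` and `prealloc_nid` are hypothetical, and the kernel uses its own list primitives and locking.

```c
#include <assert.h>
#include <stddef.h>

/* One list per state, so allocators never walk past preallocated nids. */
enum nid_list { FREE_NID_LIST, ALLOC_NID_LIST, MAX_NID_LIST };

struct free_nid {
    unsigned int nid;
    struct free_nid *next;
};

/* Hypothetical container: list heads plus a separate count per list. */
struct nid_lists {
    struct free_nid *head[MAX_NID_LIST];
    unsigned int nid_cnt[MAX_NID_LIST];
};

/* Push an entry onto the chosen list and update that list's counter. */
static void insert_nid(struct nid_lists *l, struct free_nid *e,
                       enum nid_list which)
{
    assert(which < MAX_NID_LIST);
    e->next = l->head[which];
    l->head[which] = e;
    l->nid_cnt[which]++;
}

/*
 * Preallocate: take the head of the free list and move it to the alloc
 * list. This is O(1); no traversal over already-allocated entries.
 */
static struct free_nid *prealloc_nid(struct nid_lists *l)
{
    struct free_nid *e = l->head[FREE_NID_LIST];

    if (e == NULL)
        return NULL;
    l->head[FREE_NID_LIST] = e->next;
    l->nid_cnt[FREE_NID_LIST]--;
    insert_nid(l, e, ALLOC_NID_LIST);
    return e;
}
```

With a single list, the same operation would have to skip every entry already tagged as allocated, which is the traversal cost the commit removes.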
* f2fs: fix sparse warnings (Eric Biggers, 2017-09-25)
| commit 0c0b471e43e7acf0747c6eb410863bf78c14750d upstream.
|
| f2fs contained a number of endianness conversion bugs.
|
| Also, one function should have been 'static'.
|
| Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/'
|
| Signed-off-by: Eric Biggers <ebiggers@google.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
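To illustrate the class of bug sparse flags here: an on-disk field stored in little-endian order has to be converted before it is used as a host integer. The sketch below is a userspace analogue with a hypothetical helper name, `le32_to_host`; the kernel uses its own typed conversion helpers, which is what lets sparse check them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a 32-bit little-endian on-disk field byte by byte, so the
 * result is the same on little-endian and big-endian hosts.
 */
static uint32_t le32_to_host(const unsigned char raw[4])
{
    assert(raw != NULL);
    return (uint32_t)raw[0] |
           ((uint32_t)raw[1] << 8) |
           ((uint32_t)raw[2] << 16) |
           ((uint32_t)raw[3] << 24);
}
```

Reading the same four bytes through a plain integer cast would give a different value on a big-endian host, which is why a missing conversion is a real bug and not just a warning.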
* f2fs: fix error handling in fsync_node_pages (Chao Yu, 2017-09-25)
| commit 9de69279750e9740bc7221c7051a40c0516a58fb upstream.
|
| In fsync_node_pages, if f2fs was tagged with CP_ERROR_FLAG, make sure
| the bio cache is flushed before returning.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: give a chance to detach from dirty list (Chao Yu, 2017-09-25)
| commit 933439c8f3474e329709b715b43b0b8168bbecf8 upstream.
|
| If there are no dirty pages in the inode, we should give it a chance
| to detach from the global dirty list; otherwise another unnecessary
| .writepages call is needed to detach it.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: exclude free nids building and allocation (Chao Yu, 2017-09-25)
| commit 2411cf5befa5804e4ced4c45a3212d7653869286 upstream.
|
| During nid allocation, the building and allocating flows of free nids
| must exclude each other. This is because building the free nid cache
| takes two steps:
| a) load free nids from unused nat entries in NAT pages,
| b) update the free nid cache by checking the nat journal.
|
| The two steps should be atomic; otherwise a used nid can be allocated
| as a free one after a) and before b).
|
| This patch adds the missing lock, which covers build_free_nids in
| unlock_operation and f2fs_balance_fs_bg, to avoid that.
|
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs) (Jaegeuk Kim, 2017-09-25)
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: export ra_nid_pages to sysfs (Chao Yu, 2015-10-12)
| After finishing building the free nid cache, we try to asynchronously
| read ahead 4 more pages for the next reload; the count of readahead
| nid pages is fixed.
|
| In some cases, such as SMR drives, reading a small fixed number of
| sectors each time we trigger RA may be inefficient, since we face high
| seek overhead. So we'd better let the user configure this parameter
| through sysfs for specific workloads.
|
| Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: readahead for free nids building (Chao Yu, 2015-10-12)
| When there is no free nid in the nid cache, all new node allocators
| stop their job to wait for free nids to be reloaded. However,
| reloading is synchronous, as we read 4 NAT pages to build the nid
| cache, and this causes long latency.
|
| This patch tries to read ahead more NAT pages with the READA request
| flag after reloading free nids. It helps to improve performance when
| users allocate node ids intensively.
|
| Env: Sandisk 32G sd card
| time for i in `seq 1 60000`; { echo -n > /mnt/f2fs/$i; echo XXXXXX > /mnt/f2fs/$i;}
|
| Before:
| real    0m2.814s
| user    0m1.220s
| sys     0m1.536s
|
| After:
| real    0m2.711s
| user    0m1.136s
| sys     0m1.568s
|
| Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support lower priority asynchronous readahead in ra_meta_pages (Chao Yu, 2015-10-12)
| Now, we use ra_meta_pages to read as many contiguous physical blocks
| as possible to improve the performance of following reads. However,
| ra_meta_pages uses a synchronous readahead approach by submitting bios
| with READ. As READ has high priority, it cannot be used for preloading
| blocks when it is not certain when the readahead pages will be used.
|
| This patch supports asynchronous readahead in ra_meta_pages by tagging
| bios with the READA flag in order to allow preloading.
|
| Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't tag REQ_META for temporary non-meta pages (Chao Yu, 2015-10-12)
| In the recovery or checkpoint flow, we temporarily grab pages in the
| meta inode's mapping for caching temporary data. The data in these
| pages is not f2fs metadata, but we still tag it with the REQ_META
| flog.
|
| However, a lower device such as eMMC may apply some optimization to
| data of this type. So, in order to avoid wrong optimization, we'd
| better remove this flag for temporary non-meta pages.
|
| Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Revert "f2fs: do not skip dentry block writes" (Jaegeuk Kim, 2015-10-12)
| The periodic checkpoint can resolve the previous issue.
| So, now we can use this again to improve the reported performance
| regression:
|
| https://lkml.org/lkml/2015/10/8/20
|
| This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.
* f2fs: do not skip dentry block writes (Jaegeuk Kim, 2015-10-09)
| Previously, we skipped dentry block writes when wbc is SYNC_NONE, there
| is no memory pressure, and the number of dirty pages is pretty small.
|
| But we didn't skip normal data writes, so the skip does not have much
| impact on overall performance. Moreover, by skipping some data writes,
| kworker falls into an infinite loop trying to write blocks when many
| dir inodes have only one dentry block.
|
| So, this patch removes skipping data writes.
|
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: cover number of dirty node pages under node_write lock (Jaegeuk Kim, 2015-10-09)
| This number is referenced by checkpoint under node_write lock.
|
| Reviewed-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to release inode correctly (Chao Yu, 2015-08-24)
| In the following call stack, if we unfortunately lose every chance to
| truncate the inode page in remove_inode_page, we eventually add the
| previously allocated nid to the free nid cache. This nid has NID_NEW
| status and NEW_ADDR in its blkaddr pointer:
|
| - f2fs_create
|  - f2fs_add_link
|   - __f2fs_add_link
|    - init_inode_metadata
|     - new_inode_page
|      - new_node_page
|       - set_node_addr(, NEW_ADDR)
|     - f2fs_init_acl   failed
|     - remove_inode_page  failed
|  - handle_failed_inode
|   - remove_inode_page  failed
|   - iput
|    - f2fs_evict_inode
|     - remove_inode_page  failed
|     - alloc_nid_failed   cache a nid with valid blkaddr: NEW_ADDR
|
| This may not only leak the resources of the previous inode, but may
| also cause incorrect use of the previous blkaddr, which is located in
| the NO.nid node entry, when this nid is reused by others.
|
| This patch tries to add this inode to the orphan list if we fail to
| truncate the inode, so that we get a second chance to release it in
| the orphan recovery flow.
|
| Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix wrong pointer access during try_to_free_nids (Jaegeuk Kim, 2015-08-24)
| If we release the lock in list_for_each_entry_safe, we can lose the
| tmp pointer by alloc_nid.
|
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use __GFP_NOFAIL to avoid infinite loop (Jaegeuk Kim, 2015-08-24)
| __GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and
| bio_alloc.
| And, it also fixes the use cases of GFP_ATOMIC correctly.
|
| Suggested-by: Chao Yu <chao2.yu@samsung.com>
| Reviewed-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: check the node block address of newly allocated nid (Jaegeuk Kim, 2015-08-20)
| This patch adds a routine which checks the block address of a newly
| allocated nid.
| If a nid has already been allocated by another thread due to subtle
| data races, it will result in filesystem corruption.
| So, as a last chance, it needs to check whether its block address was
| already allocated or not prior to nid allocation.
|
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: reuse nids more aggressively (Jaegeuk Kim, 2015-08-20)
| If we reuse as many nids as possible, we can mitigate producing
| obsolete node pages in the page cache.
|
| Reviewed-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: shrink free_nids entries (Chao Yu, 2015-08-20)
| This patch introduces __count_free_nids/try_to_free_nids and registers
| them in slab shrinker for shrinking under memory pressure.
|
| Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix to build free nids from readaheaded nat pages (Chao Yu, 2015-08-05)
| When there are not enough free nids in the free nid cache, we try to
| read ahead FREE_NID_PAGES (4) nat pages into the page cache of
| meta_inode, then read the nat entries in those nat pages to add free
| nids to the free nid cache.
|
| But when traversing, in a circular manner, all the nat pages we read
| ahead, our exit condition is not set right: one more nat page is
| scanned without readahead, resulting in worse read performance.
|
| This patch fixes this to read the correct number of nat pages to avoid
| bad performance.
|
| Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
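The exit condition at issue can be illustrated with a small circular scan that visits exactly as many pages as were read ahead. This is a userspace sketch, not the kernel loop: `scan_nat_pages`, `total_pages` and the `visited` array are hypothetical, and only the value 4 for `FREE_NID_PAGES` comes from the commit message.

```c
#include <assert.h>

#define FREE_NID_PAGES 4 /* number of nat pages read ahead per round */

/*
 * Visit exactly FREE_NID_PAGES nat pages starting at `start`, wrapping
 * around at `total_pages`. The pages visited are recorded in `visited`
 * and the number of pages scanned is returned.
 */
static unsigned int scan_nat_pages(unsigned int start,
                                   unsigned int total_pages,
                                   unsigned int visited[FREE_NID_PAGES])
{
    unsigned int scanned = 0;
    unsigned int page = start;

    assert(total_pages > 0 && start < total_pages);
    /* Count pages scanned, so the loop cannot run one page too far. */
    while (scanned < FREE_NID_PAGES) {
        visited[scanned++] = page;
        page = (page + 1) % total_pages; /* circular traversal */
    }
    return scanned;
}
```

An off-by-one in the loop bound (for example `<=` instead of `<`) would touch a fifth page that was never read ahead, which is the kind of extra synchronous read the commit describes.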
* f2fs: callers take care of the page from bio error (Jaegeuk Kim, 2015-08-05)
| This patch makes the caller handle the page after its bio gets an
| error.
|
| Reviewed-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: shrink nat_cache entries (Jaegeuk Kim, 2015-08-04)
| This patch registers shrinking nat_cache entries.
|
| Reviewed-by: Chao Yu <chao2.yu@samsung.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block (Linus Torvalds, 2015-06-25)
|\
| | Pull cgroup writeback support from Jens Axboe:
| |  "This is the big pull request for adding cgroup writeback support.
| |
| |   This code has been in development for a long time, and it has been
| |   simmering in for-next for a good chunk of this cycle too. This is
| |   one of those problems that has been talked about for at least half
| |   a decade, finally there's a solution and code to go with it.
| |
| |   Also see last week's writeup on LWN:
| |
| |         http://lwn.net/Articles/648292/"
| |
| | * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
| |   writeback, blkio: add documentation for cgroup writeback support
| |   vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
| |   writeback: do foreign inode detection iff cgroup writeback is enabled
| |   v9fs: fix error handling in v9fs_session_init()
| |   bdi: fix wrong error return value in cgwb_create()
| |   buffer: remove unusued 'ret' variable
| |   writeback: disassociate inodes from dying bdi_writebacks
| |   writeback: implement foreign cgroup inode bdi_writeback switching
| |   writeback: add lockdep annotation to inode_to_wb()
| |   writeback: use unlocked_inode_to_wb transaction in inode_congested()
| |   writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
| |   writeback: implement [locked_]inode_to_wb_and_lock_list()
| |   writeback: implement foreign cgroup inode detection
| |   writeback: make writeback_control track the inode being written back
| |   writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
| |   mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
| |   writeback: implement memcg writeback domain based throttling
| |   writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
| |   writeback: implement memcg wb_domain
| |   writeback: update wb_over_bg_thresh() to use wb_domain aware operations
| |   ...
| * writeback: move bandwidth related fields from backing_dev_info into bdi_writeback (Tejun Heo, 2015-06-02)
| | Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
| | and the role of the separation is unclear. For cgroup support for
| | writeback IOs, a bdi will be updated to host multiple wb's where each
| | wb serves writeback IOs of a different cgroup on the bdi. To achieve
| | that, a wb should carry all states necessary for servicing writeback
| | IOs for a cgroup independently.
| |
| | This patch moves bandwidth related fields from backing_dev_info into
| | bdi_writeback.
| |
| | * The moved fields are: bw_time_stamp, dirtied_stamp, written_stamp,
| |   write_bandwidth, avg_write_bandwidth, dirty_ratelimit,
| |   balanced_dirty_ratelimit, completions and dirty_exceeded.
| |
| | * writeback_chunk_size() and over_bground_thresh() now take @wb
| |   instead of @bdi.
| |
| | * bdi_writeout_fraction(bdi, ...)     -> wb_writeout_fraction(wb, ...)
| |   bdi_dirty_limit(bdi, ...)           -> wb_dirty_limit(wb, ...)
| |   bdi_position_ration(bdi, ...)       -> wb_position_ratio(wb, ...)
| |   bdi_update_writebandwidth(bdi, ...) -> wb_update_write_bandwidth(wb, ...)
| |   [__]bdi_update_bandwidth(bdi, ...)  -> [__]wb_update_bandwidth(wb, ...)
| |   bdi_{max|min}_pause(bdi, ...)       -> wb_{max|min}_pause(wb, ...)
| |   bdi_dirty_limits(bdi, ...)          -> wb_dirty_limits(wb, ...)
| |
| | * Init/exits of the relocated fields are moved to bdi_wb_init/exit()
| |   respectively. Note that explicit zeroing is dropped in the process
| |   as wb's are cleared in entirety anyway.
| |
| | * As there's still only one bdi_writeback per backing_dev_info, all
| |   uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
| |   introducing no behavior changes.
| |
| | v2: Typo in description fixed as suggested by Jan.
| |
| | Signed-off-by: Tejun Heo <tj@kernel.org>
| | Reviewed-by: Jan Kara <jack@suse.cz>
| | Cc: Jens Axboe <axboe@kernel.dk>
| | Cc: Wu Fengguang <fengguang.wu@intel.com>
| | Cc: Jaegeuk Kim <jaegeuk@kernel.org>
| | Cc: Steven Whitehouse <swhiteho@redhat.com>
| | Signed-off-by: Jens Axboe <axboe@fb.com>
* | f2fs crypto: add encryption support in read/write paths (Jaegeuk Kim, 2015-05-28)
| | This patch adds encryption support in read and write paths.
| |
| | Note that, in f2fs, we need to consider the cleaning operation.
| | In the cleaning procedure, we must avoid encrypting and decrypting
| | written blocks.
| | So, this patch implements move_encrypted_block().
| |
| | Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: do not re-lookup nat cache with same nid (Chao Yu, 2015-05-28)
| | In set_node_addr, we try to look up the cached nat entry of the inode
| | and then set a flag in it. But earlier in this function we have
| | already grabbed the nat entry for the current node id. If that node
| | id is the same as the inode's, we do not need to look it up in the
| | cache again.
| |
| | So this patch adds a condition check to reduce unneeded lookups.
| |
| | Signed-off-by: Chao Yu <chao2.yu@samsung.com>
| | Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
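The condition added by this patch amounts to reusing the entry already in hand when its nid equals the inode number. A simplified userspace sketch under assumed names: `struct nat_entry` here is a toy structure, and `lookup_nat_cache`, `inode_nat_entry` and the `cache_lookups` counter are hypothetical helpers for the illustration, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy nat entry: just an id and a flag, for illustration only. */
struct nat_entry {
    unsigned int nid;
    bool flag;
};

/* Counts cache lookups so the saving is observable. */
static unsigned int cache_lookups;

/* Linear search standing in for the real cache lookup. */
static struct nat_entry *lookup_nat_cache(struct nat_entry *cache, size_t n,
                                          unsigned int nid)
{
    cache_lookups++;
    for (size_t i = 0; i < n; i++)
        if (cache[i].nid == nid)
            return &cache[i];
    return NULL;
}

/*
 * Return the inode's nat entry. If the entry `e` already in hand belongs
 * to the inode (same nid), reuse it and skip the second lookup.
 */
static struct nat_entry *inode_nat_entry(struct nat_entry *cache, size_t n,
                                         struct nat_entry *e,
                                         unsigned int ino)
{
    assert(e != NULL);
    if (e->nid == ino)
        return e;
    return lookup_nat_cache(cache, n, ino);
}
```

The saving only applies when the node being updated is the inode itself; for other nodes the lookup still happens as before.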
* | f2fs: add need_dentry_mark  Jaegeuk Kim  2015-05-28
This patch introduces need_dentry_mark() to clean up and avoid redundant node locks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: add sbi and page pointer in f2fs_io_info  Jaegeuk Kim  2015-05-28
This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure. With this change, we can reduce a lot of parameters for IO functions.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | f2fs: make has_fsynced_inode static  Chao Yu  2015-05-07
|/
has_fsynced_inode() has no other caller outside node.c, so make it static.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix unlocked nat set cache operation  Wanpeng Li  2015-04-10
nm_i->nat_tree_lock is used to synchronize operations on both the nat entry cache tree and the nat set cache tree. However, it isn't held when flushing nat entries during checkpoint, which leads to a potential race. This patch fixes it by holding the lock during the gang lookup of the nat set cache and when deleting an item from the nat set cache.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: report -ENOENT for unreached data indices  Jaegeuk Kim  2015-04-10
If an inode has inline_data, it should report -ENOENT when accessing an out-of-bound region. This is used by f2fs_fiemap, which treats -ENOENT as no error.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
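The error convention can be illustrated with a short sketch. Both functions and the `inline_blocks` bound are hypothetical and are not the f2fs implementation. They only show a lookup that reports -ENOENT past the inline area and a fiemap-style caller that treats that value as "nothing mapped here" rather than as a failure.

```c
#include <errno.h>

/* Hypothetical lookup: an index past the inline area has no block. */
static int sketch_lookup_block(int has_inline_data, unsigned long index,
			       unsigned long inline_blocks)
{
	if (has_inline_data && index >= inline_blocks)
		return -ENOENT;
	return 0;
}

/* Hypothetical fiemap-style step: -ENOENT is not propagated as an error. */
static int sketch_fiemap_step(int has_inline_data, unsigned long index,
			      unsigned long inline_blocks)
{
	int err = sketch_lookup_block(has_inline_data, index, inline_blocks);

	if (err == -ENOENT)
		return 0;	/* nothing mapped at this index */
	return err;
}
```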
* f2fs: clear page's up-to-date if block was deallocated  Jaegeuk Kim  2015-04-10
If a page's on-disk block was deallocated, let's remove the up-to-date flag to avoid further access with wrong contents.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add core functions for rb-tree extent cache  Chao Yu  2015-03-03
This patch adds core functions, including the slab cache init function and the init/lookup/update/shrink/destroy functions for the rb-tree based extent cache.

Thanks to Jaegeuk Kim and Changman Lee, who gave many suggestions about the detailed design and implementation of the extent cache.

Todo:
 * register rb-based extent cache shrink with mm shrink interface.

v2:
 o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h.
 o introduce __{attach,detach}_extent_node for code readability.
 o add cond_resched() when fail to invoke kmem_cache_alloc/radix_tree_insert.
 o fix some coding style and typo issues.

v3:
 o fix oops due to using an unassigned pointer.
 o use list_del to remove extent node in shrink list.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: add static for some functions and declare in f2fs.h]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix accessing wrong indexed data blocks  Jaegeuk Kim  2015-02-11
This patch fixes the following test.

This causes:
attempt to access beyond end of device
sdb2: rw=16384, want=14413962000, limit=16777216

The reason is:
- f2fs_write_begin
- f2fs_convert_inline_inode returns -ENOSPC
- f2fs_write_failed
- truncate_blocks
- truncate_partial_data_page
- find_data_page
- get_dnode_of_data returns wrong data index retrieved from inline_data
- f2fs_submit_page_bio(wrong data index)
- submit_bio(wrong data index)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: check node page contents all the time  Jaegeuk Kim  2015-02-11
In get_node_page, if the page is up-to-date, we assumed that the page was not reclaimed at all. But sometimes it was reported that its contents were missing. So, just to be sure, let's check its mapping and contents.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: merge {invalidate,release}page for meta/node/data pages  Chao Yu  2015-02-11
This patch merges the ->{invalidate,release}page functions for meta/node/data pages. After this, the duplicated code can be removed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: keep PagePrivate during releasepage  Jaegeuk Kim  2015-02-11
If PagePrivate is removed by releasepage, f2fs loses count of dirty pages. E.g., try_to_release_page will not release a page when the page is dirty, but our releasepage removes PagePrivate.

[<ffffffff81188d75>] try_to_release_page+0x35/0x50
[<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0
[<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs]
[<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs]
[<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs]
[<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170
[<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350
[<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0
[<ffffffff81203081>] new_sync_write+0x81/0xb0
[<ffffffff81203837>] vfs_write+0xb7/0x1f0
[<ffffffff81204459>] SyS_write+0x49/0xb0
[<ffffffff817c286d>] system_call_fastpath+0x16/0x1b

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: merge flags in struct f2fs_sb_info  Chao Yu  2015-02-11
Currently, there are several Boolean-like variables, as below:

    struct f2fs_sb_info {
        ...
        int s_dirty;
        bool need_fsck;
        bool s_closing;
        ...
        bool por_doing;
        ...
    }

This has some issues:
1. Some space in f2fs_sb_info is wasted because the compiler adds alignment padding after the Boolean variables.
2. If we keep adding new flags to f2fs_sb_info, the structure will become messy.

So in this patch, we try to:
1. switch s_dirty to a Boolean variable, since it has only two states, 0 and 1.
2. merge the s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate the different states of sbi.
4. use the newly introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate on flags for sbi.

After that, the above issues will be fixed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
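The flag merge described above can be sketched as a bit field indexed by an enum. The helper names is_sbi_flag_set, set_sbi_flag and clear_sbi_flag come from the commit message. The enum constants, the `sketch_sb_info` struct, and the exact bit layout are assumptions made for this illustration, not the kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical flag names, one bit position each. */
enum sketch_sbi_flag {
	SKETCH_SBI_IS_DIRTY,
	SKETCH_SBI_IS_CLOSE,
	SKETCH_SBI_NEED_FSCK,
	SKETCH_SBI_POR_DOING,
};

/* Simplified stand-in: one word replaces four separate variables. */
struct sketch_sb_info {
	unsigned int s_flag;
};

static inline bool is_sbi_flag_set(const struct sketch_sb_info *sbi,
				   unsigned int type)
{
	return (sbi->s_flag & (1U << type)) != 0;
}

static inline void set_sbi_flag(struct sketch_sb_info *sbi, unsigned int type)
{
	sbi->s_flag |= (1U << type);
}

static inline void clear_sbi_flag(struct sketch_sb_info *sbi,
				  unsigned int type)
{
	sbi->s_flag &= ~(1U << type);
}
```

With this layout, adding a new state means adding one enum constant rather than another struct member, which addresses both issues listed in the message.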