Merge android-4.4.142 (8ec9fd8) into msm-4.4

* refs/heads/tmp-8ec9fd8
  ANDROID: sdcardfs: Check stacked filesystem depth
  Fix backport of "tcp: detect malicious patterns in tcp_collapse_ofo_queue()"
  tcp: detect malicious patterns in tcp_collapse_ofo_queue()
  tcp: avoid collapses in tcp_prune_queue() if possible
  x86_64_cuttlefish_defconfig: Enable android-verity
  x86_64_cuttlefish_defconfig: enable verity cert
  Linux 4.4.142
  perf tools: Move syscall number fallbacks from perf-sys.h to tools/arch/x86/include/asm/
  x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
  Kbuild: fix # escaping in .cmd files for future Make
  ANDROID: Fix massive cpufreq_times memory leaks
  ANDROID: Reduce use of #ifdef CONFIG_CPU_FREQ_TIMES
  UPSTREAM: binder: replace "%p" with "%pK"
  UPSTREAM: binder: free memory on error
  UPSTREAM: binder: fix proc->files use-after-free
  UPSTREAM: Revert "FROMLIST: binder: fix proc->files use-after-free"
  UPSTREAM: ANDROID: binder: change down_write to down_read
  UPSTREAM: ANDROID: binder: correct the cmd print for BINDER_WORK_RETURN_ERROR
  UPSTREAM: ANDROID: binder: remove 32-bit binder interface.
  UPSTREAM: ANDROID: binder: re-order some conditions
  UPSTREAM: android: binder: use VM_ALLOC to get vm area
  UPSTREAM: android: binder: Use true and false for boolean values
  UPSTREAM: android: binder: Use octal permissions
  UPSTREAM: android: binder: Prefer __func__ to using hardcoded function name
  UPSTREAM: ANDROID: binder: make binder_alloc_new_buf_locked static and indent its arguments
  UPSTREAM: android: binder: Check for errors in binder_alloc_shrinker_init().
  treewide: Use array_size in f2fs_kvzalloc()
  treewide: Use array_size() in f2fs_kzalloc()
  treewide: Use array_size() in f2fs_kmalloc()
  overflow.h: Add allocation size calculation helpers
  f2fs: fix to clear FI_VOLATILE_FILE correctly
  f2fs: let sync node IO interrupt async one
  f2fs: don't change wbc->sync_mode
  f2fs: fix to update mtime correctly
  fs: f2fs: insert space around that ':' and ', '
  fs: f2fs: add missing blank lines after declarations
  fs: f2fs: changed variable type of offset "unsigned" to "loff_t"
  f2fs: clean up symbol namespace
  f2fs: make set_de_type() static
  f2fs: make __f2fs_write_data_pages() static
  f2fs: fix to avoid accessing cross the boundary
  f2fs: fix to let caller retry allocating block address
  disable loading f2fs module on PAGE_SIZE > 4KB
  f2fs: fix error path of move_data_page
  f2fs: don't drop dentry pages after fs shutdown
  f2fs: fix to avoid race during access gc_thread pointer
  f2fs: clean up with clear_radix_tree_dirty_tag
  f2fs: fix to don't trigger writeback during recovery
  f2fs: clear discard_wake earlier
  f2fs: let discard thread wait a little longer if dev is busy
  f2fs: avoid stucking GC due to atomic write
  f2fs: introduce sbi->gc_mode to determine the policy
  f2fs: keep migration IO order in LFS mode
  f2fs: fix to wait page writeback during revoking atomic write
  f2fs: Fix deadlock in shutdown ioctl
  f2fs: detect synchronous writeback more earlier
  mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
  ceph: use pagevec_lookup_range_nr_tag()
  mm: add variant of pagevec_lookup_range_tag() taking number of pages
  mm: use pagevec_lookup_range_tag() in write_cache_pages()
  mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range()
  nilfs2: use pagevec_lookup_range_tag()
  gfs2: use pagevec_lookup_range_tag()
  f2fs: use find_get_pages_tag() for looking up single page
  f2fs: simplify page iteration loops
  f2fs: use pagevec_lookup_range_tag()
  ext4: use pagevec_lookup_range_tag()
  ceph: use pagevec_lookup_range_tag()
  btrfs: use pagevec_lookup_range_tag()
  mm: implement find_get_pages_range_tag()
  f2fs: clean up with is_valid_blkaddr()
  f2fs: fix to initialize min_mtime with ULLONG_MAX
  f2fs: fix to let checkpoint guarantee atomic page persistence
  f2fs: fix to initialize i_current_depth according to inode type
  Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc"
  f2fs: don't drop any page on f2fs_cp_error() case
  f2fs: fix spelling mistake: "extenstion" -> "extension"
  f2fs: enhance sanity_check_raw_super() to avoid potential overflows
  f2fs: treat volatile file's data as hot one
  f2fs: introduce release_discard_addr() for cleanup
  f2fs: fix potential overflow
  f2fs: rename dio_rwsem to i_gc_rwsem
  f2fs: move mnt_want_write_file after range check
  f2fs: fix missing clear FI_NO_PREALLOC in some error case
  f2fs: enforce fsync_mode=strict for renamed directory
  f2fs: sanity check for total valid node blocks
  f2fs: sanity check on sit entry
  f2fs: avoid bug_on on corrupted inode
  f2fs: give message and set need_fsck given broken node id
  f2fs: clean up commit_inmem_pages()
  f2fs: do not check F2FS_INLINE_DOTS in recover
  f2fs: remove duplicated dquot_initialize and fix error handling
  f2fs: stop issue discard if something wrong with f2fs
  f2fs: fix return value in f2fs_ioc_commit_atomic_write
  f2fs: allocate hot_data for atomic write more strictly
  f2fs: check if inmem_pages list is empty correctly
  f2fs: fix race in between GC and atomic open
  f2fs: change le32 to le16 of f2fs_inode->i_extra_size
  f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
  f2fs: correct return value of f2fs_trim_fs
  f2fs: fix to show missing bits in FS_IOC_GETFLAGS
  f2fs: remove unneeded F2FS_PROJINHERIT_FL
  f2fs: don't use GFP_ZERO for page caches
  f2fs: issue all big range discards in umount process
  f2fs: remove redundant block plug
  f2fs: remove unmatched zero_user_segment when convert inline dentry
  f2fs: introduce private inode status mapping
  fscrypt: log the crypto algorithm implementations
  crypto: api - Add crypto_type_has_alg helper
  crypto: skcipher - Add low-level skcipher interface
  crypto: skcipher - Add helper to retrieve driver name
  crypto: skcipher - Add default key size helper
  fscrypt: add Speck128/256 support
  fscrypt: only derive the needed portion of the key
  fscrypt: separate key lookup from key derivation
  fscrypt: use a common logging function
  fscrypt: remove internal key size constants
  fscrypt: remove unnecessary check for non-logon key type
  fscrypt: make fscrypt_operations.max_namelen an integer
  fscrypt: drop empty name check from fname_decrypt()
  fscrypt: drop max_namelen check from fname_decrypt()
  fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
  fscrypt: don't clear flags on crypto transform
  fscrypt: remove stale comment from fscrypt_d_revalidate()
  fscrypt: remove error messages for skcipher_request_alloc() failure
  fscrypt: remove unnecessary NULL check when allocating skcipher
  fscrypt: clean up after fscrypt_prepare_lookup() conversions
  fscrypt: use unbound workqueue for decryption
  f2fs: run fstrim asynchronously if runtime discard is on
  f2fs: turn down IO priority of discard from background
  f2fs: don't split checkpoint in fstrim
  f2fs: issue discard commands proactively in high fs utilization
  f2fs: add fsync_mode=nobarrier for non-atomic files
  f2fs: let fstrim issue discard commands in lower priority
  f2fs: avoid fsync() failure caused by EAGAIN in writepage()
  f2fs: clear PageError on writepage - part 2
  f2fs: check cap_resource only for data blocks
  Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
  f2fs: clear PageError on writepage
  f2fs: call unlock_new_inode() before d_instantiate()
  f2fs: refactor read path to allow multiple postprocessing steps
  fscrypt: allow synchronous bio decryption
  f2fs: remain written times to update inode during fsync
  f2fs: make assignment of t->dentry_bitmap more readable
  f2fs: truncate preallocated blocks in error case
  f2fs: fix a wrong condition in f2fs_skip_inode_update
  f2fs: reserve bits for fs-verity
  f2fs: Add a segment type check in inplace write
  f2fs: no need to initialize zero value for GFP_F2FS_ZERO
  f2fs: don't track new nat entry in nat set
  f2fs: clean up with F2FS_BLK_ALIGN
  f2fs: check blkaddr more accuratly before issue a bio
  f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
  f2fs: introduce a new mount option test_dummy_encryption
  f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
  f2fs: release locks before return in f2fs_ioc_gc_range()
  f2fs: align memory boundary for bitops
  f2fs: remove unneeded set_cold_node()
  f2fs: add nowait aio support
  f2fs: wrap all options with f2fs_sb_info.mount_opt
  f2fs: Don't overwrite all types of node to keep node chain
  f2fs: introduce mount option for fsync mode
  f2fs: fix to restore old mount option in ->remount_fs
  f2fs: wrap sb_rdonly with f2fs_readonly
  f2fs: avoid selinux denial on CAP_SYS_RESOURCE
  f2fs: support hot file extension
  f2fs: fix to avoid race in between atomic write and background GC
  f2fs: do gc in greedy mode for whole range if gc_urgent mode is set
  f2fs: issue discard aggressively in the gc_urgent mode
  f2fs: set readdir_ra by default
  f2fs: add auto tuning for small devices
  f2fs: add mount option for segment allocation policy
  f2fs: don't stop GC if GC is contended
  f2fs: expose extension_list sysfs entry
  f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
  f2fs: introduce sb_lock to make encrypt pwsalt update exclusive
  f2fs: remove redundant initialization of pointer 'p'
  f2fs: flush cp pack except cp pack 2 page at first
  f2fs: clean up f2fs_sb_has_xxx functions
  f2fs: remove redundant check of page type when submit bio
  f2fs: fix to handle looped node chain during recovery
  f2fs: handle quota for orphan inodes
  f2fs: support passing down write hints to block layer with F2FS policy
  f2fs: support passing down write hints given by users to block layer
  f2fs: fix to clear CP_TRIMMED_FLAG
  f2fs: support large nat bitmap
  f2fs: fix to check extent cache in f2fs_drop_extent_tree
  f2fs: restrict inline_xattr_size configuration
  f2fs: fix heap mode to reset it back
  f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
  fscrypt: fix build with pre-4.6 gcc versions
  fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
  fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
  fscrypt: calculate NUL-padding length in one place only
  fscrypt: move fscrypt_symlink_data to fscrypt_private.h
  fscrypt: remove fscrypt_fname_usr_to_disk()
  f2fs: switch to fscrypt_get_symlink()
  f2fs: switch to fscrypt ->symlink() helper functions
  fscrypt: new helper function - fscrypt_get_symlink()
  fscrypt: new helper functions for ->symlink()
  fscrypt: trim down fscrypt.h includes
  fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
  fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
  fscrypt: move fscrypt_operations declaration to fscrypt_supp.h
  fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions
  fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h
  fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h
  fscrypt: move fscrypt_control_page() to supp/notsupp headers
  fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers
  f2fs: don't put dentry page in pagecache into highmem
  f2fs: support inode creation time
  f2fs: rebuild sit page from sit info in mem
  f2fs: stop issuing discard if fs is readonly
  f2fs: clean up duplicated assignment in init_discard_policy
  f2fs: use GFP_F2FS_ZERO for cleanup
  f2fs: allow to recover node blocks given updated checkpoint
  f2fs: recover some i_inline flags
  f2fs: correct removexattr behavior for null valued extended attribute
  f2fs: drop page cache after fs shutdown
  f2fs: stop gc/discard thread after fs shutdown
  f2fs: hanlde error case in f2fs_ioc_shutdown
  f2fs: split need_inplace_update
  f2fs: fix to update last_disk_size correctly
  f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
  f2fs: clean up error path of fill_super
  f2fs: avoid hungtask when GC encrypted block if io_bits is set
  f2fs: allow quota to use reserved blocks
  f2fs: fix to drop all inmem pages correctly
  f2fs: speed up defragment on sparse file
  f2fs: support F2FS_IOC_PRECACHE_EXTENTS
  f2fs: add an ioctl to disable GC for specific file
  f2fs: prevent newly created inode from being dirtied incorrectly
  f2fs: support FIEMAP_FLAG_XATTR
  f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
  f2fs: check node page again in write end io
  f2fs: fix to caclulate required free section correctly
  f2fs: handle newly created page when revoking inmem pages
  f2fs: add resgid and resuid to reserve root blocks
  f2fs: implement cgroup writeback support
  f2fs: remove unused pend_list_tag
  f2fs: avoid high cpu usage in discard thread
  f2fs: make local functions static
  f2fs: add reserved blocks for root user
  f2fs: check segment type in __f2fs_replace_block
  f2fs: update inode info to inode page for new file
  f2fs: show precise # of blocks that user/root can use
  f2fs: clean up unneeded declaration
  f2fs: continue to do direct IO if we only preallocate partial blocks
  f2fs: enable quota at remount from r to w
  f2fs: skip stop_checkpoint for user data writes
  f2fs: fix missing error number for xattr operation
  f2fs: recover directory operations by fsync
  f2fs: return error during fill_super
  f2fs: fix an error case of missing update inode page
  f2fs: fix potential hangtask in f2fs_trace_pid
  f2fs: no need return value in restore summary process
  f2fs: use unlikely for release case
  f2fs: don't return value in truncate_data_blocks_range
  f2fs: clean up f2fs_map_blocks
  f2fs: clean up hash codes
  f2fs: fix error handling in fill_super
  f2fs: spread f2fs_k{m,z}alloc
  f2fs: inject fault to kvmalloc
  f2fs: inject fault to kzalloc
  f2fs: remove a redundant conditional expression
  f2fs: apply write hints to select the type of segment for direct write
  f2fs: switch to fscrypt_prepare_setattr()
  f2fs: switch to fscrypt_prepare_lookup()
  f2fs: switch to fscrypt_prepare_rename()
  f2fs: switch to fscrypt_prepare_link()
  f2fs: switch to fscrypt_file_open()
  f2fs: remove repeated f2fs_bug_on
  f2fs: remove an excess variable
  f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
  f2fs: remove unused parameter
  f2fs: still write data if preallocate only partial blocks
  f2fs: introduce sysfs readdir_ra to readahead inode block in readdir
  f2fs: fix concurrent problem for updating free bitmap
  f2fs: remove unneeded memory footprint accounting
  f2fs: no need to read nat block if nat_block_bitmap is set
  f2fs: reserve nid resource for quota sysfile
  fscrypt: resolve some cherry-pick bugs
  fscrypt: move to generic async completion
  crypto: introduce crypto wait for async op
  fscrypt: lock mutex before checking for bounce page pool
  fscrypt: new helper function - fscrypt_prepare_setattr()
  fscrypt: new helper function - fscrypt_prepare_lookup()
  fscrypt: new helper function - fscrypt_prepare_rename()
  fscrypt: new helper function - fscrypt_prepare_link()
  fscrypt: new helper function - fscrypt_file_open()
  fscrypt: new helper function - fscrypt_require_key()
  fscrypt: remove unneeded empty fscrypt_operations structs
  fscrypt: remove ->is_encrypted()
  fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
  fs, fscrypt: add an S_ENCRYPTED inode flag
  fscrypt: clean up include file mess
  fscrypt: fix dereference of NULL user_key_payload
  fscrypt: make ->dummy_context() return bool
  f2fs: deny accessing encryption policy if encryption is off
  f2fs: inject fault in inc_valid_node_count
  f2fs: fix to clear FI_NO_PREALLOC
  f2fs: expose quota information in debugfs
  f2fs: separate nat entry mem alloc from nat_tree_lock
  f2fs: validate before set/clear free nat bitmap
  f2fs: avoid opened loop codes in __add_ino_entry
  f2fs: apply write hints to select the type of segments for buffered write
  f2fs: introduce scan_curseg_cache for cleanup
  f2fs: optimize the way of traversing free_nid_bitmap
  f2fs: keep scanning until enough free nids are acquired
  f2fs: trace checkpoint reason in fsync()
  f2fs: keep isize once block is reserved cross EOF
  f2fs: avoid race in between GC and block exchange
  f2fs: save a multiplication for last_nid calculation
  f2fs: fix summary info corruption
  f2fs: remove dead code in update_meta_page
  f2fs: remove unneeded semicolon
  f2fs: don't bother with inode->i_version
  f2fs: check curseg space before foreground GC
  f2fs: use rw_semaphore to protect SIT cache
  f2fs: support quota sys files
  f2fs: add quota_ino feature infra
  f2fs: optimize __update_nat_bits
  f2fs: modify for accurate fggc node io stat
  Revert "f2fs: handle dirty segments inside refresh_sit_entry"
  f2fs: add a function to move nid
  f2fs: export SSR allocation threshold
  f2fs: give correct trimmed blocks in fstrim
  f2fs: support bio allocation error injection
  f2fs: support get_page error injection
  f2fs: add missing sysfs description
  f2fs: support soft block reservation
  f2fs: handle error case when adding xattr entry
  f2fs: support flexible inline xattr size
  f2fs: show current cp state
  f2fs: add missing quota_initialize
  f2fs: show # of dirty segments via sysfs
  f2fs: stop all the operations by cp_error flag
  f2fs: remove several redundant assignments
  f2fs: avoid using timespec
  f2fs: fix to correct no_fggc_candidate
  Revert "f2fs: return wrong error number on f2fs_quota_write"
  f2fs: remove obsolete pointer for truncate_xattr_node
  f2fs: retry ENOMEM for quota_read|write
  f2fs: limit # of inmemory pages
  f2fs: update ctx->pos correctly when hitting hole in directory
  f2fs: relocate readahead codes in readdir()
  f2fs: allow readdir() to be interrupted
  f2fs: trace f2fs_readdir
  f2fs: trace f2fs_lookup
  f2fs: skip searching non-exist range in truncate_hole
  f2fs: expose some sectors to user in inline data or dentry case
  f2fs: avoid stale fi->gdirty_list pointer
  f2fs/crypto: drop crypto key at evict_inode only
  f2fs: fix to avoid race when accessing last_disk_size
  f2fs: Fix bool initialization/comparison
  f2fs: give up CP_TRIMMED_FLAG if it drops discards
  f2fs: trace f2fs_remove_discard
  f2fs: reduce cmd_lock coverage in __issue_discard_cmd
  f2fs: split discard policy
  f2fs: wrap discard policy
  f2fs: support issuing/waiting discard in range
  f2fs: fix to flush multiple device in checkpoint
  f2fs: enhance multiple device flush
  f2fs: fix to show ino management cache size correctly
  f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
  f2fs: obsolete ALLOC_NID_LIST list
  f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
  f2fs: allow readpages with NULL file pointer
  f2fs: show flush list status in sysfs
  f2fs: introduce read_xattr_block
  f2fs: introduce read_inline_xattr
  Revert "f2fs: reuse nids more aggressively"
  Revert "f2fs: node segment is prior to data segment selected victim"
  f2fs: fix potential panic during fstrim
  f2fs: hurry up to issue discard after io interruption
  f2fs: fix to show correct discard_granularity in sysfs
  f2fs: detect dirty inode in evict_inode
  f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
  f2fs: speed up gc_urgent mode with SSR
  f2fs: better to wait for fstrim completion
  f2fs: avoid race in between read xattr & write xattr
  f2fs: make get_lock_data_page to handle encrypted inode
  f2fs: use generic terms used for encrypted block management
  f2fs: introduce f2fs_encrypted_file for clean-up
  Revert "f2fs: add a new function get_ssr_cost"
  f2fs: constify super_operations
  f2fs: fix to wake up all sleeping flusher
  f2fs: avoid race in between atomic_read & atomic_inc
  f2fs: remove unneeded parameter of change_curseg
  f2fs: update i_flags correctly
  f2fs: don't check inode's checksum if it was dirtied or writebacked
  f2fs: don't need to update inode checksum for recovery
  f2fs: trigger fdatasync for non-atomic_write file
  f2fs: fix to avoid race in between aio and gc
  f2fs: wake up discard_thread iff there is a candidate
  f2fs: return error when accessing insane flie offset
  f2fs: trigger normal fsync for non-atomic_write file
  f2fs: clear FI_HOT_DATA correctly
  f2fs: fix out-of-order execution in f2fs_issue_flush
  f2fs: issue discard commands if gc_urgent is set
  f2fs: introduce discard_granularity sysfs entry
  f2fs: remove unused function overprovision_sections
  f2fs: check hot_data for roll-forward recovery
  f2fs: add tracepoint for f2fs_gc
  f2fs: retry to revoke atomic commit in -ENOMEM case
  f2fs: let fill_super handle roll-forward errors
  f2fs: merge equivalent flags F2FS_GET_BLOCK_[READ|DIO]
  f2fs: support journalled quota
  f2fs: fix potential overflow when adjusting GC cycle
  f2fs: avoid unneeded sync on quota file
  f2fs: introduce gc_urgent mode for background GC
  f2fs: use IPU for cold files
  f2fs: fix the size value in __check_sit_bitmap
  f2fs: add app/fs io stat
  f2fs: do not change the valid_block value if cur_valid_map was wrongly set or cleared
  f2fs: update cur_valid_map_mir together with cur_valid_map
  f2fs: use printk_ratelimited for f2fs_msg
  f2fs: expose features to sysfs entry
  f2fs: support inode checksum
  f2fs: return wrong error number on f2fs_quota_write
  f2fs: provide f2fs_balance_fs to __write_node_page
  f2fs: introduce f2fs_statfs_project
  f2fs: don't need to wait for node writes for atomic write
  f2fs: avoid naming confusion of sysfs init
  f2fs: support project quota
  f2fs: record quota during dot{,dot} recovery
  f2fs: enhance on-disk inode structure scalability
  f2fs: make max inline size changeable
  f2fs: add ioctl to expose current features
  f2fs: make background threads of f2fs being aware of freezing
  f2fs: don't give partially written atomic data from process crash
  f2fs: give a try to do atomic write in -ENOMEM case
  f2fs: preserve i_mode if __f2fs_set_acl() fails
  f2fs: alloc new nids for xattr block in recovery
  f2fs: spread struct f2fs_dentry_ptr for inline path
  f2fs: remove unused input parameter
  f2fs: avoid cpu lockup
  f2fs: include seq_file.h for sysfs.c
  f2fs: Don't clear SGID when inheriting ACLs
  f2fs: remove extra inode_unlock() in error path
  fscrypt: add support for AES-128-CBC
  fscrypt: inline fscrypt_free_filename()
  f2fs: make more close to v4.13-rc1
  f2fs: support plain user/group quota
  f2fs: avoid deadlock caused by lock order of page and lock_op
  f2fs: use spin_{,un}lock_irq{save,restore}
  f2fs: relax migratepage for atomic written page
  f2fs: don't count inode block in in-memory inode.i_blocks
  Revert "f2fs: fix to clean previous mount option when remount_fs"
  f2fs: do not set LOST_PINO for renamed dir
  f2fs: do not set LOST_PINO for newly created dir
  f2fs: skip ->writepages for {mete,node}_inode during recovery
  f2fs: introduce __check_sit_bitmap
  f2fs: stop gc/discard thread in prior during umount
  f2fs: introduce reserved_blocks in sysfs
  f2fs: avoid redundant f2fs_flush after remount
  f2fs: report # of free inodes more precisely
  f2fs: add ioctl to do gc with target block address
  f2fs: don't need to check encrypted inode for partial truncation
  f2fs: measure inode.i_blocks as generic filesystem
  f2fs: set CP_TRIMMED_FLAG correctly
  f2fs: require key for truncate(2) of encrypted file
  f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c
  f2fs: clean up sysfs codes
  f2fs: fix wrong error number of fill_super
  f2fs: fix to show injection rate in ->show_options
  f2fs: Fix a return value in case of error in 'f2fs_fill_super'
  f2fs: use proper variable name
  f2fs: fix to avoid panic when encountering corrupt node
  f2fs: don't track newly allocated nat entry in list
  f2fs: add f2fs_bug_on in __remove_discard_cmd
  f2fs: introduce __wait_one_discard_bio
  f2fs: dax: fix races between page faults and truncating pages
  f2fs: simplify the way of calulating next nat address
  f2fs: sanity check size of nat and sit cache
  f2fs: fix a panic caused by NULL flush_cmd_control
  f2fs: remove the unnecessary cast for PTR_ERR
  f2fs: remove false-positive bug_on
  f2fs: Do not issue small discards in LFS mode
  f2fs: don't bother checking for encryption key in ->write_iter()
  f2fs: don't bother checking for encryption key in ->mmap()
  f2fs: wait discard IO completion without cmd_lock held
  f2fs: wake up all waiters in f2fs_submit_discard_endio
  f2fs: show more info if fail to issue discard
  f2fs: introduce io_list for serialize data/node IOs
  f2fs: split wio_mutex
  f2fs: combine huge num of discard rb tree consistence checks
  f2fs: fix a bug caused by NULL extent tree
  f2fs: try to freeze in gc and discard threads
  f2fs: add a new function get_ssr_cost
  f2fs: declare load_free_nid_bitmap static
  f2fs: avoid f2fs_lock_op for IPU writes
  f2fs: split bio cache
  f2fs: use fio instead of multiple parameters
  f2fs: remove unnecessary read cases in merged IO flow
  f2fs: use f2fs_submit_page_bio for ra_meta_pages
  f2fs: make sure f2fs_gc returns consistent errno
  f2fs: load inode's flag from disk
  f2fs: sanity check checkpoint segno and blkoff
  f2fs, block_dump: give WRITE direction to submit_bio
  fscrypt: correct collision claim for digested names
  f2fs: switch to using fscrypt_match_name()
  fscrypt: introduce helper function for filename matching
  fscrypt: fix context consistency check when key(s) unavailable
  fscrypt: Move key structure and constants to uapi
  fscrypt: remove fscrypt_symlink_data_len()
  fscrypt: remove unnecessary checks for NULL operations
  fscrypt: eliminate ->prepare_context() operation
  fscrypt: remove broken support for detecting keyring key revocation
  fscrypt: avoid collisions when presenting long encrypted filenames
  f2fs: check entire encrypted bigname when finding a dentry
  f2fs: sync f2fs_lookup() with ext4_lookup()
  f2fs: fix a mount fail for wrong next_scan_nid
  f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS
  f2fs: show available_nids in f2fs/status
  f2fs: flush dirty nats periodically
  f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard
  f2fs: allow cpc->reason to indicate more than one reason
  f2fs: release cp and dnode lock before IPU
  f2fs: shrink size of struct discard_cmd
  f2fs: don't hold cmd_lock during waiting discard command
  f2fs: nullify fio->encrypted_page for each writes
  f2fs: sanity check segment count
  f2fs: introduce valid_ipu_blkaddr to clean up
  f2fs: lookup extent cache first under IPU scenario
  f2fs: reconstruct code to write a data page
  f2fs: introduce __wait_discard_cmd
  f2fs: introduce __issue_discard_cmd
  f2fs: enable small discard by default
  f2fs: delay awaking discard thread
  f2fs: seperate read nat page from nat_tree_lock
  f2fs: fix multiple f2fs_add_link() having same name for inline dentry
  f2fs: skip encrypted inode in ASYNC IPU policy
  f2fs: fix out-of free segments
  f2fs: improve definition of statistic macros
  f2fs: assign allocation hint for warm/cold data
  f2fs: fix _IOW usage
  f2fs: add ioctl to flush data from faster device to cold area
  f2fs: introduce async IPU policy
  f2fs: add undiscard blocks stat
  f2fs: unlock cp_rwsem early for IPU writes
  f2fs: introduce __check_rb_tree_consistence
  f2fs: trace __submit_discard_cmd
  f2fs: in prior to issue big discard
  f2fs: clean up discard_cmd_control structure
  f2fs: use rb-tree to track pending discard commands
  f2fs: avoid dirty node pages in check_only recovery
  f2fs: fix not to set fsync/dentry mark
  f2fs: allocate hot_data for atomic writes
  f2fs: give time to flush dirty pages for checkpoint
  f2fs: fix fs corruption due to zero inode page
  f2fs: shrink blk plug region
  f2fs: extract rb-tree operation infrastructure
  f2fs: avoid frequent checkpoint during f2fs_gc
  f2fs: clean up some macros in terms of GET_SEGNO
  f2fs: clean up get_valid_blocks with consistent parameter
  f2fs: use segment number for get_valid_blocks
  f2fs: guard macro variables with braces
  f2fs: fix comment on f2fs_flush_merged_bios() after 86531d6b
  f2fs: prevent waiter encountering incorrect discard states
  f2fs: introduce f2fs_wait_discard_bios
  f2fs: split discard_cmd_list
  Revert "f2fs: put allocate_segment after refresh_sit_entry"
  f2fs: split make_dentry_ptr() into block and inline versions
  f2fs: submit bio of in-place-update pages
  f2fs: remove the redundant variable definition
  f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE
  f2fs: write small sized IO to hot log
  f2fs: use bitmap in discard_entry
  f2fs: clean up destroy_discard_cmd_control
  f2fs: count discard command entry
  f2fs: show issued flush/discard count
  f2fs: relax node version check for victim data in gc
  f2fs: start SSR much eariler to avoid FG_GC
  f2fs: allocate node and hot data in the beginning of partition
  f2fs: fix wrong max cost initialization
  f2fs: allow write page cache when writting cp
  f2fs: don't reserve additional space in xattr block
  f2fs: clean up xattr operation
  f2fs: don't track volatile file in dirty inode list
  f2fs: show the max number of volatile operations
  f2fs: fix race condition in between free nid allocator/initializer
  f2fs: use set_page_private marcro in f2fs_trace_pid
  f2fs: fix recording invalid last_victim
  f2fs: more reasonable mem_size calculating of ino_entry
  f2fs: calculate the f2fs_stat_info into base_mem
  f2fs: avoid stat_inc_atomic_write for non-atomic file
  f2fs: sanity check of crc_offset from raw checkpoint
  f2fs: cleanup the disk level filename updating
  f2fs: cover update_free_nid_bitmap with nid_list_lock
  f2fs: fix bad prefetchw of NULL page
  f2fs: clear FI_DATA_EXIST flag in truncate_inline_inode
  f2fs: move mnt_want_write_file after arguments checking
  f2fs: check new size by inode_newsize_ok in f2fs_insert_range
  f2fs: avoid copy date to user-space if move file range fail
  f2fs: drop duplicate new_size assign in f2fs_zero_range
  f2fs: adjust the way of calculating nat block
  f2fs: add fault injection on f2fs_truncate
  f2fs: check range before defragment
  f2fs: use parameter max_items instead of PIDVEC_SIZE
  f2fs: add a punch discard command function
  f2fs: allocate a bio for discarding when actually issuing it
  f2fs: skip writeback meta pages if cp_mutex acquire failed
  f2fs: show more precise message on orphan recovery failure
  f2fs: remove dead macro PGOFS_OF_NEXT_DNODE
  f2fs: drop duplicate radix tree lookup of nat_entry_set
  f2fs: make sure trace all f2fs_issue_flush
  f2fs: don't allow volatile writes for non-regular file
  f2fs: don't allow atomic writes for not regular files
  f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer
  f2fs: build stat_info before orphan inode recovery
  f2fs: fix the fault of calculating blkstart twice
  f2fs: fix the fault of checking F2FS_LINK_MAX for rename inode
  f2fs: don't allow to get pino when filename is encrypted
  f2fs: fix wrong error injection for evict_inode
  f2fs: le32_to_cpu for ckpt->cp_pack_total_block_count
  f2fs: le16_to_cpu for xattr->e_value_size
  f2fs: don't need to invalidate wrong node page
  f2fs: fix an error return value in truncate_partial_data_page
  f2fs: combine nat_bits and free_nid_bitmap cache
  f2fs: skip scanning free nid bitmap of full NAT blocks
  f2fs: use __set{__clear}_bit_le
  f2fs: update_free_nid_bitmap() can be static
  f2fs: __update_nat_bits() can be static
  f2fs: le16_to_cpu for xattr->e_value_size
  f2fs: don't overwrite node block by SSR
  f2fs: don't need to invalidate wrong node page
  f2fs: fix an error return value in truncate_partial_data_page
  fscrypt: catch up to v4.11-rc1
  f2fs: avoid to flush nat journal entries
  f2fs: avoid to issue redundant discard commands
  f2fs: fix a plint compile warning
  f2fs: add f2fs_drop_inode tracepoint
  f2fs: Fix zoned block device support
  f2fs: remove redundant set_page_dirty()
  f2fs: fix to enlarge size of write_io_dummy mempool
  f2fs: fix memory leak of write_io_dummy mempool during umount
  f2fs: fix to update F2FS_{CP_}WB_DATA count correctly
  f2fs: use MAX_FREE_NIDS for the free nids target
  f2fs: introduce free nid bitmap
  f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint
  f2fs: update the comment of default nr_pages to skipping
  f2fs: drop the duplicate pval in f2fs_getxattr
  f2fs: Don't update the xattr data that same as the exist
  f2fs: kill __is_extent_same
  f2fs: avoid bggc->fggc when enough free segments are avaliable after cp
  f2fs: select target segment with closer temperature in SSR mode
  f2fs: show simple call stack in fault injection message
  fscrypt: catch fscrypto_get_policy in v4.10-rc6
  f2fs: use __clear_bit_le
  f2fs: no need lock_op in f2fs_write_inline_data
  f2fs: add bitmaps for empty or full NAT blocks
  f2fs: replace rw semaphore extent_tree_lock with mutex lock
  f2fs: avoid m_flags overlay when allocating more data blocks
  f2fs: remove unsafe bitmap checking
  f2fs: init local extent_info to avoid stale stack info in tp
  f2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc
  f2fs: do SSR for node segments more aggresively
  f2fs: check discard alignment only for SEQWRITE zones
  f2fs: wait for discard completion after submission
  f2fs: much larger batched trim_fs job
  f2fs: avoid very large discard command
  f2fs: find data segments across all the types
  f2fs: do SSR in higher priority
  f2fs: do SSR for data when there is enough free space
  f2fs: node segment is prior to data segment selected victim
  f2fs: put allocate_segment after refresh_sit_entry
  f2fs: add ovp valid_blocks check for bg gc victim to fg_gc
  f2fs: do not wait for writeback in write_begin
  f2fs: replace __get_victim by dirty_segments in FG_GC
  f2fs: fix multiple f2fs_add_link() calls having same name
  f2fs: show actual device info in tracepoints
  f2fs: use SSR for warm node as well
  f2fs: enable inline_xattr by default
  f2fs: introduce noinline_xattr mount option
  f2fs: avoid reading NAT page by get_node_info
  f2fs: remove build_free_nids() during checkpoint
  f2fs: change recovery policy of xattr node block
  f2fs: super: constify fscrypt_operations structure
  f2fs: show checkpoint version at mount time
  f2fs: remove preflush for nobarrier case
  f2fs: check last page index in cached bio to decide submission
  f2fs: check io submission more precisely
  f2fs: fix trim_fs assignment
  Revert "f2fs: remove batched discard in f2fs_trim_fs"
  f2fs: fix missing bio_alloc(1)
  f2fs: call internal __write_data_page directly
  f2fs: avoid out-of-order execution of atomic writes
  f2fs: move write_node_page above fsync_node_pages
  f2fs: move flush tracepoint
  f2fs: show # of APPEND and UPDATE inodes
  f2fs: fix 446 coding style warnings in f2fs.h
  f2fs: fix 3 coding style errors in f2fs.h
  f2fs: declare missing static function
  f2fs: show the fault injection mount option
  f2fs: fix null pointer dereference when issuing flush in ->fsync
  f2fs: fix to avoid overflow when left shifting page offset
  f2fs: enhance lookup xattr
  f2fs: fix a dead loop in f2fs_fiemap()
  f2fs: do not preallocate blocks which has wrong buffer
  f2fs: show # of on-going flush and discard bios
  f2fs: add a kernel thread to issue discard commands asynchronously
  f2fs: factor out discard command info into discard_cmd_control
  f2fs: remove batched discard in f2fs_trim_fs
  f2fs: reorganize stat information
  f2fs: clean up flush/discard command namings
  f2fs: check in-memory sit version bitmap
  f2fs: check in-memory nat version bitmap
  f2fs: check in-memory block bitmap
  f2fs: introduce FI_ATOMIC_COMMIT
  f2fs: clean up with list_{first, last}_entry
  f2fs: return fs_trim if there is no candidate
  f2fs: avoid needless checkpoint in f2fs_trim_fs
  f2fs: relax async discard commands more
  f2fs: drop exist_data for inline_data when truncated to 0
  f2fs: don't allow encrypted operations without keys
  f2fs: show the max number of atomic operations
  f2fs: get io size bit from mount option
  f2fs: support IO alignment for DATA and NODE writes
  f2fs: add submit_bio tracepoint
  f2fs: reassign new segment for mode=lfs
  f2fs: fix a missing discard prefree segments
  f2fs: use rb_entry_safe
  f2fs: add a case of no need to read a page in write begin
  f2fs: fix a problem of using memory after free
  f2fs: remove unneeded condition
  f2fs: don't cache nat entry if out of memory
  f2fs: remove unused values in recover_fsync_data
  f2fs: support async discard based on v4.9
  f2fs: resolve op and op_flags confilcts
  f2fs: remove wrong backported codes
  f2fs: fix a missing size change in f2fs_setattr
  fs/super.c: fix race between freeze_super() and thaw_super()
  scripts/tags.sh: catch 4.9-rc6
  f2fs: fix to access nullified flush_cmd_control pointer
  f2fs: free meta pages if sanity check for ckpt is failed
  f2fs: detect wrong layout
  f2fs: call sync_fs when f2fs is idle
  Revert "f2fs: use percpu_counter for # of dirty pages in inode"
  f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
  f2fs: do not activate auto_recovery for fallocated i_size
  f2fs: fix 32-bit build
  f2fs: set ->owner for debugfs status file's file_operations
  f2fs: fix incorrect free inode count in ->statfs
  f2fs: drop duplicate header timer.h
  f2fs: fix wrong AUTO_RECOVER condition
  f2fs: do not recover i_size if it's valid
  f2fs: fix fdatasync
  f2fs: fix to account total free nid correctly
  f2fs: fix an infinite loop when flush nodes in cp
  f2fs: don't wait writeback for datas during checkpoint
  f2fs: fix wrong written_valid_blocks counting
  f2fs: avoid BG_GC in f2fs_balance_fs
  f2fs: fix redundant block allocation
  f2fs: use err for f2fs_preallocate_blocks
  f2fs: support multiple devices
  f2fs: allow dio read for LFS mode
  f2fs: revert segment allocation for direct IO
  f2fs: return directly if block has been removed from the victim
  Revert "f2fs: do not recover from previous remained wrong dnodes"
  f2fs: remove checkpoint in f2fs_freeze
  f2fs: assign segments correctly for direct_io
  f2fs: fix wrong i_atime recovery
  f2fs: record inode updating status correctly
  f2fs: Trace reset zone events
  f2fs: Reset sequential zones on zoned block devices
  f2fs: Cache zoned block devices zone type
  f2fs: Do not allow adaptive mode for host-managed zoned block devices
  f2fs: Always enable discard for zoned blocks devices
  f2fs: Suppress discard warning message for zoned block devices
  f2fs: Check zoned block feature for host-managed zoned block devices
  f2fs: Use generic zoned block device terminology
  f2fs: Add missing break in switch-case
  f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes
  f2fs: report error of f2fs_fill_dentries
  fs/crypto: catch up 4.9-rc6
  f2fs: hide a maybe-uninitialized warning
  f2fs: remove percpu_count due to performance regression
  f2fs: make clean inodes when flushing inode page
  f2fs: keep dirty inodes selectively for checkpoint
  f2fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  f2fs: use BIO_MAX_PAGES for bio allocation
f2fs: declare static function for __build_free_nids f2fs: call f2fs_balance_fs for setattr f2fs: count dirty inodes to flush node pages during checkpoint f2fs: avoid casted negative value as shrink count f2fs: don't interrupt free nids building during nid allocation f2fs: clean up free nid list operations f2fs: split free nid list f2fs: clear nlink if fail to add_link f2fs: fix sparse warnings f2fs: fix error handling in fsync_node_pages f2fs: fix to update largest extent under lock f2fs: be aware of extent beyond EOF in fiemap f2fs: don't miss any f2fs_balance_fs cases f2fs: add missing f2fs_balance_fs in f2fs_zero_range f2fs: give a chance to detach from dirty list f2fs: fix to release discard entries during checkpoint f2fs: exclude free nids building and allocation f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack f2fs: fix overflow due to condition check order posix_acl: Clear SGID bit when setting file permissions f2fs: fix wrong sum_page pointer in f2fs_gc f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs) Change-Id: I6c7208efc63ce7b13f26f0ec1cd3c8aef410eff0 Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org> Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
author: Srinivasarao P <spathi@codeaurora.org> 2018-08-02 10:10:30 +0530
committer: Srinivasarao P <spathi@codeaurora.org> 2018-08-03 16:59:20 +0530
commit: c2e09fadec5ce348e125150e66a9a32b4af44756
tree: 652cf573762608aecdb28c230363166a03a24f39 /Documentation/filesystems
parent: 414b079b7f4ba955f4679cd87be64199b4e875c7
parent: 8ec9fd8936b20ca2d18160f8b18acb4b732c2771
1 files changed, 626 insertions, 0 deletions
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
new file mode 100644
index 000000000000..48b424de85bb
--- /dev/null
+++ b/Documentation/filesystems/fscrypt.rst
@@ -0,0 +1,626 @@
+=====================================
+Filesystem-level encryption (fscrypt)
+=====================================
+
+Introduction
+============
+
+fscrypt is a library which filesystems can hook into to support
+transparent encryption of files and directories.
+
+Note: "fscrypt" in this document refers to the kernel-level portion,
+implemented in ``fs/crypto/``, as opposed to the userspace tool
+`fscrypt <https://github.com/google/fscrypt>`_.  This document only
+covers the kernel-level portion.  For command-line examples of how to
+use encryption, see the documentation for the userspace tool `fscrypt
+<https://github.com/google/fscrypt>`_.  Also, it is recommended to use
+the fscrypt userspace tool, or other existing userspace tools such as
+`fscryptctl <https://github.com/google/fscryptctl>`_ or `Android's key
+management system
+<https://source.android.com/security/encryption/file-based>`_, over
+using the kernel's API directly.  Using existing tools reduces the
+chance of introducing your own security bugs.  (Nevertheless, for
+completeness this documentation covers the kernel's API anyway.)
+
+Unlike dm-crypt, fscrypt operates at the filesystem level rather than
+at the block device level.  This allows it to encrypt different files
+with different keys and to have unencrypted files on the same
+filesystem.  This is useful for multi-user systems where each user's
+data-at-rest needs to be cryptographically isolated from the others.
+However, except for filenames, fscrypt does not encrypt filesystem
+metadata.
+
+Unlike eCryptfs, which is a stacked filesystem, fscrypt is integrated
+directly into supported filesystems --- currently ext4, F2FS, and
+UBIFS.  This allows encrypted files to be read and written without
+caching both the decrypted and encrypted pages in the pagecache,
+thereby nearly halving the memory used and bringing it in line with
+unencrypted files.  Similarly, half as many dentries and inodes are
+needed.  eCryptfs also limits encrypted filenames to 143 bytes,
+causing application compatibility issues; fscrypt allows the full 255
+bytes (NAME_MAX).  Finally, unlike eCryptfs, the fscrypt API can be
+used by unprivileged users, with no need to mount anything.
+
+fscrypt does not support encrypting files in-place.  Instead, it
+supports marking an empty directory as encrypted.  Then, after
+userspace provides the key, all regular files, directories, and
+symbolic links created in that directory tree are transparently
+encrypted.
+
+Threat model
+============
+
+Offline attacks
+---------------
+
+Provided that userspace chooses a strong encryption key, fscrypt
+protects the confidentiality of file contents and filenames in the
+event of a single point-in-time permanent offline compromise of the
+block device content.  fscrypt does not protect the confidentiality of
+non-filename metadata, e.g. file sizes, file permissions, file
+timestamps, and extended attributes.  Also, the existence and location
+of holes (unallocated blocks which logically contain all zeroes) in
+files is not protected.
+
+fscrypt is not guaranteed to protect confidentiality or authenticity
+if an attacker is able to manipulate the filesystem offline prior to
+an authorized user later accessing the filesystem.
+
+Online attacks
+--------------
+
+fscrypt (and storage encryption in general) can only provide limited
+protection, if any at all, against online attacks.  In detail:
+
+fscrypt is only resistant to side-channel attacks, such as timing or
+electromagnetic attacks, to the extent that the underlying Linux
+Cryptographic API algorithms are.  If a vulnerable algorithm is used,
+such as a table-based implementation of AES, it may be possible for an
+attacker to mount a side-channel attack against the online system.
+Side-channel attacks may also be mounted against applications
+consuming decrypted data.
+
+After an encryption key has been provided, fscrypt is not designed to
+hide the plaintext file contents or filenames from other users on the
+same system, regardless of the visibility of the keyring key.
+Instead, existing access control mechanisms such as file mode bits,
+POSIX ACLs, LSMs, or mount namespaces should be used for this purpose.
+Also note that as long as the encryption keys are *anywhere* in
+memory, an online attacker can necessarily compromise them by mounting
+a physical attack or by exploiting any kernel security vulnerability
+which provides an arbitrary memory read primitive.
+
+While it is ostensibly possible to "evict" keys from the system,
+recently accessed encrypted files will remain accessible at least
+until the filesystem is unmounted or the VFS caches are dropped, e.g.
+using ``echo 2 > /proc/sys/vm/drop_caches``.  Even after that, if the
+RAM is compromised before being powered off, it will likely still be
+possible to recover portions of the plaintext file contents, if not
+some of the encryption keys as well.  (Since Linux v4.12, all
+in-kernel keys related to fscrypt are sanitized before being freed.
+However, userspace would need to do its part as well.)
+
+Currently, fscrypt does not prevent a user from maliciously providing
+an incorrect key for another user's existing encrypted files.  A
+protection against this is planned.
+
+Key hierarchy
+=============
+
+Master Keys
+-----------
+
+Each encrypted directory tree is protected by a *master key*.  Master
+keys can be up to 64 bytes long, and must be at least as long as the
+greater of the key length needed by the contents and filenames
+encryption modes being used.  For example, if AES-256-XTS is used for
+contents encryption, the master key must be 64 bytes (512 bits).  Note
+that the XTS mode is defined to require a key twice as long as that
+required by the underlying block cipher.
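
The sizing rule above can be sketched as a small helper: the master key must be at least as long as the larger of the two mode key sizes, and no longer than 64 bytes. The ``demo_`` names and the key-size table are illustrative assumptions rather than kernel code; the sizes follow from the modes named in this document (XTS needs twice the key length of the underlying cipher).

```c
#include <stddef.h>

#define DEMO_MAX_KEY_SIZE 64  /* master keys can be up to 64 bytes */

/* Key sizes in bytes for the modes named in this document (assumed table). */
enum {
    DEMO_AES_256_XTS_KEY_SIZE = 64,  /* XTS: twice the 32-byte AES-256 key */
    DEMO_AES_256_CTS_KEY_SIZE = 32,
    DEMO_AES_128_CBC_KEY_SIZE = 16,
    DEMO_AES_128_CTS_KEY_SIZE = 16,
};

/*
 * Returns the minimum acceptable master key size in bytes, or 0 if the
 * requested modes would need more than DEMO_MAX_KEY_SIZE bytes.
 */
static size_t demo_min_master_key_size(size_t contents_key_size,
                                       size_t filenames_key_size)
{
    size_t need = contents_key_size > filenames_key_size
                      ? contents_key_size
                      : filenames_key_size;

    return need <= DEMO_MAX_KEY_SIZE ? need : 0;
}
```

For AES-256-XTS contents with AES-256-CTS-CBC filenames, this gives 64 bytes, matching the example in the text.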
+
+To "unlock" an encrypted directory tree, userspace must provide the
+appropriate master key.  There can be any number of master keys, each
+of which protects any number of directory trees on any number of
+filesystems.
+
+Userspace should generate master keys either using a cryptographically
+secure random number generator, or by using a KDF (Key Derivation
+Function).  Note that whenever a KDF is used to "stretch" a
+lower-entropy secret such as a passphrase, it is critical that a KDF
+designed for this purpose be used, such as scrypt, PBKDF2, or Argon2.
+
+Per-file keys
+-------------
+
+Master keys are not used to encrypt file contents or names directly.
+Instead, a unique key is derived for each encrypted file, including
+each regular file, directory, and symbolic link.  This has several
+advantages:
+
+- In cryptosystems, the same key material should never be used for
+  different purposes.  Using the master key as both an XTS key for
+  contents encryption and as a CTS-CBC key for filenames encryption
+  would violate this rule.
+- Per-file keys simplify the choice of IVs (Initialization Vectors)
+  for contents encryption.  Without per-file keys, to ensure IV
+  uniqueness both the inode and logical block number would need to be
+  encoded in the IVs.  This would make it impossible to renumber
+  inodes, which e.g. ``resize2fs`` can do when resizing an ext4
+  filesystem.  With per-file keys, it is sufficient to encode just the
+  logical block number in the IVs.
+- Per-file keys strengthen the encryption of filenames, where IVs are
+  reused out of necessity.  With a unique key per directory, IV reuse
+  is limited to within a single directory.
+- Per-file keys allow individual files to be securely erased simply by
+  securely erasing their keys.  (Not yet implemented.)
+
+A KDF (Key Derivation Function) is used to derive per-file keys from
+the master key.  This is done instead of wrapping a randomly-generated
+key for each file because it reduces the size of the encryption xattr,
+which for some filesystems makes the xattr more likely to fit in-line
+in the filesystem's inode table.  With a KDF, only a 16-byte nonce is
+required --- long enough to make key reuse extremely unlikely.  A
+wrapped key, on the other hand, would need to be up to 64 bytes ---
+the length of an AES-256-XTS key.  Furthermore, currently there is no
+requirement to support unlocking a file with multiple alternative
+master keys or to support rotating master keys.  Instead, the master
+keys may be wrapped in userspace, e.g. as done by the `fscrypt
+<https://github.com/google/fscrypt>`_ tool.
+
+The current KDF encrypts the master key using the 16-byte nonce as an
+AES-128-ECB key.  The output is used as the derived key.  If the
+output is longer than needed, then it is truncated to the needed
+length.  Truncation is the norm for directories and symlinks, since
+those use the CTS-CBC encryption mode which requires a key half as
+long as that required by the XTS encryption mode.
+
+Note: this KDF meets the primary security requirement, which is to
+produce unique derived keys that preserve the entropy of the master
+key, assuming that the master key is already a good pseudorandom key.
+However, it is nonstandard and has some problems such as being
+reversible, so it is generally considered to be a mistake!  It may be
+replaced with HKDF or another more standard KDF in the future.
+
+Encryption modes and usage
+==========================
+
+fscrypt allows one encryption mode to be specified for file contents
+and one encryption mode to be specified for filenames.  Different
+directory trees are permitted to use different encryption modes.
+Currently, the following pairs of encryption modes are supported:
+
+- AES-256-XTS for contents and AES-256-CTS-CBC for filenames
+- AES-128-CBC for contents and AES-128-CTS-CBC for filenames
+- Speck128/256-XTS for contents and Speck128/256-CTS-CBC for filenames
+
+It is strongly recommended to use AES-256-XTS for contents encryption.
+AES-128-CBC was added only for low-powered embedded devices with
+crypto accelerators such as CAAM or CESA that do not support XTS.
+
+Similarly, Speck128/256 support was only added for older or low-end
+CPUs which cannot do AES fast enough --- especially ARM CPUs which have
+NEON instructions but not the Cryptography Extensions --- and for which
+it would not otherwise be feasible to use encryption at all.  It is
+not recommended to use Speck on CPUs that have AES instructions.
+Speck support is only available if it has been enabled in the crypto
+API via CONFIG_CRYPTO_SPECK.  Also, on ARM platforms, to get
+acceptable performance CONFIG_CRYPTO_SPECK_NEON must be enabled.
+
+New encryption modes can be added relatively easily, without changes
+to individual filesystems.  However, authenticated encryption (AE)
+modes are not currently supported because of the difficulty of dealing
+with ciphertext expansion.
+
+For file contents, each filesystem block is encrypted independently.
+Currently, only the case where the filesystem block size is equal to
+the system's page size (usually 4096 bytes) is supported.  With the
+XTS mode of operation (recommended), the logical block number within
+the file is used as the IV.  With the CBC mode of operation (not
+recommended), ESSIV is used; specifically, the IV for CBC is the
+logical block number encrypted with AES-256, where the AES-256 key is
+the SHA-256 hash of the inode's data encryption key.
+
+For filenames, the full filename is encrypted at once.  Because of the
+requirements to retain support for efficient directory lookups and
+filenames of up to 255 bytes, a constant initialization vector (IV) is
+used.  However, each encrypted directory uses a unique key, which
+limits IV reuse to within a single directory.  Note that IV reuse in
+the context of CTS-CBC encryption means that when the original
+filenames share a common prefix at least as long as the cipher block
+size (16 bytes for AES), the corresponding encrypted filenames will
+also share a common prefix.  This is undesirable; it may be fixed in
+the future by switching to an encryption mode that is a strong
+pseudorandom permutation on arbitrary-length messages, e.g. the HEH
+(Hash-Encrypt-Hash) mode.
+
+Since filenames are encrypted with the CTS-CBC mode of operation, the
+plaintext and ciphertext filenames need not be multiples of the AES
+block size, i.e. 16 bytes.  However, the minimum size that can be
+encrypted is 16 bytes, so shorter filenames are NUL-padded to 16 bytes
+before being encrypted.  In addition, to reduce leakage of filename
+lengths via their ciphertexts, all filenames are NUL-padded to the
+next 4, 8, 16, or 32-byte boundary (configurable).  32 is recommended
+since this provides the best confidentiality, at the cost of making
+directory entries consume slightly more space.  Note that since NUL
+(``\0``) is not otherwise a valid character in filenames, the padding
+will never produce duplicate plaintexts.
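
As a rough illustration of the padding rule above, the helper below computes how many bytes of a filename are actually encrypted: at least one cipher block, then rounded up to the configured boundary. The function and macro names are hypothetical, and any capping to the filesystem's maximum name length is deliberately left out of this sketch.

```c
#include <stddef.h>

#define DEMO_CRYPTO_BLOCK_SIZE 16  /* AES block size in bytes */

/*
 * Length of the NUL-padded filename that gets encrypted.
 * `padding` is the configured boundary: 4, 8, 16, or 32.
 */
static size_t demo_padded_name_len(size_t name_len, size_t padding)
{
    /* Names shorter than one cipher block are padded up to 16 bytes. */
    size_t len = name_len < DEMO_CRYPTO_BLOCK_SIZE ? DEMO_CRYPTO_BLOCK_SIZE
                                                   : name_len;

    /* Round up to the next multiple of `padding` (a power of two). */
    return (len + padding - 1) & ~(padding - 1);
}
```

With 32-byte padding, every name of 1 to 32 bytes yields a 32-byte ciphertext, which is why that setting leaks the least about filename lengths.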
+
+Symbolic link targets are considered a type of filename and are
+encrypted in the same way as filenames in directory entries.  Each
+symlink also uses a unique key; hence, the hardcoded IV is not a
+problem for symlinks.
+
+User API
+========
+
+Setting an encryption policy
+----------------------------
+
+The FS_IOC_SET_ENCRYPTION_POLICY ioctl sets an encryption policy on an
+empty directory or verifies that a directory or regular file already
+has the specified encryption policy.  It takes in a pointer to a
+:c:type:`struct fscrypt_policy`, defined as follows::
+
+    #define FS_KEY_DESCRIPTOR_SIZE  8
+
+    struct fscrypt_policy {
+            __u8 version;
+            __u8 contents_encryption_mode;
+            __u8 filenames_encryption_mode;
+            __u8 flags;
+            __u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
+    };
+
+This structure must be initialized as follows:
+
+- ``version`` must be 0.
+
+- ``contents_encryption_mode`` and ``filenames_encryption_mode`` must
+  be set to constants from ``<linux/fs.h>`` which identify the
+  encryption modes to use.  If unsure, use
+  FS_ENCRYPTION_MODE_AES_256_XTS (1) for ``contents_encryption_mode``
+  and FS_ENCRYPTION_MODE_AES_256_CTS (4) for
+  ``filenames_encryption_mode``.
+
+- ``flags`` must be set to a value from ``<linux/fs.h>`` which
+  identifies the amount of NUL-padding to use when encrypting
+  filenames.  If unsure, use FS_POLICY_FLAGS_PAD_32 (0x3).
+
+- ``master_key_descriptor`` specifies how to find the master key in
+  the keyring; see `Adding keys`_.  It is up to userspace to choose a
+  unique ``master_key_descriptor`` for each master key.  The e4crypt
+  and fscrypt tools use the first 8 bytes of
+  ``SHA-512(SHA-512(master_key))``, but this particular scheme is not
+  required.  Also, the master key need not be in the keyring yet when
+  FS_IOC_SET_ENCRYPTION_POLICY is executed.  However, it must be added
+  before any files can be created in the encrypted directory.
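
The initialization rules above can be sketched as follows. To keep the example self-contained, it uses a local mirror of the struct and the numeric constants quoted in this document, under hypothetical ``demo_`` names; a real program should take the definitions from ``<linux/fs.h>`` instead.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the definitions quoted above (illustrative only). */
#define DEMO_KEY_DESCRIPTOR_SIZE 8
#define DEMO_MODE_AES_256_XTS    1    /* FS_ENCRYPTION_MODE_AES_256_XTS */
#define DEMO_MODE_AES_256_CTS    4    /* FS_ENCRYPTION_MODE_AES_256_CTS */
#define DEMO_POLICY_FLAGS_PAD_32 0x3  /* FS_POLICY_FLAGS_PAD_32 */

struct demo_fscrypt_policy {
    uint8_t version;
    uint8_t contents_encryption_mode;
    uint8_t filenames_encryption_mode;
    uint8_t flags;
    uint8_t master_key_descriptor[DEMO_KEY_DESCRIPTOR_SIZE];
};

/* Fill in a policy using the "if unsure" choices recommended above. */
static void demo_init_policy(struct demo_fscrypt_policy *policy,
                             const uint8_t descriptor[DEMO_KEY_DESCRIPTOR_SIZE])
{
    memset(policy, 0, sizeof(*policy));
    policy->version = 0;  /* must be 0 */
    policy->contents_encryption_mode = DEMO_MODE_AES_256_XTS;
    policy->filenames_encryption_mode = DEMO_MODE_AES_256_CTS;
    policy->flags = DEMO_POLICY_FLAGS_PAD_32;
    memcpy(policy->master_key_descriptor, descriptor,
           DEMO_KEY_DESCRIPTOR_SIZE);
}
```

With the real ``struct fscrypt_policy``, a pointer to the filled-in structure is what gets passed to the FS_IOC_SET_ENCRYPTION_POLICY ioctl on the target directory.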
+
+If the file is not yet encrypted, then FS_IOC_SET_ENCRYPTION_POLICY
+verifies that the file is an empty directory.  If so, the specified
+encryption policy is assigned to the directory, turning it into an
+encrypted directory.  After that, and after providing the
+corresponding master key as described in `Adding keys`_, all regular
+files, directories (recursively), and symlinks created in the
+directory will be encrypted, inheriting the same encryption policy.
+The filenames in the directory's entries will be encrypted as well.
+
+Alternatively, if the file is already encrypted, then
+FS_IOC_SET_ENCRYPTION_POLICY validates that the specified encryption
+policy exactly matches the actual one.  If they match, then the ioctl
+returns 0.  Otherwise, it fails with EEXIST.  This works on both
+regular files and directories, including nonempty directories.
+
+Note that the ext4 filesystem does not allow the root directory to be
+encrypted, even if it is empty.  Users who want to encrypt an entire
+filesystem with one key should consider using dm-crypt instead.
+
+FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors:
+
+- ``EACCES``: the file is not owned by the process's uid, nor does the
+  process have the CAP_FOWNER capability in a namespace with the file
+  owner's uid mapped
+- ``EEXIST``: the file is already encrypted with an encryption policy
+  different from the one specified
+- ``EINVAL``: an invalid encryption policy was specified (invalid
+  version, mode(s), or flags)
+- ``ENOTDIR``: the file is unencrypted and is a regular file, not a
+  directory
+- ``ENOTEMPTY``: the file is unencrypted and is a nonempty directory
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+  support for this filesystem, or the filesystem superblock has not
+  had encryption enabled on it.  (For example, to use encryption on an
+  ext4 filesystem, CONFIG_EXT4_ENCRYPTION must be enabled in the
+  kernel config, and the superblock must have had the "encrypt"
+  feature flag enabled using ``tune2fs -O encrypt`` or ``mkfs.ext4 -O
+  encrypt``.)
+- ``EPERM``: this directory may not be encrypted, e.g. because it is
+  the root directory of an ext4 filesystem
+- ``EROFS``: the filesystem is readonly
+
+Getting an encryption policy
+----------------------------
+
+The FS_IOC_GET_ENCRYPTION_POLICY ioctl retrieves the :c:type:`struct
+fscrypt_policy`, if any, for a directory or regular file.  See above
+for the struct definition.  No additional permissions are required
+beyond the ability to open the file.
+
+FS_IOC_GET_ENCRYPTION_POLICY can fail with the following errors:
+
+- ``EINVAL``: the file is encrypted, but it uses an unrecognized
+  encryption context format
+- ``ENODATA``: the file is not encrypted
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+  support for this filesystem
+
+Note: if you only need to know whether a file is encrypted or not, on
+most filesystems it is also possible to use the FS_IOC_GETFLAGS ioctl
+and check for FS_ENCRYPT_FL, or to use the statx() system call and
+check for STATX_ATTR_ENCRYPTED in stx_attributes.
+
+Getting the per-filesystem salt
+-------------------------------
+
+Some filesystems, such as ext4 and F2FS, also support the deprecated
+ioctl FS_IOC_GET_ENCRYPTION_PWSALT.  This ioctl retrieves a randomly
+generated 16-byte value stored in the filesystem superblock.  This
+value is intended to be used as a salt when deriving an encryption key
+from a passphrase or other low-entropy user credential.
+
+FS_IOC_GET_ENCRYPTION_PWSALT is deprecated.  Instead, prefer to
+generate and manage any needed salt(s) in userspace.
+
+Adding keys
+-----------
+
+To provide a master key, userspace must add it to an appropriate
+keyring using the add_key() system call (see:
+``Documentation/security/keys/core.rst``).  The key type must be
+"logon"; keys of this type are kept in kernel memory and cannot be
+read back by userspace.  The key description must be "fscrypt:"
+followed by the 16-character lower case hex representation of the
+``master_key_descriptor`` that was set in the encryption policy.  The
+key payload must conform to the following structure::
+
+    #define FS_MAX_KEY_SIZE 64
+
+    struct fscrypt_key {
+            u32 mode;
+            u8 raw[FS_MAX_KEY_SIZE];
+            u32 size;
+    };
+
+``mode`` is ignored; just set it to 0.  The actual key is provided in
+``raw`` with ``size`` indicating its size in bytes.  That is, the
+bytes ``raw[0..size-1]`` (inclusive) are the actual key.
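
As a minimal sketch of the two pieces described above, the helpers below build the key description string and fill in the payload. The ``demo_`` names and the local struct mirror are assumptions made for a self-contained example; the actual add_key() call is not shown.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_KEY_DESCRIPTOR_SIZE 8
#define DEMO_MAX_KEY_SIZE        64
#define DEMO_KEY_DESC_PREFIX     "fscrypt:"
/* 8-char prefix + 16 hex digits + terminating NUL */
#define DEMO_KEY_DESC_LEN        25

/* Local mirror of the payload layout quoted above (illustrative only). */
struct demo_fscrypt_key {
    uint32_t mode;
    uint8_t raw[DEMO_MAX_KEY_SIZE];
    uint32_t size;
};

/* Write "fscrypt:" followed by the descriptor as 16 lowercase hex digits. */
static void demo_key_description(const uint8_t desc[DEMO_KEY_DESCRIPTOR_SIZE],
                                 char out[DEMO_KEY_DESC_LEN])
{
    int n = snprintf(out, DEMO_KEY_DESC_LEN, "%s", DEMO_KEY_DESC_PREFIX);

    for (int i = 0; i < DEMO_KEY_DESCRIPTOR_SIZE; i++)
        n += snprintf(out + n, (size_t)(DEMO_KEY_DESC_LEN - n), "%02x",
                      (unsigned int)desc[i]);
}

/* Fill the payload; returns 0 on success, -1 if the key is too large. */
static int demo_fill_payload(struct demo_fscrypt_key *payload,
                             const uint8_t *key, size_t key_size)
{
    if (key_size > DEMO_MAX_KEY_SIZE)
        return -1;
    memset(payload, 0, sizeof(*payload));
    payload->mode = 0;  /* ignored by the kernel; set to 0 */
    memcpy(payload->raw, key, key_size);
    payload->size = (uint32_t)key_size;
    return 0;
}
```

The resulting description and payload are what userspace would hand to add_key() with the "logon" key type and a suitable destination keyring.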
+
+The key description prefix "fscrypt:" may alternatively be replaced
+with a filesystem-specific prefix such as "ext4:".  However, the
+filesystem-specific prefixes are deprecated and should not be used in
+new programs.
+
+There are several different types of keyrings in which encryption keys
+may be placed, such as a session keyring, a user session keyring, or a
+user keyring.  Each key must be placed in a keyring that is "attached"
+to all processes that might need to access files encrypted with it, in
+the sense that request_key() will find the key.  Generally, if only
+processes belonging to a specific user need to access a given
+encrypted directory and no session keyring has been installed, then
+that directory's key should be placed in that user's user session
+keyring or user keyring.  Otherwise, a session keyring should be
+installed if needed, and the key should be linked into that session
+keyring, or in a keyring linked into that session keyring.
+
+Note: introducing the complex visibility semantics of keyrings here
+was arguably a mistake --- especially given that by design, after any
+process successfully opens an encrypted file (thereby setting up the
+per-file key), possessing the keyring key is not actually required for
+any process to read/write the file until its in-memory inode is
+evicted.  In the future there probably should be a way to provide keys
+directly to the filesystem instead, which would make the intended
+semantics clearer.
+
+Access semantics
+================
+
+With the key
+------------
+
+With the encryption key, encrypted regular files, directories, and
+symlinks behave very similarly to their unencrypted counterparts ---
+after all, the encryption is intended to be transparent.  However,
+astute users may notice some differences in behavior:
+
+- Unencrypted files, or files encrypted with a different encryption
+  policy (i.e. different key, modes, or flags), cannot be renamed or
+  linked into an encrypted directory; see `Encryption policy
+  enforcement`_.  Attempts to do so will fail with EPERM.  However,
+  encrypted files can be renamed within an encrypted directory, or
+  into an unencrypted directory.
+
+- Direct I/O is not supported on encrypted files.  Attempts to use
+  direct I/O on such files will fall back to buffered I/O.
+
+- The fallocate operations FALLOC_FL_COLLAPSE_RANGE,
+  FALLOC_FL_INSERT_RANGE, and FALLOC_FL_ZERO_RANGE are not supported
+  on encrypted files and will fail with EOPNOTSUPP.
+
+- Online defragmentation of encrypted files is not supported.  The
+  EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with
+  EOPNOTSUPP.
+
+- The ext4 filesystem does not support data journaling with encrypted
+  regular files.  It will fall back to ordered data mode instead.
+
+- DAX (Direct Access) is not supported on encrypted files.
+
+- The st_size of an encrypted symlink will not necessarily give the
+  length of the symlink target as required by POSIX.  It will actually
+  give the length of the ciphertext, which will be slightly longer
+  than the plaintext due to NUL-padding and an extra 2-byte overhead.
+
+- The maximum length of an encrypted symlink is 2 bytes shorter than
+  the maximum length of an unencrypted symlink.  For example, on an
+  ext4 filesystem with a 4K block size, unencrypted symlinks can be up
+  to 4095 bytes long, while encrypted symlinks can only be up to 4093
+  bytes long (both lengths excluding the terminating null).
+
+Note that mmap *is* supported.  This is possible because the pagecache
+for an encrypted file contains the plaintext, not the ciphertext.
+
+Without the key
+---------------
+
+Some filesystem operations may be performed on encrypted regular
+files, directories, and symlinks even before their encryption key has
+been provided:
+
+- File metadata may be read, e.g. using stat().
+
+- Directories may be listed, in which case the filenames will be
+  listed in an encoded form derived from their ciphertext.  The
+  current encoding algorithm is described in `Filename hashing and
+  encoding`_.  The algorithm is subject to change, but it is
+  guaranteed that the presented filenames will be no longer than
+  NAME_MAX bytes, will not contain the ``/`` or ``\0`` characters, and
+  will uniquely identify directory entries.
+
+  The ``.`` and ``..`` directory entries are special.  They are always
+  present and are not encrypted or encoded.
+
+- Files may be deleted.  That is, nondirectory files may be deleted
+  with unlink() as usual, and empty directories may be deleted with
+  rmdir() as usual.  Therefore, ``rm`` and ``rm -r`` will work as
+  expected.
+
+- Symlink targets may be read and followed, but they will be presented
+  in encrypted form, similar to filenames in directories.  Hence, they
+  are unlikely to point to anywhere useful.
+
+Without the key, regular files cannot be opened or truncated.
+Attempts to do so will fail with ENOKEY.  This implies that any
+regular file operations that require a file descriptor, such as
+read(), write(), mmap(), fallocate(), and ioctl(), are also forbidden.
+
+Also without the key, files of any type (including directories) cannot
+be created or linked into an encrypted directory, nor can a name in an
+encrypted directory be the source or target of a rename, nor can an
+O_TMPFILE temporary file be created in an encrypted directory.  All
+such operations will fail with ENOKEY.
+
+It is not currently possible to back up and restore encrypted files
+without the encryption key.  This would require special APIs which
+have not yet been implemented.
+
+Encryption policy enforcement
+=============================
+
+After an encryption policy has been set on a directory, all regular
+files, directories, and symbolic links created in that directory
+(recursively) will inherit that encryption policy.  Special files ---
+that is, named pipes, device nodes, and UNIX domain sockets --- will
+not be encrypted.
+
+Except for those special files, it is forbidden to have unencrypted
+files, or files encrypted with a different encryption policy, in an
+encrypted directory tree.  Attempts to link or rename such a file into
+an encrypted directory will fail with EPERM.  This is also enforced
+during ->lookup() to provide limited protection against offline
+attacks that try to disable or downgrade encryption in known locations
+where applications may later write sensitive data.  It is recommended
+that systems implementing a form of "verified boot" take advantage of
+this by validating all top-level encryption policies prior to access.
+
+Implementation details
+======================
+
+Encryption context
+------------------
+
+An encryption policy is represented on-disk by a :c:type:`struct
+fscrypt_context`.  It is up to individual filesystems to decide where
+to store it, but normally it would be stored in a hidden extended
+attribute.  It should *not* be exposed by the xattr-related system
+calls such as getxattr() and setxattr() because of the special
+semantics of the encryption xattr.  (In particular, there would be
+much confusion if an encryption policy were to be added to or removed
+from anything other than an empty directory.)  The struct is defined
+as follows::
+
+    #define FS_KEY_DESCRIPTOR_SIZE  8
+    #define FS_KEY_DERIVATION_NONCE_SIZE 16
+
+    struct fscrypt_context {
+            u8 format;
+            u8 contents_encryption_mode;
+            u8 filenames_encryption_mode;
+            u8 flags;
+            u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
+            u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE];
+    };
+
+Note that :c:type:`struct fscrypt_context` contains the same
+information as :c:type:`struct fscrypt_policy` (see `Setting an
+encryption policy`_), except that :c:type:`struct fscrypt_context`
+also contains a nonce.  The nonce is randomly generated by the kernel
+and is used to derive the inode's encryption key as described in
+`Per-file keys`_.
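As a rough userspace illustration of the layout above (not kernel code), the sketch below mirrors the struct with ``uint8_t`` in place of the kernel's ``u8``. It checks that the four one-byte fields, the 8-byte descriptor, and the 16-byte nonce add up to 28 bytes with no padding. The name ``fscrypt_context_sketch`` is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FS_KEY_DESCRIPTOR_SIZE  8
#define FS_KEY_DERIVATION_NONCE_SIZE 16

/* Hypothetical userspace mirror of the on-disk struct shown above. */
struct fscrypt_context_sketch {
        uint8_t format;
        uint8_t contents_encryption_mode;
        uint8_t filenames_encryption_mode;
        uint8_t flags;
        uint8_t master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
        uint8_t nonce[FS_KEY_DERIVATION_NONCE_SIZE];
};

/* Every member is a byte or byte array, so no padding is inserted. */
static_assert(sizeof(struct fscrypt_context_sketch) == 28,
              "4 header bytes + 8 descriptor bytes + 16 nonce bytes");
/* The nonce is the only part that a policy would not already carry. */
static_assert(offsetof(struct fscrypt_context_sketch, nonce) == 12,
              "nonce follows the policy-equivalent fields");
```

The first 12 bytes correspond to the policy information; the trailing 16 bytes are the per-inode nonce.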
+
+Data path changes
+-----------------
+
+For the read path (->readpage()) of regular files, filesystems can
+read the ciphertext into the page cache and decrypt it in-place.  The
+page lock must be held until decryption has finished, to prevent the
+page from becoming visible to userspace prematurely.
+
+For the write path (->writepage()) of regular files, filesystems
+cannot encrypt data in-place in the page cache, since the cached
+plaintext must be preserved.  Instead, filesystems must encrypt into a
+temporary buffer or "bounce page", then write out the temporary
+buffer.  Some filesystems, such as UBIFS, already use temporary
+buffers regardless of encryption.  Other filesystems, such as ext4 and
+F2FS, have to allocate bounce pages specially for encryption.
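The bounce-page idea on the write path can be sketched in plain userspace C. This is only an illustration under loud assumptions: ``toy_encrypt()`` is a placeholder XOR and not real encryption, ``write_page_via_bounce()`` is a hypothetical name, and the ``memcpy()`` stands in for the actual I/O. The point is that the cached plaintext is only read, never modified.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder "cipher": XOR with one byte. NOT real encryption. */
void toy_encrypt(const uint8_t *src, uint8_t *dst, size_t len, uint8_t key)
{
        for (size_t i = 0; i < len; i++)
                dst[i] = src[i] ^ key;
}

/*
 * Write one cached page to "disk" through a temporary bounce buffer.
 * Returns 0 on success, -1 if the bounce buffer cannot be allocated.
 */
int write_page_via_bounce(const uint8_t *cached, uint8_t *disk,
                          size_t len, uint8_t key)
{
        uint8_t *bounce;

        assert(len > 0);
        bounce = malloc(len);
        if (!bounce)
                return -1;

        toy_encrypt(cached, bounce, len, key); /* cached page is only read */
        memcpy(disk, bounce, len);             /* stands in for the real I/O */
        free(bounce);
        return 0;
}
```

Encrypting directly into ``cached`` instead would leave ciphertext in the page cache, which is what the paragraph above rules out.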
+
+Filename hashing and encoding
+-----------------------------
+
+Modern filesystems accelerate directory lookups by using indexed
+directories.  An indexed directory is organized as a tree keyed by
+filename hashes.  When a ->lookup() is requested, the filesystem
+normally hashes the filename being looked up so that it can quickly
+find the corresponding directory entry, if any.
+
+With encryption, lookups must be supported and efficient both with and
+without the encryption key.  Clearly, it would not work to hash the
+plaintext filenames, since the plaintext filenames are unavailable
+without the key.  (Hashing the plaintext filenames would also make it
+impossible for the filesystem's fsck tool to optimize encrypted
+directories.)  Instead, filesystems hash the ciphertext filenames,
+i.e. the bytes actually stored on-disk in the directory entries.  When
+asked to do a ->lookup() with the key, the filesystem just encrypts
+the user-supplied name to get the ciphertext.
+
+Lookups without the key are more complicated.  The raw ciphertext may
+contain the ``\0`` and ``/`` characters, which are illegal in
+filenames.  Therefore, readdir() must base64-encode the ciphertext for
+presentation.  For most filenames, this works fine; on ->lookup(), the
+filesystem just base64-decodes the user-supplied name to get back to
+the raw ciphertext.
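To make the encoding step concrete, here is a minimal sketch of a base64-style encoder whose output can never contain ``\0`` or ``/``. It assumes a 64-character alphabet in which ``,`` takes the place of ``/``, and it emits no padding. The function name ``encode_name()`` is hypothetical, and the bit order and alphabet are assumptions for illustration that are not claimed to match the kernel's encoder byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alphabet: standard base64 with ',' instead of '/'. */
static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/*
 * Encode len bytes from src into dst as a NUL-terminated string.
 * dst must have room for (4 * len + 2) / 3 + 1 bytes.
 * Returns the number of characters written, excluding the NUL.
 */
size_t encode_name(const uint8_t *src, size_t len, char *dst)
{
        uint32_t acc = 0;  /* bit accumulator; only the low bits matter */
        int bits = 0;      /* number of not-yet-emitted bits in acc */
        size_t out = 0;

        assert(src && dst);
        for (size_t i = 0; i < len; i++) {
                acc = (acc << 8) | src[i];
                bits += 8;
                while (bits >= 6) {
                        bits -= 6;
                        dst[out++] = alphabet[(acc >> bits) & 0x3f];
                }
        }
        if (bits > 0)  /* flush the remaining 2 or 4 bits, zero-filled */
                dst[out++] = alphabet[(acc << (6 - bits)) & 0x3f];
        dst[out] = '\0';
        return out;
}
```

For example, the three raw bytes ``0x00 0x2f 0xff``, which include both a NUL and a ``/``, encode to the four characters ``AC,,``.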
+
+However, for very long filenames, base64 encoding would cause the
+filename length to exceed NAME_MAX.  To prevent this, readdir()
+actually presents long filenames in an abbreviated form which encodes
+a strong "hash" of the ciphertext filename, along with the optional
+filesystem-specific hash(es) needed for directory lookups.  This
+allows the filesystem to still, with a high degree of confidence, map
+the filename given in ->lookup() back to a particular directory entry
+that was previously listed by readdir().  See :c:type:`struct
+fscrypt_digested_name` in the source for more details.
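The length problem is simple arithmetic: unpadded base64 turns n bytes into ceil(4n/3) characters, so a 255-byte ciphertext name would need 340 characters. The sketch below only illustrates that arithmetic. ``SKETCH_NAME_MAX`` and both function names are hypothetical, and the cutoff at which the kernel actually switches to the abbreviated form is not stated here and may well be lower.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NAME_MAX 255  /* assumed value of NAME_MAX on Linux */

/* Characters needed to base64-encode n bytes without padding. */
size_t encoded_len(size_t n)
{
        return (4 * n + 2) / 3;  /* integer form of ceil(4n / 3) */
}

/* Nonzero if an n-byte ciphertext name still fits once encoded. */
int fits_when_encoded(size_t n)
{
        return encoded_len(n) <= SKETCH_NAME_MAX;
}
```

Under these assumptions, 191 bytes is the longest ciphertext name that still fits (it encodes to exactly 255 characters); 192 bytes already needs 256.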
+
+Note that the precise way that filenames are presented to userspace
+without the key is subject to change in the future.  It is only meant
+as a way to temporarily present valid filenames so that commands like
+``rm -r`` work as expected on encrypted directories.
author	Srinivasarao P <spathi@codeaurora.org>	2018-08-02 10:10:30 +0530
committer	Srinivasarao P <spathi@codeaurora.org>	2018-08-03 16:59:20 +0530
commit	c2e09fadec5ce348e125150e66a9a32b4af44756 (patch)
tree	652cf573762608aecdb28c230363166a03a24f39 /Documentation/filesystems
parent	414b079b7f4ba955f4679cd87be64199b4e875c7 (diff)
parent	8ec9fd8936b20ca2d18160f8b18acb4b732c2771 (diff)