diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/00-INDEX | 5 | ||||
-rw-r--r-- | Documentation/filesystems/Locking | 5 | ||||
-rw-r--r-- | Documentation/filesystems/dax.txt | 94 | ||||
-rw-r--r-- | Documentation/filesystems/dlmfs.txt | 4 | ||||
-rw-r--r-- | Documentation/filesystems/ext2.txt | 5 | ||||
-rw-r--r-- | Documentation/filesystems/ext4.txt | 4 | ||||
-rw-r--r-- | Documentation/filesystems/f2fs.txt | 6 | ||||
-rw-r--r-- | Documentation/filesystems/nfs/nfs41-server.txt | 23 | ||||
-rw-r--r-- | Documentation/filesystems/nfs/pnfs-block-server.txt | 37 | ||||
-rw-r--r-- | Documentation/filesystems/nfs/pnfs.txt | 13 | ||||
-rw-r--r-- | Documentation/filesystems/ocfs2.txt | 4 | ||||
-rw-r--r-- | Documentation/filesystems/overlayfs.txt | 28 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 62 | ||||
-rw-r--r-- | Documentation/filesystems/seq_file.txt | 12 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.txt | 7 | ||||
-rw-r--r-- | Documentation/filesystems/xip.txt | 68 |
16 files changed, 261 insertions, 116 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index ac28149aede4..9922939e7d99 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX @@ -34,6 +34,9 @@ configfs/ - directory containing configfs documentation and example code. cramfs.txt - info on the cram filesystem for small storage (ROMs etc). +dax.txt + - info on avoiding the page cache for files stored on CPU-addressable + storage devices. debugfs.txt - info on the debugfs filesystem. devpts.txt @@ -154,5 +157,3 @@ xfs-self-describing-metadata.txt - info on XFS Self Describing Metadata. xfs.txt - info and mount options for the XFS filesystem. -xip.txt - - info on execute-in-place for file mappings. diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index b30753cbf431..f91926f2f482 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -164,8 +164,6 @@ the block device inode. See there for more details. --------------------------- file_system_type --------------------------- prototypes: - int (*get_sb) (struct file_system_type *, int, - const char *, void *, struct vfsmount *); struct dentry *(*mount) (struct file_system_type *, int, const char *, void *); void (*kill_sb) (struct super_block *); @@ -199,8 +197,6 @@ prototypes: int (*releasepage) (struct page *, int); void (*freepage)(struct page *); int (*direct_IO)(int, struct kiocb *, struct iov_iter *iter, loff_t offset); - int (*get_xip_mem)(struct address_space *, pgoff_t, int, void **, - unsigned long *); int (*migratepage)(struct address_space *, struct page *, struct page *); int (*launder_page)(struct page *); int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long); @@ -225,7 +221,6 @@ invalidatepage: yes releasepage: yes freepage: yes direct_IO: -get_xip_mem: maybe migratepage: yes (both) launder_page: yes is_partially_uptodate: yes diff --git a/Documentation/filesystems/dax.txt b/Documentation/filesystems/dax.txt new file mode 100644 index 000000000000..baf41118660d --- /dev/null +++ b/Documentation/filesystems/dax.txt @@ -0,0 +1,94 @@ +Direct Access for files +----------------------- + +Motivation +---------- + +The page cache is usually used to buffer reads and writes to files. +It is also used to provide the pages which are mapped into userspace +by a call to mmap. + +For block devices that are memory-like, the page cache pages would be +unnecessary copies of the original storage. The DAX code removes the +extra copy by performing reads and writes directly to the storage device. +For file mappings, the storage device is mapped directly into userspace. + + +Usage +----- + +If you have a block device which supports DAX, you can make a filesystem +on it as usual. When mounting it, use the -o dax option manually +or add 'dax' to the options in /etc/fstab. + + +Implementation Tips for Block Driver Writers +-------------------------------------------- + +To support DAX in your block driver, implement the 'direct_access' +block device operation. It is used to translate the sector number +(expressed in units of 512-byte sectors) to a page frame number (pfn) +that identifies the physical page for the memory. It also returns a +kernel virtual address that can be used to access the memory. + +The direct_access method takes a 'size' parameter that indicates the +number of bytes being requested. The function should return the number +of bytes that can be contiguously accessed at that offset. It may also +return a negative errno if an error occurs. + +In order to support this method, the storage must be byte-accessible by +the CPU at all times. If your device uses paging techniques to expose +a large amount of memory through a smaller window, then you cannot +implement direct_access. Equally, if your device can occasionally +stall the CPU for an extended period, you should also not attempt to +implement direct_access. + +These block devices may be used for inspiration: +- axonram: Axon DDR2 device driver +- brd: RAM backed block device driver +- dcssblk: s390 dcss block device driver + + +Implementation Tips for Filesystem Writers +------------------------------------------ + +Filesystem support consists of +- adding support to mark inodes as being DAX by setting the S_DAX flag in + i_flags +- implementing the direct_IO address space operation, and calling + dax_do_io() instead of blockdev_direct_IO() if S_DAX is set +- implementing an mmap file operation for DAX files which sets the + VM_MIXEDMAP flag on the VMA, and setting the vm_ops to include handlers + for fault and page_mkwrite (which should probably call dax_fault() and + dax_mkwrite(), passing the appropriate get_block() callback) +- calling dax_truncate_page() instead of block_truncate_page() for DAX files +- calling dax_zero_page_range() instead of zero_user() for DAX files +- ensuring that there is sufficient locking between reads, writes, + truncates and page faults + +The get_block() callback passed to the DAX functions may return +uninitialised extents. If it does, it must ensure that simultaneous +calls to get_block() (for example by a page-fault racing with a read() +or a write()) work correctly. + +These filesystems may be used for inspiration: +- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt +- ext4: the fourth extended filesystem, see Documentation/filesystems/ext4.txt + + +Shortcomings +------------ + +Even if the kernel or its modules are stored on a filesystem that supports +DAX on a block device that supports DAX, they will still be copied into RAM. + +The DAX code does not work correctly on architectures which have virtually +mapped caches such as ARM, MIPS and SPARC. + +Calling get_user_pages() on a range of user memory that has been mmaped +from a DAX file will fail as there are no 'struct page' to describe +those pages. This problem is being worked on. That means that O_DIRECT +reads/writes to those memory ranges from a non-DAX file will fail (note +that O_DIRECT reads/writes _of a DAX file_ do work, it is the memory +that is being accessed that is key here). Other things that will not +work include RDMA, sendfile() and splice(). diff --git a/Documentation/filesystems/dlmfs.txt b/Documentation/filesystems/dlmfs.txt index 1b528b2ad809..fcf4d509d118 100644 --- a/Documentation/filesystems/dlmfs.txt +++ b/Documentation/filesystems/dlmfs.txt @@ -5,8 +5,8 @@ system. dlmfs is built with OCFS2 as it requires most of its infrastructure. -Project web page: http://oss.oracle.com/projects/ocfs2 -Tools web page: http://oss.oracle.com/projects/ocfs2-tools +Project web page: http://ocfs2.wiki.kernel.org +Tools web page: https://github.com/markfasheh/ocfs2-tools OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/ All code copyright 2005 Oracle except when otherwise noted. diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt index 67639f905f10..b9714569e472 100644 --- a/Documentation/filesystems/ext2.txt +++ b/Documentation/filesystems/ext2.txt @@ -20,6 +20,9 @@ minixdf Makes `df' act like Minix. check=none, nocheck (*) Don't do extra checking of bitmaps on mount (check=normal and check=strict options removed) +dax Use direct access (no page cache). See + Documentation/filesystems/dax.txt. + debug Extra debugging information is sent to the kernel syslog. Useful for developers. @@ -56,8 +59,6 @@ noacl Don't support POSIX ACLs. nobh Do not attach buffer_heads to file pagecache. -xip Use execute in place (no caching) if possible - grpquota,noquota,quota,usrquota Quota options are silently ignored by ext2. diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 919a3293aaa4..6c0108eb0137 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -386,6 +386,10 @@ max_dir_size_kb=n This limits the size of directories so that any i_version Enable 64-bit inode version support. This option is off by default. +dax Use direct access (no page cache). See + Documentation/filesystems/dax.txt. Note that + this option is incompatible with data=journal. + Data Mode ========= There are 3 different data modes: diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt index e0950c483c22..dac11d7fef27 100644 --- a/Documentation/filesystems/f2fs.txt +++ b/Documentation/filesystems/f2fs.txt @@ -106,6 +106,8 @@ background_gc=%s Turn on/off cleaning operations, namely garbage Default value for this option is on. So garbage collection is on by default. disable_roll_forward Disable the roll-forward recovery routine +norecovery Disable the roll-forward recovery routine, mounted read- + only (i.e., -o ro,disable_roll_forward) discard Issue discard/TRIM commands when a segment is cleaned. no_heap Disable heap-style segment allocation which finds free segments for data from the beginning of main area, while @@ -197,6 +199,10 @@ Files in /sys/fs/f2fs/<devname> checkpoint is triggered, and issued during the checkpoint. By default, it is disabled with 0. + trim_sections This parameter controls the number of sections + to be trimmed out in batch mode when FITRIM + conducts. 32 sections is set by default. + ipu_policy This parameter controls the policy of in-place updates in f2fs. There are five policies: 0x01: F2FS_IPU_FORCE, 0x02: F2FS_IPU_SSR, diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt index c49cd7e796e7..682a59fabe3f 100644 --- a/Documentation/filesystems/nfs/nfs41-server.txt +++ b/Documentation/filesystems/nfs/nfs41-server.txt @@ -24,11 +24,6 @@ focuses on the mandatory-to-implement NFSv4.1 Sessions, providing "exactly once" semantics and better control and throttling of the resources allocated for each client. -Other NFSv4.1 features, Parallel NFS operations in particular, -are still under development out of tree. -See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design -for more information. - The table below, taken from the NFSv4.1 document, lists the operations that are mandatory to implement (REQ), optional (OPT), and NFSv4.0 operations that are required not to implement (MNI) @@ -43,9 +38,7 @@ The OPTIONAL features identified and their abbreviations are as follows: The following abbreviations indicate the linux server implementation status. I Implemented NFSv4.1 operations. NS Not Supported. - NS* unimplemented optional feature. - P pNFS features implemented out of tree. - PNS pNFS features that are not supported yet (out of tree). + NS* Unimplemented optional feature. Operations @@ -70,13 +63,13 @@ I | DESTROY_SESSION | REQ | | Section 18.37 | I | EXCHANGE_ID | REQ | | Section 18.35 | I | FREE_STATEID | REQ | | Section 18.38 | | GETATTR | REQ | | Section 18.7 | -P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | -P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | +I | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | +NS*| GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | | GETFH | REQ | | Section 18.8 | NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | -P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | -P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | -P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | +I | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | +I | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | +I | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | | LINK | OPT | | Section 18.9 | | LOCK | REQ | | Section 18.10 | | LOCKT | REQ | | Section 18.11 | @@ -122,9 +115,9 @@ Callback Operations | | MNI | or OPT) | | +-------------------------+-----------+-------------+---------------+ | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | -P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | +I | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | -P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | +NS*| CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 | NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | | CB_RECALL | OPT | FDELG, | Section 20.2 | diff --git a/Documentation/filesystems/nfs/pnfs-block-server.txt b/Documentation/filesystems/nfs/pnfs-block-server.txt new file mode 100644 index 000000000000..2143673cf154 --- /dev/null +++ b/Documentation/filesystems/nfs/pnfs-block-server.txt @@ -0,0 +1,37 @@ +pNFS block layout server user guide + +The Linux NFS server now supports the pNFS block layout extension. In this +case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition +to handling all the metadata access to the NFS export also hands out layouts +to the clients to directly access the underlying block devices that are +shared with the client. + +To use pNFS block layouts with with the Linux NFS server the exported file +system needs to support the pNFS block layouts (currently just XFS), and the +file system must sit on shared storage (typically iSCSI) that is accessible +to the clients in addition to the MDS. As of now the file system needs to +sit directly on the exported volume, striping or concatenation of +volumes on the MDS and clients is not supported yet. + +On the server, pNFS block volume support is automatically if the file system +support it. On the client make sure the kernel has the CONFIG_PNFS_BLOCK +option enabled, the blkmapd daemon from nfs-utils is running, and the +file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1). + +If the nfsd server needs to fence a non-responding client it calls +/sbin/nfsd-recall-failed with the first argument set to the IP address of +the client, and the second argument set to the device node without the /dev +prefix for the file system to be fenced. Below is an example file that shows +how to translate the device into a serial number from SCSI EVPD 0x80: + +cat > /sbin/nfsd-recall-failed << EOF +#!/bin/sh + +CLIENT="$1" +DEV="/dev/$2" +EVPD=`sg_inq --page=0x80 ${DEV} | \ + grep "Unit serial number:" | \ + awk -F ': ' '{print $2}'` + +echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log +EOF diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt index adc81a35fe2d..44a9f2493a88 100644 --- a/Documentation/filesystems/nfs/pnfs.txt +++ b/Documentation/filesystems/nfs/pnfs.txt @@ -57,15 +57,16 @@ bit is set, preventing any new lsegs from being added. layout drivers -------------- -PNFS utilizes what is called layout drivers. The STD defines 3 basic -layout types: "files" "objects" and "blocks". For each of these types -there is a layout-driver with a common function-vectors table which -are called by the nfs-client pnfs-core to implement the different layout -types. +PNFS utilizes what is called layout drivers. The STD defines 4 basic +layout types: "files", "objects", "blocks", and "flexfiles". For each +of these types there is a layout-driver with a common function-vectors +table which are called by the nfs-client pnfs-core to implement the +different layout types. -Files-layout-driver code is in: fs/nfs/nfs4filelayout.c && nfs4filelayoutdev.c +Files-layout-driver code is in: fs/nfs/filelayout/.. directory Objects-layout-deriver code is in: fs/nfs/objlayout/.. directory Blocks-layout-deriver code is in: fs/nfs/blocklayout/.. directory +Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory objects-layout setup -------------------- diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt index 28f8c08201e2..4c49e5410595 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.txt @@ -8,8 +8,8 @@ also make it attractive for non-clustered use. You'll want to install the ocfs2-tools package in order to at least get "mount.ocfs2" and "ocfs2_hb_ctl". -Project web page: http://oss.oracle.com/projects/ocfs2 -Tools web page: http://oss.oracle.com/projects/ocfs2-tools +Project web page: http://ocfs2.wiki.kernel.org +Tools git tree: https://github.com/markfasheh/ocfs2-tools OCFS2 mailing lists: http://oss.oracle.com/projects/ocfs2/mailman/ All code copyright 2005 Oracle except when otherwise noted. diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt index a27c950ece61..6db0e5d1da07 100644 --- a/Documentation/filesystems/overlayfs.txt +++ b/Documentation/filesystems/overlayfs.txt @@ -159,6 +159,22 @@ overlay filesystem (though an operation on the name of the file such as rename or unlink will of course be noticed and handled). +Multiple lower layers +--------------------- + +Multiple lower layers can now be given using the the colon (":") as a +separator character between the directory names. For example: + + mount -t overlay overlay -olowerdir=/lower1:/lower2:/lower3 /merged + +As the example shows, "upperdir=" and "workdir=" may be omitted. In +that case the overlay will be read-only. + +The specified lower directories will be stacked beginning from the +rightmost one and going left. In the above example lower1 will be the +top, lower2 the middle and lower3 the bottom layer. + + Non-standard behavior --------------------- @@ -196,3 +212,15 @@ Changes to the underlying filesystems while part of a mounted overlay filesystem are not allowed. If the underlying filesystem is changed, the behavior of the overlay is undefined, though it will not result in a crash or deadlock. + +Testsuite +--------- + +There's testsuite developed by David Howells at: + + git://git.infradead.org/users/dhowells/unionmount-testsuite.git + +Run as root: + + # cd unionmount-testsuite + # ./run --ov diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index aae9dd13c91f..a07ba61662ed 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -28,7 +28,7 @@ Table of Contents 1.6 Parallel port info in /proc/parport 1.7 TTY info in /proc/tty 1.8 Miscellaneous kernel statistics in /proc/stat - 1.9 Ext4 file system parameters + 1.9 Ext4 file system parameters 2 Modifying System Parameters @@ -42,6 +42,7 @@ Table of Contents 3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm 3.7 /proc/<pid>/task/<tid>/children - Information about task children 3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file + 3.9 /proc/<pid>/map_files - Information about memory mapped files 4 Configuring procfs 4.1 Mount options @@ -144,6 +145,8 @@ Table 1-1: Process specific entries in /proc stack Report full stack trace, enable via CONFIG_STACKTRACE smaps a extension based on maps, showing the memory consumption of each mapping and flags associated with it + numa_maps an extension based on maps, showing the memory locality and + binding policy as well as mem usage (in pages) of each mapping. .............................................................................. For example, to get the status information of a process, all you have to do is @@ -488,12 +491,47 @@ To clear the bits for the file mapped pages associated with the process To clear the soft-dirty bit > echo 4 > /proc/PID/clear_refs +To reset the peak resident set size ("high water mark") to the process's +current value: + > echo 5 > /proc/PID/clear_refs + Any other value written to /proc/PID/clear_refs will have no effect. The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags using /proc/kpageflags and number of times a page is mapped using /proc/kpagecount. For detailed explanation, see Documentation/vm/pagemap.txt. +The /proc/pid/numa_maps is an extension based on maps, showing the memory +locality and binding policy, as well as the memory usage (in pages) of +each mapping. The output follows a general format where mapping details get +summarized separated by blank spaces, one mapping per each file line: + +address policy mapping details + +00400000 default file=/usr/local/bin/app mapped=1 active=0 N3=1 kernelpagesize_kB=4 +00600000 default file=/usr/local/bin/app anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206000000 default file=/lib64/ld-2.12.so mapped=26 mapmax=6 N0=24 N3=2 kernelpagesize_kB=4 +320621f000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206220000 default file=/lib64/ld-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206221000 default anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206800000 default file=/lib64/libc-2.12.so mapped=59 mapmax=21 active=55 N0=41 N3=18 kernelpagesize_kB=4 +320698b000 default file=/lib64/libc-2.12.so +3206b8a000 default file=/lib64/libc-2.12.so anon=2 dirty=2 N3=2 kernelpagesize_kB=4 +3206b8e000 default file=/lib64/libc-2.12.so anon=1 dirty=1 N3=1 kernelpagesize_kB=4 +3206b8f000 default anon=3 dirty=3 active=1 N3=3 kernelpagesize_kB=4 +7f4dc10a2000 default anon=3 dirty=3 N3=3 kernelpagesize_kB=4 +7f4dc10b4000 default anon=2 dirty=2 active=1 N3=2 kernelpagesize_kB=4 +7f4dc1200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N3=1 kernelpagesize_kB=2048 +7fff335f0000 default stack anon=3 dirty=3 N3=3 kernelpagesize_kB=4 +7fff3369d000 default mapped=1 mapmax=35 active=0 N3=1 kernelpagesize_kB=4 + +Where: +"address" is the starting address for the mapping; +"policy" reports the NUMA memory policy set for the mapping (see vm/numa_memory_policy.txt); +"mapping details" summarizes mapping data such as mapping type, page usage counters, +node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page +size, in KB, that is backing the mapping up. + 1.2 Kernel data --------------- @@ -1763,6 +1801,28 @@ pair provide additional information particular to the objects they represent. with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value' still exhibits timer's remaining time. +3.9 /proc/<pid>/map_files - Information about memory mapped files +--------------------------------------------------------------------- +This directory contains symbolic links which represent memory mapped files +the process is maintaining. Example output: + + | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so + | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so + | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so + | ... + | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1 + | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls + +The name of a link represents the virtual memory bounds of a mapping, i.e. +vm_area_struct::vm_start-vm_area_struct::vm_end. + +The main purpose of the map_files is to retrieve a set of memory mapped +files in a fast way instead of parsing /proc/<pid>/maps or +/proc/<pid>/smaps, both of which contain many more records. At the same +time one can open(2) mappings from the listings of two processes and +comparing their inode numbers to figure out which anonymous memory areas +are actually shared. + ------------------------------------------------------------------------------ Configuring procfs ------------------------------------------------------------------------------ diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt index b797ed38de46..9de4303201e1 100644 --- a/Documentation/filesystems/seq_file.txt +++ b/Documentation/filesystems/seq_file.txt @@ -194,16 +194,16 @@ which is in the string esc will be represented in octal form in the output. There are also a pair of functions for printing filenames: - int seq_path(struct seq_file *m, struct path *path, char *esc); - int seq_path_root(struct seq_file *m, struct path *path, - struct path *root, char *esc) + int seq_path(struct seq_file *m, const struct path *path, + const char *esc); + int seq_path_root(struct seq_file *m, const struct path *path, + const struct path *root, const char *esc) Here, path indicates the file of interest, and esc is a set of characters which should be escaped in the output. A call to seq_path() will output the path relative to the current process's filesystem root. If a different -root is desired, it can be used with seq_path_root(). Note that, if it -turns out that path cannot be reached from root, the value of root will be -changed in seq_file_root() to a root which *does* work. +root is desired, it can be used with seq_path_root(). If it turns out that +path cannot be reached from root, seq_path_root() returns SEQ_SKIP. A function producing complicated output may want to check bool seq_has_overflowed(struct seq_file *m); diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 43ce0507ee25..966b22829f3b 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -591,8 +591,6 @@ struct address_space_operations { int (*releasepage) (struct page *, int); void (*freepage)(struct page *); ssize_t (*direct_IO)(int, struct kiocb *, struct iov_iter *iter, loff_t offset); - struct page* (*get_xip_page)(struct address_space *, sector_t, - int); /* migrate the contents of a page to the specified target */ int (*migratepage) (struct page *, struct page *); int (*launder_page) (struct page *); @@ -748,11 +746,6 @@ struct address_space_operations { and transfer data directly between the storage and the application's address space. - get_xip_page: called by the VM to translate a block number to a page. - The page is valid until the corresponding filesystem is unmounted. - Filesystems that want to use execute-in-place (XIP) need to implement - it. An example implementation can be found in fs/ext2/xip.c. - migrate_page: This is used to compact the physical memory usage. If the VM wants to relocate a page (maybe off a memory card that is signalling imminent failure) it will pass a new page diff --git a/Documentation/filesystems/xip.txt b/Documentation/filesystems/xip.txt deleted file mode 100644 index 0466ee569278..000000000000 --- a/Documentation/filesystems/xip.txt +++ /dev/null @@ -1,68 +0,0 @@ -Execute-in-place for file mappings ----------------------------------- - -Motivation ----------- -File mappings are performed by mapping page cache pages to userspace. In -addition, read&write type file operations also transfer data from/to the page -cache. - -For memory backed storage devices that use the block device interface, the page -cache pages are in fact copies of the original storage. Various approaches -exist to work around the need for an extra copy. The ramdisk driver for example -does read the data into the page cache, keeps a reference, and discards the -original data behind later on. - -Execute-in-place solves this issue the other way around: instead of keeping -data in the page cache, the need to have a page cache copy is eliminated -completely. With execute-in-place, read&write type operations are performed -directly from/to the memory backed storage device. For file mappings, the -storage device itself is mapped directly into userspace. - -This implementation was initially written for shared memory segments between -different virtual machines on s390 hardware to allow multiple machines to -share the same binaries and libraries. - -Implementation --------------- -Execute-in-place is implemented in three steps: block device operation, -address space operation, and file operations. - -A block device operation named direct_access is used to retrieve a -reference (pointer) to a block on-disk. The reference is supposed to be -cpu-addressable, physical address and remain valid until the release operation -is performed. A struct block_device reference is used to address the device, -and a sector_t argument is used to identify the individual block. As an -alternative, memory technology devices can be used for this. - -The block device operation is optional, these block devices support it as of -today: -- dcssblk: s390 dcss block device driver - -An address space operation named get_xip_mem is used to retrieve references -to a page frame number and a kernel address. To obtain these values a reference -to an address_space is provided. This function assigns values to the kmem and -pfn parameters. The third argument indicates whether the function should allocate -blocks if needed. - -This address space operation is mutually exclusive with readpage&writepage that -do page cache read/write operations. -The following filesystems support it as of today: -- ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt - -A set of file operations that do utilize get_xip_page can be found in -mm/filemap_xip.c . The following file operation implementations are provided: -- aio_read/aio_write -- readv/writev -- sendfile - -The generic file operations do_sync_read/do_sync_write can be used to implement -classic synchronous IO calls. - -Shortcomings ------------- -This implementation is limited to storage devices that are cpu addressable at -all times (no highmem or such). It works well on rom/ram, but enhancements are -needed to make it work with flash in read+write mode. -Putting the Linux kernel and/or its modules on a xip filesystem does not mean -they are not copied. |