diff options
345 files changed, 7548 insertions, 2708 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index ea6a043f5beb..50f95689ab38 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -276,6 +276,7 @@ What: /sys/devices/system/cpu/vulnerabilities /sys/devices/system/cpu/vulnerabilities/meltdown /sys/devices/system/cpu/vulnerabilities/spectre_v1 /sys/devices/system/cpu/vulnerabilities/spectre_v2 + /sys/devices/system/cpu/vulnerabilities/spec_store_bypass Date: January 2018 Contact: Linux kernel mailing list <linux-kernel@vger.kernel.org> Description: Information about CPU vulnerabilities diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst new file mode 100644 index 000000000000..48b424de85bb --- /dev/null +++ b/Documentation/filesystems/fscrypt.rst @@ -0,0 +1,626 @@ +===================================== +Filesystem-level encryption (fscrypt) +===================================== + +Introduction +============ + +fscrypt is a library which filesystems can hook into to support +transparent encryption of files and directories. + +Note: "fscrypt" in this document refers to the kernel-level portion, +implemented in ``fs/crypto/``, as opposed to the userspace tool +`fscrypt <https://github.com/google/fscrypt>`_. This document only +covers the kernel-level portion. For command-line examples of how to +use encryption, see the documentation for the userspace tool `fscrypt +<https://github.com/google/fscrypt>`_. Also, it is recommended to use +the fscrypt userspace tool, or other existing userspace tools such as +`fscryptctl <https://github.com/google/fscryptctl>`_ or `Android's key +management system +<https://source.android.com/security/encryption/file-based>`_, over +using the kernel's API directly. Using existing tools reduces the +chance of introducing your own security bugs. (Nevertheless, for +completeness this documentation covers the kernel's API anyway.) + +Unlike dm-crypt, fscrypt operates at the filesystem level rather than +at the block device level. This allows it to encrypt different files +with different keys and to have unencrypted files on the same +filesystem. This is useful for multi-user systems where each user's +data-at-rest needs to be cryptographically isolated from the others. +However, except for filenames, fscrypt does not encrypt filesystem +metadata. + +Unlike eCryptfs, which is a stacked filesystem, fscrypt is integrated +directly into supported filesystems --- currently ext4, F2FS, and +UBIFS. This allows encrypted files to be read and written without +caching both the decrypted and encrypted pages in the pagecache, +thereby nearly halving the memory used and bringing it in line with +unencrypted files. Similarly, half as many dentries and inodes are +needed. eCryptfs also limits encrypted filenames to 143 bytes, +causing application compatibility issues; fscrypt allows the full 255 +bytes (NAME_MAX). Finally, unlike eCryptfs, the fscrypt API can be +used by unprivileged users, with no need to mount anything. + +fscrypt does not support encrypting files in-place. Instead, it +supports marking an empty directory as encrypted. Then, after +userspace provides the key, all regular files, directories, and +symbolic links created in that directory tree are transparently +encrypted. + +Threat model +============ + +Offline attacks +--------------- + +Provided that userspace chooses a strong encryption key, fscrypt +protects the confidentiality of file contents and filenames in the +event of a single point-in-time permanent offline compromise of the +block device content. fscrypt does not protect the confidentiality of +non-filename metadata, e.g. file sizes, file permissions, file +timestamps, and extended attributes. Also, the existence and location +of holes (unallocated blocks which logically contain all zeroes) in +files is not protected. + +fscrypt is not guaranteed to protect confidentiality or authenticity +if an attacker is able to manipulate the filesystem offline prior to +an authorized user later accessing the filesystem. + +Online attacks +-------------- + +fscrypt (and storage encryption in general) can only provide limited +protection, if any at all, against online attacks. In detail: + +fscrypt is only resistant to side-channel attacks, such as timing or +electromagnetic attacks, to the extent that the underlying Linux +Cryptographic API algorithms are. If a vulnerable algorithm is used, +such as a table-based implementation of AES, it may be possible for an +attacker to mount a side channel attack against the online system. +Side channel attacks may also be mounted against applications +consuming decrypted data. + +After an encryption key has been provided, fscrypt is not designed to +hide the plaintext file contents or filenames from other users on the +same system, regardless of the visibility of the keyring key. +Instead, existing access control mechanisms such as file mode bits, +POSIX ACLs, LSMs, or mount namespaces should be used for this purpose. +Also note that as long as the encryption keys are *anywhere* in +memory, an online attacker can necessarily compromise them by mounting +a physical attack or by exploiting any kernel security vulnerability +which provides an arbitrary memory read primitive. + +While it is ostensibly possible to "evict" keys from the system, +recently accessed encrypted files will remain accessible at least +until the filesystem is unmounted or the VFS caches are dropped, e.g. +using ``echo 2 > /proc/sys/vm/drop_caches``. Even after that, if the +RAM is compromised before being powered off, it will likely still be +possible to recover portions of the plaintext file contents, if not +some of the encryption keys as well. (Since Linux v4.12, all +in-kernel keys related to fscrypt are sanitized before being freed. +However, userspace would need to do its part as well.) + +Currently, fscrypt does not prevent a user from maliciously providing +an incorrect key for another user's existing encrypted files. A +protection against this is planned. + +Key hierarchy +============= + +Master Keys +----------- + +Each encrypted directory tree is protected by a *master key*. Master +keys can be up to 64 bytes long, and must be at least as long as the +greater of the key length needed by the contents and filenames +encryption modes being used. For example, if AES-256-XTS is used for +contents encryption, the master key must be 64 bytes (512 bits). Note +that the XTS mode is defined to require a key twice as long as that +required by the underlying block cipher. + +To "unlock" an encrypted directory tree, userspace must provide the +appropriate master key. There can be any number of master keys, each +of which protects any number of directory trees on any number of +filesystems. + +Userspace should generate master keys either using a cryptographically +secure random number generator, or by using a KDF (Key Derivation +Function). Note that whenever a KDF is used to "stretch" a +lower-entropy secret such as a passphrase, it is critical that a KDF +designed for this purpose be used, such as scrypt, PBKDF2, or Argon2. + +Per-file keys +------------- + +Master keys are not used to encrypt file contents or names directly. +Instead, a unique key is derived for each encrypted file, including +each regular file, directory, and symbolic link. This has several +advantages: + +- In cryptosystems, the same key material should never be used for + different purposes. Using the master key as both an XTS key for + contents encryption and as a CTS-CBC key for filenames encryption + would violate this rule. +- Per-file keys simplify the choice of IVs (Initialization Vectors) + for contents encryption. Without per-file keys, to ensure IV + uniqueness both the inode and logical block number would need to be + encoded in the IVs. This would make it impossible to renumber + inodes, which e.g. ``resize2fs`` can do when resizing an ext4 + filesystem. With per-file keys, it is sufficient to encode just the + logical block number in the IVs. +- Per-file keys strengthen the encryption of filenames, where IVs are + reused out of necessity. With a unique key per directory, IV reuse + is limited to within a single directory. +- Per-file keys allow individual files to be securely erased simply by + securely erasing their keys. (Not yet implemented.) + +A KDF (Key Derivation Function) is used to derive per-file keys from +the master key. This is done instead of wrapping a randomly-generated +key for each file because it reduces the size of the encryption xattr, +which for some filesystems makes the xattr more likely to fit in-line +in the filesystem's inode table. With a KDF, only a 16-byte nonce is +required --- long enough to make key reuse extremely unlikely. A +wrapped key, on the other hand, would need to be up to 64 bytes --- +the length of an AES-256-XTS key. Furthermore, currently there is no +requirement to support unlocking a file with multiple alternative +master keys or to support rotating master keys. Instead, the master +keys may be wrapped in userspace, e.g. as done by the `fscrypt +<https://github.com/google/fscrypt>`_ tool. + +The current KDF encrypts the master key using the 16-byte nonce as an +AES-128-ECB key. The output is used as the derived key. If the +output is longer than needed, then it is truncated to the needed +length. Truncation is the norm for directories and symlinks, since +those use the CTS-CBC encryption mode which requires a key half as +long as that required by the XTS encryption mode. + +Note: this KDF meets the primary security requirement, which is to +produce unique derived keys that preserve the entropy of the master +key, assuming that the master key is already a good pseudorandom key. +However, it is nonstandard and has some problems such as being +reversible, so it is generally considered to be a mistake! It may be +replaced with HKDF or another more standard KDF in the future. + +Encryption modes and usage +========================== + +fscrypt allows one encryption mode to be specified for file contents +and one encryption mode to be specified for filenames. Different +directory trees are permitted to use different encryption modes. +Currently, the following pairs of encryption modes are supported: + +- AES-256-XTS for contents and AES-256-CTS-CBC for filenames +- AES-128-CBC for contents and AES-128-CTS-CBC for filenames +- Speck128/256-XTS for contents and Speck128/256-CTS-CBC for filenames + +It is strongly recommended to use AES-256-XTS for contents encryption. +AES-128-CBC was added only for low-powered embedded devices with +crypto accelerators such as CAAM or CESA that do not support XTS. + +Similarly, Speck128/256 support was only added for older or low-end +CPUs which cannot do AES fast enough -- especially ARM CPUs which have +NEON instructions but not the Cryptography Extensions -- and for which +it would not otherwise be feasible to use encryption at all. It is +not recommended to use Speck on CPUs that have AES instructions. +Speck support is only available if it has been enabled in the crypto +API via CONFIG_CRYPTO_SPECK. Also, on ARM platforms, to get +acceptable performance CONFIG_CRYPTO_SPECK_NEON must be enabled. + +New encryption modes can be added relatively easily, without changes +to individual filesystems. However, authenticated encryption (AE) +modes are not currently supported because of the difficulty of dealing +with ciphertext expansion. + +For file contents, each filesystem block is encrypted independently. +Currently, only the case where the filesystem block size is equal to +the system's page size (usually 4096 bytes) is supported. With the +XTS mode of operation (recommended), the logical block number within +the file is used as the IV. With the CBC mode of operation (not +recommended), ESSIV is used; specifically, the IV for CBC is the +logical block number encrypted with AES-256, where the AES-256 key is +the SHA-256 hash of the inode's data encryption key. + +For filenames, the full filename is encrypted at once. Because of the +requirements to retain support for efficient directory lookups and +filenames of up to 255 bytes, a constant initialization vector (IV) is +used. However, each encrypted directory uses a unique key, which +limits IV reuse to within a single directory. Note that IV reuse in +the context of CTS-CBC encryption means that when the original +filenames share a common prefix at least as long as the cipher block +size (16 bytes for AES), the corresponding encrypted filenames will +also share a common prefix. This is undesirable; it may be fixed in +the future by switching to an encryption mode that is a strong +pseudorandom permutation on arbitrary-length messages, e.g. the HEH +(Hash-Encrypt-Hash) mode. + +Since filenames are encrypted with the CTS-CBC mode of operation, the +plaintext and ciphertext filenames need not be multiples of the AES +block size, i.e. 16 bytes. However, the minimum size that can be +encrypted is 16 bytes, so shorter filenames are NUL-padded to 16 bytes +before being encrypted. In addition, to reduce leakage of filename +lengths via their ciphertexts, all filenames are NUL-padded to the +next 4, 8, 16, or 32-byte boundary (configurable). 32 is recommended +since this provides the best confidentiality, at the cost of making +directory entries consume slightly more space. Note that since NUL +(``\0``) is not otherwise a valid character in filenames, the padding +will never produce duplicate plaintexts. + +Symbolic link targets are considered a type of filename and are +encrypted in the same way as filenames in directory entries. Each +symlink also uses a unique key; hence, the hardcoded IV is not a +problem for symlinks. + +User API +======== + +Setting an encryption policy +---------------------------- + +The FS_IOC_SET_ENCRYPTION_POLICY ioctl sets an encryption policy on an +empty directory or verifies that a directory or regular file already +has the specified encryption policy. It takes in a pointer to a +:c:type:`struct fscrypt_policy`, defined as follows:: + + #define FS_KEY_DESCRIPTOR_SIZE 8 + + struct fscrypt_policy { + __u8 version; + __u8 contents_encryption_mode; + __u8 filenames_encryption_mode; + __u8 flags; + __u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE]; + }; + +This structure must be initialized as follows: + +- ``version`` must be 0. + +- ``contents_encryption_mode`` and ``filenames_encryption_mode`` must + be set to constants from ``<linux/fs.h>`` which identify the + encryption modes to use. If unsure, use + FS_ENCRYPTION_MODE_AES_256_XTS (1) for ``contents_encryption_mode`` + and FS_ENCRYPTION_MODE_AES_256_CTS (4) for + ``filenames_encryption_mode``. + +- ``flags`` must be set to a value from ``<linux/fs.h>`` which + identifies the amount of NUL-padding to use when encrypting + filenames. If unsure, use FS_POLICY_FLAGS_PAD_32 (0x3). + +- ``master_key_descriptor`` specifies how to find the master key in + the keyring; see `Adding keys`_. It is up to userspace to choose a + unique ``master_key_descriptor`` for each master key. The e4crypt + and fscrypt tools use the first 8 bytes of + ``SHA-512(SHA-512(master_key))``, but this particular scheme is not + required. Also, the master key need not be in the keyring yet when + FS_IOC_SET_ENCRYPTION_POLICY is executed. However, it must be added + before any files can be created in the encrypted directory. + +If the file is not yet encrypted, then FS_IOC_SET_ENCRYPTION_POLICY +verifies that the file is an empty directory. If so, the specified +encryption policy is assigned to the directory, turning it into an +encrypted directory. After that, and after providing the +corresponding master key as described in `Adding keys`_, all regular +files, directories (recursively), and symlinks created in the +directory will be encrypted, inheriting the same encryption policy. +The filenames in the directory's entries will be encrypted as well. + +Alternatively, if the file is already encrypted, then +FS_IOC_SET_ENCRYPTION_POLICY validates that the specified encryption +policy exactly matches the actual one. If they match, then the ioctl +returns 0. Otherwise, it fails with EEXIST. This works on both +regular files and directories, including nonempty directories. + +Note that the ext4 filesystem does not allow the root directory to be +encrypted, even if it is empty. Users who want to encrypt an entire +filesystem with one key should consider using dm-crypt instead. + +FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors: + +- ``EACCES``: the file is not owned by the process's uid, nor does the + process have the CAP_FOWNER capability in a namespace with the file + owner's uid mapped +- ``EEXIST``: the file is already encrypted with an encryption policy + different from the one specified +- ``EINVAL``: an invalid encryption policy was specified (invalid + version, mode(s), or flags) +- ``ENOTDIR``: the file is unencrypted and is a regular file, not a + directory +- ``ENOTEMPTY``: the file is unencrypted and is a nonempty directory +- ``ENOTTY``: this type of filesystem does not implement encryption +- ``EOPNOTSUPP``: the kernel was not configured with encryption + support for this filesystem, or the filesystem superblock has not + had encryption enabled on it. (For example, to use encryption on an + ext4 filesystem, CONFIG_EXT4_ENCRYPTION must be enabled in the + kernel config, and the superblock must have had the "encrypt" + feature flag enabled using ``tune2fs -O encrypt`` or ``mkfs.ext4 -O + encrypt``.) +- ``EPERM``: this directory may not be encrypted, e.g. because it is + the root directory of an ext4 filesystem +- ``EROFS``: the filesystem is readonly + +Getting an encryption policy +---------------------------- + +The FS_IOC_GET_ENCRYPTION_POLICY ioctl retrieves the :c:type:`struct +fscrypt_policy`, if any, for a directory or regular file. See above +for the struct definition. No additional permissions are required +beyond the ability to open the file. + +FS_IOC_GET_ENCRYPTION_POLICY can fail with the following errors: + +- ``EINVAL``: the file is encrypted, but it uses an unrecognized + encryption context format +- ``ENODATA``: the file is not encrypted +- ``ENOTTY``: this type of filesystem does not implement encryption +- ``EOPNOTSUPP``: the kernel was not configured with encryption + support for this filesystem + +Note: if you only need to know whether a file is encrypted or not, on +most filesystems it is also possible to use the FS_IOC_GETFLAGS ioctl +and check for FS_ENCRYPT_FL, or to use the statx() system call and +check for STATX_ATTR_ENCRYPTED in stx_attributes. + +Getting the per-filesystem salt +------------------------------- + +Some filesystems, such as ext4 and F2FS, also support the deprecated +ioctl FS_IOC_GET_ENCRYPTION_PWSALT. This ioctl retrieves a randomly +generated 16-byte value stored in the filesystem superblock. This +value is intended to used as a salt when deriving an encryption key +from a passphrase or other low-entropy user credential. + +FS_IOC_GET_ENCRYPTION_PWSALT is deprecated. Instead, prefer to +generate and manage any needed salt(s) in userspace. + +Adding keys +----------- + +To provide a master key, userspace must add it to an appropriate +keyring using the add_key() system call (see: +``Documentation/security/keys/core.rst``). The key type must be +"logon"; keys of this type are kept in kernel memory and cannot be +read back by userspace. The key description must be "fscrypt:" +followed by the 16-character lower case hex representation of the +``master_key_descriptor`` that was set in the encryption policy. The +key payload must conform to the following structure:: + + #define FS_MAX_KEY_SIZE 64 + + struct fscrypt_key { + u32 mode; + u8 raw[FS_MAX_KEY_SIZE]; + u32 size; + }; + +``mode`` is ignored; just set it to 0. The actual key is provided in +``raw`` with ``size`` indicating its size in bytes. That is, the +bytes ``raw[0..size-1]`` (inclusive) are the actual key. + +The key description prefix "fscrypt:" may alternatively be replaced +with a filesystem-specific prefix such as "ext4:". However, the +filesystem-specific prefixes are deprecated and should not be used in +new programs. + +There are several different types of keyrings in which encryption keys +may be placed, such as a session keyring, a user session keyring, or a +user keyring. Each key must be placed in a keyring that is "attached" +to all processes that might need to access files encrypted with it, in +the sense that request_key() will find the key. Generally, if only +processes belonging to a specific user need to access a given +encrypted directory and no session keyring has been installed, then +that directory's key should be placed in that user's user session +keyring or user keyring. Otherwise, a session keyring should be +installed if needed, and the key should be linked into that session +keyring, or in a keyring linked into that session keyring. + +Note: introducing the complex visibility semantics of keyrings here +was arguably a mistake --- especially given that by design, after any +process successfully opens an encrypted file (thereby setting up the +per-file key), possessing the keyring key is not actually required for +any process to read/write the file until its in-memory inode is +evicted. In the future there probably should be a way to provide keys +directly to the filesystem instead, which would make the intended +semantics clearer. + +Access semantics +================ + +With the key +------------ + +With the encryption key, encrypted regular files, directories, and +symlinks behave very similarly to their unencrypted counterparts --- +after all, the encryption is intended to be transparent. However, +astute users may notice some differences in behavior: + +- Unencrypted files, or files encrypted with a different encryption + policy (i.e. different key, modes, or flags), cannot be renamed or + linked into an encrypted directory; see `Encryption policy + enforcement`_. Attempts to do so will fail with EPERM. However, + encrypted files can be renamed within an encrypted directory, or + into an unencrypted directory. + +- Direct I/O is not supported on encrypted files. Attempts to use + direct I/O on such files will fall back to buffered I/O. + +- The fallocate operations FALLOC_FL_COLLAPSE_RANGE, + FALLOC_FL_INSERT_RANGE, and FALLOC_FL_ZERO_RANGE are not supported + on encrypted files and will fail with EOPNOTSUPP. + +- Online defragmentation of encrypted files is not supported. The + EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with + EOPNOTSUPP. + +- The ext4 filesystem does not support data journaling with encrypted + regular files. It will fall back to ordered data mode instead. + +- DAX (Direct Access) is not supported on encrypted files. + +- The st_size of an encrypted symlink will not necessarily give the + length of the symlink target as required by POSIX. It will actually + give the length of the ciphertext, which will be slightly longer + than the plaintext due to NUL-padding and an extra 2-byte overhead. + +- The maximum length of an encrypted symlink is 2 bytes shorter than + the maximum length of an unencrypted symlink. For example, on an + EXT4 filesystem with a 4K block size, unencrypted symlinks can be up + to 4095 bytes long, while encrypted symlinks can only be up to 4093 + bytes long (both lengths excluding the terminating null). + +Note that mmap *is* supported. This is possible because the pagecache +for an encrypted file contains the plaintext, not the ciphertext. + +Without the key +--------------- + +Some filesystem operations may be performed on encrypted regular +files, directories, and symlinks even before their encryption key has +been provided: + +- File metadata may be read, e.g. using stat(). + +- Directories may be listed, in which case the filenames will be + listed in an encoded form derived from their ciphertext. The + current encoding algorithm is described in `Filename hashing and + encoding`_. The algorithm is subject to change, but it is + guaranteed that the presented filenames will be no longer than + NAME_MAX bytes, will not contain the ``/`` or ``\0`` characters, and + will uniquely identify directory entries. + + The ``.`` and ``..`` directory entries are special. They are always + present and are not encrypted or encoded. + +- Files may be deleted. That is, nondirectory files may be deleted + with unlink() as usual, and empty directories may be deleted with + rmdir() as usual. Therefore, ``rm`` and ``rm -r`` will work as + expected. + +- Symlink targets may be read and followed, but they will be presented + in encrypted form, similar to filenames in directories. Hence, they + are unlikely to point to anywhere useful. + +Without the key, regular files cannot be opened or truncated. +Attempts to do so will fail with ENOKEY. This implies that any +regular file operations that require a file descriptor, such as +read(), write(), mmap(), fallocate(), and ioctl(), are also forbidden. + +Also without the key, files of any type (including directories) cannot +be created or linked into an encrypted directory, nor can a name in an +encrypted directory be the source or target of a rename, nor can an +O_TMPFILE temporary file be created in an encrypted directory. All +such operations will fail with ENOKEY. + +It is not currently possible to backup and restore encrypted files +without the encryption key. This would require special APIs which +have not yet been implemented. + +Encryption policy enforcement +============================= + +After an encryption policy has been set on a directory, all regular +files, directories, and symbolic links created in that directory +(recursively) will inherit that encryption policy. Special files --- +that is, named pipes, device nodes, and UNIX domain sockets --- will +not be encrypted. + +Except for those special files, it is forbidden to have unencrypted +files, or files encrypted with a different encryption policy, in an +encrypted directory tree. Attempts to link or rename such a file into +an encrypted directory will fail with EPERM. This is also enforced +during ->lookup() to provide limited protection against offline +attacks that try to disable or downgrade encryption in known locations +where applications may later write sensitive data. It is recommended +that systems implementing a form of "verified boot" take advantage of +this by validating all top-level encryption policies prior to access. + +Implementation details +====================== + +Encryption context +------------------ + +An encryption policy is represented on-disk by a :c:type:`struct +fscrypt_context`. It is up to individual filesystems to decide where +to store it, but normally it would be stored in a hidden extended +attribute. It should *not* be exposed by the xattr-related system +calls such as getxattr() and setxattr() because of the special +semantics of the encryption xattr. (In particular, there would be +much confusion if an encryption policy were to be added to or removed +from anything other than an empty directory.) The struct is defined +as follows:: + + #define FS_KEY_DESCRIPTOR_SIZE 8 + #define FS_KEY_DERIVATION_NONCE_SIZE 16 + + struct fscrypt_context { + u8 format; + u8 contents_encryption_mode; + u8 filenames_encryption_mode; + u8 flags; + u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE]; + u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE]; + }; + +Note that :c:type:`struct fscrypt_context` contains the same +information as :c:type:`struct fscrypt_policy` (see `Setting an +encryption policy`_), except that :c:type:`struct fscrypt_context` +also contains a nonce. The nonce is randomly generated by the kernel +and is used to derive the inode's encryption key as described in +`Per-file keys`_. + +Data path changes +----------------- + +For the read path (->readpage()) of regular files, filesystems can +read the ciphertext into the page cache and decrypt it in-place. The +page lock must be held until decryption has finished, to prevent the +page from becoming visible to userspace prematurely. + +For the write path (->writepage()) of regular files, filesystems +cannot encrypt data in-place in the page cache, since the cached +plaintext must be preserved. Instead, filesystems must encrypt into a +temporary buffer or "bounce page", then write out the temporary +buffer. Some filesystems, such as UBIFS, already use temporary +buffers regardless of encryption. Other filesystems, such as ext4 and +F2FS, have to allocate bounce pages specially for encryption. + +Filename hashing and encoding +----------------------------- + +Modern filesystems accelerate directory lookups by using indexed +directories. An indexed directory is organized as a tree keyed by +filename hashes. When a ->lookup() is requested, the filesystem +normally hashes the filename being looked up so that it can quickly +find the corresponding directory entry, if any. + +With encryption, lookups must be supported and efficient both with and +without the encryption key. Clearly, it would not work to hash the +plaintext filenames, since the plaintext filenames are unavailable +without the key. (Hashing the plaintext filenames would also make it +impossible for the filesystem's fsck tool to optimize encrypted +directories.) Instead, filesystems hash the ciphertext filenames, +i.e. the bytes actually stored on-disk in the directory entries. When +asked to do a ->lookup() with the key, the filesystem just encrypts +the user-supplied name to get the ciphertext. + +Lookups without the key are more complicated. The raw ciphertext may +contain the ``\0`` and ``/`` characters, which are illegal in +filenames. Therefore, readdir() must base64-encode the ciphertext for +presentation. For most filenames, this works fine; on ->lookup(), the +filesystem just base64-decodes the user-supplied name to get back to +the raw ciphertext. + +However, for very long filenames, base64 encoding would cause the +filename length to exceed NAME_MAX. To prevent this, readdir() +actually presents long filenames in an abbreviated form which encodes +a strong "hash" of the ciphertext filename, along with the optional +filesystem-specific hash(es) needed for directory lookups. This +allows the filesystem to still, with a high degree of confidence, map +the filename given in ->lookup() back to a particular directory entry +that was previously listed by readdir(). See :c:type:`struct +fscrypt_digested_name` in the source for more details. + +Note that the precise way that filenames are presented to userspace +without the key is subject to change in the future. It is only meant +as a way to temporarily present valid filenames so that commands like +``rm -r`` work as expected on encrypted directories. diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 0ea9ef13f758..40686dc931b9 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -2530,6 +2530,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted. allow data leaks with this option, which is equivalent to spectre_v2=off. + nospec_store_bypass_disable + [HW] Disable all mitigations for the Speculative Store Bypass vulnerability + noxsave [BUGS=X86] Disables x86 extended register state save and restore using xsave. The kernel will fallback to enabling legacy floating-point and sse state. @@ -3702,6 +3705,48 @@ bytes respectively. Such letter suffixes can also be entirely omitted. Not specifying this option is equivalent to spectre_v2=auto. + spec_store_bypass_disable= + [HW] Control Speculative Store Bypass (SSB) Disable mitigation + (Speculative Store Bypass vulnerability) + + Certain CPUs are vulnerable to an exploit against a + a common industry wide performance optimization known + as "Speculative Store Bypass" in which recent stores + to the same memory location may not be observed by + later loads during speculative execution. The idea + is that such stores are unlikely and that they can + be detected prior to instruction retirement at the + end of a particular speculation execution window. + + In vulnerable processors, the speculatively forwarded + store can be used in a cache side channel attack, for + example to read memory to which the attacker does not + directly have access (e.g. inside sandboxed code). + + This parameter controls whether the Speculative Store + Bypass optimization is used. + + on - Unconditionally disable Speculative Store Bypass + off - Unconditionally enable Speculative Store Bypass + auto - Kernel detects whether the CPU model contains an + implementation of Speculative Store Bypass and + picks the most appropriate mitigation. If the + CPU is not vulnerable, "off" is selected. If the + CPU is vulnerable the default mitigation is + architecture and Kconfig dependent. See below. + prctl - Control Speculative Store Bypass per thread + via prctl. Speculative Store Bypass is enabled + for a process by default. The state of the control + is inherited on fork. + seccomp - Same as "prctl" above, but all seccomp threads + will disable SSB unless they explicitly opt out. + + Not specifying this option is equivalent to + spec_store_bypass_disable=auto. + + Default mitigations: + X86: If CONFIG_SECCOMP=y "seccomp", otherwise "prctl" + spia_io_base= [HW,MTD] spia_fio_base= spia_pedr= diff --git a/Documentation/spec_ctrl.txt b/Documentation/spec_ctrl.txt new file mode 100644 index 000000000000..32f3d55c54b7 --- /dev/null +++ b/Documentation/spec_ctrl.txt @@ -0,0 +1,94 @@ +=================== +Speculation Control +=================== + +Quite some CPUs have speculation-related misfeatures which are in +fact vulnerabilities causing data leaks in various forms even across +privilege domains. + +The kernel provides mitigation for such vulnerabilities in various +forms. Some of these mitigations are compile-time configurable and some +can be supplied on the kernel command line. + +There is also a class of mitigations which are very expensive, but they can +be restricted to a certain set of processes or tasks in controlled +environments. The mechanism to control these mitigations is via +:manpage:`prctl(2)`. + +There are two prctl options which are related to this: + + * PR_GET_SPECULATION_CTRL + + * PR_SET_SPECULATION_CTRL + +PR_GET_SPECULATION_CTRL +----------------------- + +PR_GET_SPECULATION_CTRL returns the state of the speculation misfeature +which is selected with arg2 of prctl(2). The return value uses bits 0-3 with +the following meaning: + +==== ===================== =================================================== +Bit Define Description +==== ===================== =================================================== +0 PR_SPEC_PRCTL Mitigation can be controlled per task by + PR_SET_SPECULATION_CTRL. +1 PR_SPEC_ENABLE The speculation feature is enabled, mitigation is + disabled. +2 PR_SPEC_DISABLE The speculation feature is disabled, mitigation is + enabled. +3 PR_SPEC_FORCE_DISABLE Same as PR_SPEC_DISABLE, but cannot be undone. A + subsequent prctl(..., PR_SPEC_ENABLE) will fail. +==== ===================== =================================================== + +If all bits are 0 the CPU is not affected by the speculation misfeature. + +If PR_SPEC_PRCTL is set, then the per-task control of the mitigation is +available. If not set, prctl(PR_SET_SPECULATION_CTRL) for the speculation +misfeature will fail. + +PR_SET_SPECULATION_CTRL +----------------------- + +PR_SET_SPECULATION_CTRL allows to control the speculation misfeature, which +is selected by arg2 of :manpage:`prctl(2)` per task. arg3 is used to hand +in the control value, i.e. either PR_SPEC_ENABLE or PR_SPEC_DISABLE or +PR_SPEC_FORCE_DISABLE. + +Common error codes +------------------ +======= ================================================================= +Value Meaning +======= ================================================================= +EINVAL The prctl is not implemented by the architecture or unused + prctl(2) arguments are not 0. + +ENODEV arg2 is selecting a not supported speculation misfeature. +======= ================================================================= + +PR_SET_SPECULATION_CTRL error codes +----------------------------------- +======= ================================================================= +Value Meaning +======= ================================================================= +0 Success + +ERANGE arg3 is incorrect, i.e. it's neither PR_SPEC_ENABLE nor + PR_SPEC_DISABLE nor PR_SPEC_FORCE_DISABLE. + +ENXIO Control of the selected speculation misfeature is not possible. + See PR_GET_SPECULATION_CTRL. + +EPERM Speculation was disabled with PR_SPEC_FORCE_DISABLE and caller + tried to enable it again. +======= ================================================================= + +Speculation misfeature controls +------------------------------- +- PR_SPEC_STORE_BYPASS: Speculative Store Bypass + + Invocations: + * prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, 0, 0, 0); + * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_ENABLE, 0, 0); + * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0); + * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_FORCE_DISABLE, 0, 0); @@ -1,6 +1,6 @@ VERSION = 4 PATCHLEVEL = 4 -SUBLEVEL = 141 +SUBLEVEL = 146 EXTRAVERSION = NAME = Blurry Fish Butt @@ -631,6 +631,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning,frame-address,) KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation) KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow) KBUILD_CFLAGS += $(call cc-disable-warning, int-in-bool-context) +KBUILD_CFLAGS += $(call cc-disable-warning, attribute-alias) ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE KBUILD_CFLAGS += $(call cc-option,-Oz,-Os) diff --git a/arch/arc/include/asm/page.h b/arch/arc/include/asm/page.h index 429957f1c236..8f1145ed0046 100644 --- a/arch/arc/include/asm/page.h +++ b/arch/arc/include/asm/page.h @@ -102,7 +102,7 @@ typedef pte_t * pgtable_t; #define virt_addr_valid(kaddr) pfn_valid(__pa(kaddr) >> PAGE_SHIFT) /* Default Permissions for stack/heaps pages (Non Executable) */ -#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE) +#define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC) #define WANT_PAGE_VIRTUAL 1 diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h index e5fec320f158..c07d7b0a4058 100644 --- a/arch/arc/include/asm/pgtable.h +++ b/arch/arc/include/asm/pgtable.h @@ -372,7 +372,7 @@ void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, /* Decode a PTE containing swap "identifier "into constituents */ #define __swp_type(pte_lookalike) (((pte_lookalike).val) & 0x1f) -#define __swp_offset(pte_lookalike) ((pte_lookalike).val << 13) +#define __swp_offset(pte_lookalike) ((pte_lookalike).val >> 13) /* NOPs, to keep generic kernel happy */ #define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) }) diff --git a/arch/arm/boot/dts/qcom/sdm636.dtsi b/arch/arm/boot/dts/qcom/sdm636.dtsi index cf4f5572160d..49b01c3ad073 100644 --- a/arch/arm/boot/dts/qcom/sdm636.dtsi +++ b/arch/arm/boot/dts/qcom/sdm636.dtsi @@ -31,6 +31,7 @@ &soc { /delete-node/ qcom,turing@1a300000; + /delete-node/ qcom,msm-cdsp-loader; /delete-node/ cti@7068000; /delete-node/ turing_etm0; funnel@6042000 { diff --git a/arch/arm/include/asm/uaccess.h b/arch/arm/include/asm/uaccess.h index 7fb59199c6bb..7665bd2f4871 100644 --- a/arch/arm/include/asm/uaccess.h +++ b/arch/arm/include/asm/uaccess.h @@ -251,7 +251,7 @@ extern int __put_user_8(void *, unsigned long long); ({ \ unsigned long __limit = current_thread_info()->addr_limit - 1; \ const typeof(*(p)) __user *__tmp_p = (p); \ - register const typeof(*(p)) __r2 asm("r2") = (x); \ + register typeof(*(p)) __r2 asm("r2") = (x); \ register const typeof(*(p)) __user *__p asm("r0") = __tmp_p; \ register unsigned long __l asm("r1") = __limit; \ register int __e asm("r0"); \ diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index d49e164867e1..213c78f84e56 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -36,7 +36,6 @@ generic-y += poll.h generic-y += preempt.h generic-y += resource.h generic-y += rwsem.h -generic-y += sections.h generic-y += segment.h generic-y += sembuf.h generic-y += serial.h diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index b190367fe6be..13d6b496de92 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -232,6 +232,16 @@ static inline pte_t pte_mknoncont(pte_t pte) return clear_pte_bit(pte, __pgprot(PTE_CONT)); } +static inline pte_t pte_clear_rdonly(pte_t pte) +{ + return clear_pte_bit(pte, __pgprot(PTE_RDONLY)); +} + +static inline pte_t pte_mkpresent(pte_t pte) +{ + return set_pte_bit(pte, __pgprot(PTE_VALID)); +} + static inline pmd_t pmd_mkcont(pmd_t pmd) { return __pmd(pmd_val(pmd) | PMD_SECT_CONT); diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h new file mode 100644 index 000000000000..b865e83e57f5 --- /dev/null +++ b/arch/arm64/include/asm/sections.h @@ -0,0 +1,28 @@ +/* + * Copyright (C) 2016 ARM Limited + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ +#ifndef __ASM_SECTIONS_H +#define __ASM_SECTIONS_H + +#include <asm-generic/sections.h> + +extern char __alt_instructions[], __alt_instructions_end[]; +extern char __exception_text_start[], __exception_text_end[]; +extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[]; +extern char __idmap_text_start[], __idmap_text_end[]; +extern char __irqentry_text_start[], __irqentry_text_end[]; +extern char __mmuoff_data_start[], __mmuoff_data_end[]; + +#endif /* __ASM_SECTIONS_H */ diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h index 1d60ab1b3b11..7fe6a2e1c93f 100644 --- a/arch/arm64/include/asm/traps.h +++ b/arch/arm64/include/asm/traps.h @@ -19,6 +19,7 @@ #define __ASM_TRAP_H #include <linux/list.h> +#include <asm/sections.h> struct pt_regs; @@ -36,17 +37,12 @@ void unregister_undef_hook(struct undef_hook *hook); static inline int __in_irqentry_text(unsigned long ptr) { - extern char __irqentry_text_start[]; - extern char __irqentry_text_end[]; - return ptr >= (unsigned long)&__irqentry_text_start && ptr < (unsigned long)&__irqentry_text_end; } static inline int in_exception_text(unsigned long ptr) { - extern char __exception_text_start[]; - extern char __exception_text_end[]; int in; in = ptr >= (unsigned long)&__exception_text_start && diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h index 06e6a5238c4c..46e0bbddee94 100644 --- a/arch/arm64/include/asm/virt.h +++ b/arch/arm64/include/asm/virt.h @@ -40,6 +40,7 @@ #ifndef __ASSEMBLY__ #include <asm/ptrace.h> +#include <asm/sections.h> /* * __boot_cpu_mode records what mode CPUs were booted in. diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c index e6cb1dc63a2a..737481c8e918 100644 --- a/arch/arm64/kernel/alternative.c +++ b/arch/arm64/kernel/alternative.c @@ -25,14 +25,13 @@ #include <asm/alternative.h> #include <asm/cpufeature.h> #include <asm/insn.h> +#include <asm/sections.h> #include <linux/stop_machine.h> #define __ALT_PTR(a,f) (u32 *)((void *)&(a)->f + (a)->f) #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) -extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; - struct alt_region { struct alt_instr *begin; struct alt_instr *end; @@ -124,8 +123,8 @@ static int __apply_alternatives_multi_stop(void *unused) { static int patched = 0; struct alt_region region = { - .begin = __alt_instructions, - .end = __alt_instructions_end, + .begin = (struct alt_instr *)__alt_instructions, + .end = (struct alt_instr *)__alt_instructions_end, }; /* We always have a CPU 0 at this point (__init) */ diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index b8121a51404f..d5d4afbf0518 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -472,7 +472,7 @@ ENDPROC(__primary_switched) * end early head section, begin head code that is also used for * hotplug and needs to have the same protections as the text region */ - .section ".text","ax" + .section ".idmap.text","ax" ENTRY(kimage_vaddr) .quad _text - TEXT_OFFSET @@ -595,17 +595,29 @@ set_cpu_boot_mode_flag: ENDPROC(set_cpu_boot_mode_flag) /* + * These values are written with the MMU off, but read with the MMU on. + * Writers will invalidate the corresponding address, discarding up to a + * 'Cache Writeback Granule' (CWG) worth of data. The linker script ensures + * sufficient alignment that the CWG doesn't overlap another section. + */ + .pushsection ".mmuoff.data.write", "aw" +/* * We need to find out the CPU boot mode long after boot, so we need to * store it in a writable variable. * * This is not in .bss, because we set it sufficiently early that the boot-time * zeroing of .bss would clobber it. */ - .pushsection .data..cacheline_aligned - .align L1_CACHE_SHIFT ENTRY(__boot_cpu_mode) .long BOOT_CPU_MODE_EL2 .long BOOT_CPU_MODE_EL1 +/* + * The booting CPU updates the failed status @__early_cpu_boot_status, + * with MMU turned off. + */ +ENTRY(__early_cpu_boot_status) + .long 0 + .popsection /* @@ -679,7 +691,6 @@ ENDPROC(__secondary_switched) * Checks if the selected granule size is supported by the CPU. * If it isn't, park the CPU */ - .section ".idmap.text", "ax" ENTRY(__enable_mmu) mrs x22, sctlr_el1 // preserve old SCTLR_EL1 value mrs x1, ID_AA64MMFR0_EL1 diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c index 6dd18140ebb8..8761eb95ed27 100644 --- a/arch/arm64/kernel/hibernate.c +++ b/arch/arm64/kernel/hibernate.c @@ -53,12 +53,6 @@ extern int in_suspend; /* Do we need to reset el2? */ #define el2_reset_needed() (is_hyp_mode_available() && !is_kernel_in_hyp_mode()) -/* - * Start/end of the hibernate exit code, this must be copied to a 'safe' - * location in memory, and executed from there. - */ -extern char __hibernate_exit_text_start[], __hibernate_exit_text_end[]; - /* temporary el2 vectors in the __hibernate_exit_text section. */ extern char hibernate_el2_vectors[]; @@ -240,6 +234,7 @@ out: return rc; } +#define dcache_clean_range(start, end) __flush_dcache_area(start, (end - start)) int swsusp_arch_suspend(void) { @@ -252,8 +247,9 @@ int swsusp_arch_suspend(void) if (__cpu_suspend_enter(&state)) { ret = swsusp_save(); } else { - /* Clean kernel to PoC for secondary core startup */ - __flush_dcache_area(LMADDR(KERNEL_START), KERNEL_END - KERNEL_START); + /* Clean kernel core startup/idle code to PoC*/ + dcache_clean_range(__mmuoff_data_start, __mmuoff_data_end); + dcache_clean_range(__idmap_text_start, __idmap_text_end); /* * Tell the hibernation core that we've just restored @@ -269,6 +265,33 @@ int swsusp_arch_suspend(void) return ret; } +static void _copy_pte(pte_t *dst_pte, pte_t *src_pte, unsigned long addr) +{ + pte_t pte = *src_pte; + + if (pte_valid(pte)) { + /* + * Resume will overwrite areas that may be marked + * read only (code, rodata). Clear the RDONLY bit from + * the temporary mappings we use during restore. + */ + set_pte(dst_pte, pte_clear_rdonly(pte)); + } else if (debug_pagealloc_enabled() && !pte_none(pte)) { + /* + * debug_pagealloc will removed the PTE_VALID bit if + * the page isn't in use by the resume kernel. It may have + * been in use by the original kernel, in which case we need + * to put it back in our copy to do the restore. + * + * Before marking this entry valid, check the pfn should + * be mapped. + */ + BUG_ON(!pfn_valid(pte_pfn(pte))); + + set_pte(dst_pte, pte_mkpresent(pte_clear_rdonly(pte))); + } +} + static int copy_pte(pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long start, unsigned long end) { @@ -284,13 +307,7 @@ static int copy_pte(pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long start, src_pte = pte_offset_kernel(src_pmd, start); do { - if (!pte_none(*src_pte)) - /* - * Resume will overwrite areas that may be marked - * read only (code, rodata). Clear the RDONLY bit from - * the temporary mappings we use during restore. - */ - set_pte(dst_pte, __pte(pte_val(*src_pte) & ~PTE_RDONLY)); + _copy_pte(dst_pte, src_pte, addr); } while (dst_pte++, src_pte++, addr += PAGE_SIZE, addr != end); return 0; diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c index 1ee93c7c5a75..4ea8433011d0 100644 --- a/arch/arm64/kernel/probes/kprobes.c +++ b/arch/arm64/kernel/probes/kprobes.c @@ -31,7 +31,7 @@ #include <asm/insn.h> #include <asm/uaccess.h> #include <asm/irq.h> -#include <asm-generic/sections.h> +#include <asm/sections.h> #include "decode-insn.h" diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S index a1ccef018538..100f92f13113 100644 --- a/arch/arm64/kernel/sleep.S +++ b/arch/arm64/kernel/sleep.S @@ -97,16 +97,24 @@ ENTRY(__cpu_suspend_enter) ENDPROC(__cpu_suspend_enter) .ltorg + .pushsection ".idmap.text", "ax" ENTRY(cpu_resume) bl el2_setup // if in EL2 drop to EL1 cleanly /* enable the MMU early - so we can access sleep_save_stash by va */ adr_l lr, __enable_mmu /* __cpu_setup will return here */ - ldr x27, =_cpu_resume /* __enable_mmu will branch here */ + adr_l x27, _resume_switched /* __enable_mmu will branch here */ adrp x25, idmap_pg_dir adrp x26, swapper_pg_dir b __cpu_setup ENDPROC(cpu_resume) +_resume_switched: + ldr x8, =_cpu_resume + br x8 +ENDPROC(_resume_switched) + .ltorg + .popsection + ENTRY(_cpu_resume) mrs x1, mpidr_el1 adrp x8, mpidr_hash diff --git a/arch/arm64/kernel/smp_spin_table.c b/arch/arm64/kernel/smp_spin_table.c index 2ccb883353d9..303d571702ea 100644 --- a/arch/arm64/kernel/smp_spin_table.c +++ b/arch/arm64/kernel/smp_spin_table.c @@ -30,7 +30,8 @@ #include <asm/smp_plat.h> extern void secondary_holding_pen(void); -volatile unsigned long secondary_holding_pen_release = INVALID_HWID; +volatile unsigned long __section(".mmuoff.data.read") +secondary_holding_pen_release = INVALID_HWID; static phys_addr_t cpu_release_addr[NR_CPUS]; diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S index 42a6c4f35af2..71c8076bbc60 100644 --- a/arch/arm64/kernel/vmlinux.lds.S +++ b/arch/arm64/kernel/vmlinux.lds.S @@ -197,6 +197,25 @@ SECTIONS _data = .; _sdata = .; RW_DATA_SECTION(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE) + + /* + * Data written with the MMU off but read with the MMU on requires + * cache lines to be invalidated, discarding up to a Cache Writeback + * Granule (CWG) of data from the cache. Keep the section that + * requires this type of maintenance to be in its own Cache Writeback + * Granule (CWG) area so the cache maintenance operations don't + * interfere with adjacent data. + */ + .mmuoff.data.write : ALIGN(SZ_2K) { + __mmuoff_data_start = .; + *(.mmuoff.data.write) + } + . = ALIGN(SZ_2K); + .mmuoff.data.read : { + *(.mmuoff.data.read) + __mmuoff_data_end = .; + } + PECOFF_EDATA_PADDING _edata = .; diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c index 4754762bde49..6ea71387ee12 100644 --- a/arch/arm64/mm/pageattr.c +++ b/arch/arm64/mm/pageattr.c @@ -146,4 +146,43 @@ void __kernel_map_pages(struct page *page, int numpages, int enable) __pgprot(0), __pgprot(PTE_VALID)); } -#endif +#ifdef CONFIG_HIBERNATION +/* + * When built with CONFIG_DEBUG_PAGEALLOC and CONFIG_HIBERNATION, this function + * is used to determine if a linear map page has been marked as not-valid by + * CONFIG_DEBUG_PAGEALLOC. Walk the page table and check the PTE_VALID bit. + * This is based on kern_addr_valid(), which almost does what we need. + * + * Because this is only called on the kernel linear map, p?d_sect() implies + * p?d_present(). When debug_pagealloc is enabled, sections mappings are + * disabled. + */ +bool kernel_page_present(struct page *page) +{ + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + unsigned long addr = (unsigned long)page_address(page); + + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) + return false; + + pud = pud_offset(pgd, addr); + if (pud_none(*pud)) + return false; + if (pud_sect(*pud)) + return true; + + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + return false; + if (pmd_sect(*pmd)) + return true; + + pte = pte_offset_kernel(pmd, addr); + return pte_valid(*pte); +} +#endif /* CONFIG_HIBERNATION */ +#endif /* CONFIG_DEBUG_PAGEALLOC */ diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S index 7a7cf5c2215d..b78688806652 100644 --- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -132,6 +132,7 @@ ENDPROC(cpu_do_suspend) * * x0: Address of context pointer */ + .pushsection ".idmap.text", "ax" ENTRY(cpu_do_resume) ldp x2, x3, [x0] ldp x4, x5, [x0, #16] @@ -163,6 +164,7 @@ ENTRY(cpu_do_resume) isb ret ENDPROC(cpu_do_resume) + .popsection #endif /* @@ -220,6 +222,7 @@ ENDPROC(idmap_cpu_replace_ttbr1) * Initialise the processor for turning the MMU on. Return in x0 the * value of the SCTLR_EL1 register. */ + .pushsection ".idmap.text", "ax" ENTRY(__cpu_setup) tlbi vmalle1 // Invalidate local TLB dsb nsh @@ -321,3 +324,4 @@ crval: #endif .word 0xfcffffff // clear .word 0x34d5d91d | CR_IBIT | CR_CBIT // set + .popsection diff --git a/arch/microblaze/boot/Makefile b/arch/microblaze/boot/Makefile index 91d2068da1b9..0f3fe6a151dc 100644 --- a/arch/microblaze/boot/Makefile +++ b/arch/microblaze/boot/Makefile @@ -21,17 +21,19 @@ $(obj)/linux.bin.gz: $(obj)/linux.bin FORCE quiet_cmd_cp = CP $< $@$2 cmd_cp = cat $< >$@$2 || (rm -f $@ && echo false) -quiet_cmd_strip = STRIP $@ +quiet_cmd_strip = STRIP $< $@$2 cmd_strip = $(STRIP) -K microblaze_start -K _end -K __log_buf \ - -K _fdt_start vmlinux -o $@ + -K _fdt_start $< -o $@$2 UIMAGE_LOADADDR = $(CONFIG_KERNEL_BASE_ADDR) +UIMAGE_IN = $@ +UIMAGE_OUT = $@.ub $(obj)/simpleImage.%: vmlinux FORCE $(call if_changed,cp,.unstrip) $(call if_changed,objcopy) $(call if_changed,uimage) - $(call if_changed,strip) - @echo 'Kernel: $@ is ready' ' (#'`cat .version`')' + $(call if_changed,strip,.strip) + @echo 'Kernel: $(UIMAGE_OUT) is ready' ' (#'`cat .version`')' clean-files += simpleImage.*.unstrip linux.bin.ub dts/*.dtb diff --git a/arch/mips/ath79/common.c b/arch/mips/ath79/common.c index 8ae4067a5eda..40ecb6e700cd 100644 --- a/arch/mips/ath79/common.c +++ b/arch/mips/ath79/common.c @@ -58,7 +58,7 @@ EXPORT_SYMBOL_GPL(ath79_ddr_ctrl_init); void ath79_ddr_wb_flush(u32 reg) { - void __iomem *flush_reg = ath79_ddr_wb_flush_base + reg; + void __iomem *flush_reg = ath79_ddr_wb_flush_base + (reg * 4); /* Flush the DDR write buffer. */ __raw_writel(0x1, flush_reg); diff --git a/arch/mips/include/asm/pci.h b/arch/mips/include/asm/pci.h index 98c31e5d9579..a7bc901819c8 100644 --- a/arch/mips/include/asm/pci.h +++ b/arch/mips/include/asm/pci.h @@ -89,7 +89,7 @@ static inline void pci_resource_to_user(const struct pci_dev *dev, int bar, phys_addr_t size = resource_size(rsrc); *start = fixup_bigphys_addr(rsrc->start, size); - *end = rsrc->start + size; + *end = rsrc->start + size - 1; } /* diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c index 054a22c0873b..9684a0d22d97 100644 --- a/arch/mips/kernel/process.c +++ b/arch/mips/kernel/process.c @@ -633,21 +633,48 @@ unsigned long arch_align_stack(unsigned long sp) return sp & ALMASK; } +static DEFINE_PER_CPU(struct call_single_data, backtrace_csd); +static struct cpumask backtrace_csd_busy; + static void arch_dump_stack(void *info) { struct pt_regs *regs; + static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED; + arch_spin_lock(&lock); regs = get_irq_regs(); if (regs) show_regs(regs); + else + dump_stack(); + arch_spin_unlock(&lock); - dump_stack(); + cpumask_clear_cpu(smp_processor_id(), &backtrace_csd_busy); } void arch_trigger_all_cpu_backtrace(bool include_self) { - smp_call_function(arch_dump_stack, NULL, 1); + struct call_single_data *csd; + int cpu; + + for_each_cpu(cpu, cpu_online_mask) { + /* + * If we previously sent an IPI to the target CPU & it hasn't + * cleared its bit in the busy cpumask then it didn't handle + * our previous IPI & it's not safe for us to reuse the + * call_single_data_t. + */ + if (cpumask_test_and_set_cpu(cpu, &backtrace_csd_busy)) { + pr_warn("Unable to send backtrace IPI to CPU%u - perhaps it hung?\n", + cpu); + continue; + } + + csd = &per_cpu(backtrace_csd, cpu); + csd->func = arch_dump_stack; + smp_call_function_single_async(cpu, csd); + } } int mips_get_process_fp_mode(struct task_struct *task) diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index e23f4775b10b..da6997486709 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -345,6 +345,7 @@ static void __show_regs(const struct pt_regs *regs) void show_regs(struct pt_regs *regs) { __show_regs((struct pt_regs *)regs); + dump_stack(); } void show_registers(struct pt_regs *regs) diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S index 78c1eba4c04a..01e274e6907b 100644 --- a/arch/powerpc/kernel/head_8xx.S +++ b/arch/powerpc/kernel/head_8xx.S @@ -720,7 +720,7 @@ start_here: tovirt(r6,r6) lis r5, abatron_pteptrs@h ori r5, r5, abatron_pteptrs@l - stw r5, 0xf0(r0) /* Must match your Abatron config file */ + stw r5, 0xf0(0) /* Must match your Abatron config file */ tophys(r5,r5) stw r6, 0(r5) diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c index 1f7930037cb7..d9e41b77dd13 100644 --- a/arch/powerpc/kernel/pci_32.c +++ b/arch/powerpc/kernel/pci_32.c @@ -11,6 +11,7 @@ #include <linux/sched.h> #include <linux/errno.h> #include <linux/bootmem.h> +#include <linux/syscalls.h> #include <linux/irq.h> #include <linux/list.h> #include <linux/of.h> diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c index 515730e499fe..309027208f7c 100644 --- a/arch/powerpc/mm/slb.c +++ b/arch/powerpc/mm/slb.c @@ -69,14 +69,14 @@ static inline void slb_shadow_update(unsigned long ea, int ssize, * updating it. No write barriers are needed here, provided * we only update the current CPU's SLB shadow buffer. */ - p->save_area[index].esid = 0; - p->save_area[index].vsid = cpu_to_be64(mk_vsid_data(ea, ssize, flags)); - p->save_area[index].esid = cpu_to_be64(mk_esid_data(ea, ssize, index)); + WRITE_ONCE(p->save_area[index].esid, 0); + WRITE_ONCE(p->save_area[index].vsid, cpu_to_be64(mk_vsid_data(ea, ssize, flags))); + WRITE_ONCE(p->save_area[index].esid, cpu_to_be64(mk_esid_data(ea, ssize, index))); } static inline void slb_shadow_clear(enum slb_index index) { - get_slb_shadow()->save_area[index].esid = 0; + WRITE_ONCE(get_slb_shadow()->save_area[index].esid, 0); } static inline void create_shadowed_slbe(unsigned long ea, int ssize, diff --git a/arch/powerpc/platforms/chrp/time.c b/arch/powerpc/platforms/chrp/time.c index f803f4b8ab6f..8608e358217f 100644 --- a/arch/powerpc/platforms/chrp/time.c +++ b/arch/powerpc/platforms/chrp/time.c @@ -27,6 +27,8 @@ #include <asm/sections.h> #include <asm/time.h> +#include <platforms/chrp/chrp.h> + extern spinlock_t rtc_lock; #define NVRAM_AS0 0x74 @@ -62,7 +64,7 @@ long __init chrp_time_init(void) return 0; } -int chrp_cmos_clock_read(int addr) +static int chrp_cmos_clock_read(int addr) { if (nvram_as1 != 0) outb(addr>>8, nvram_as1); @@ -70,7 +72,7 @@ int chrp_cmos_clock_read(int addr) return (inb(nvram_data)); } -void chrp_cmos_clock_write(unsigned long val, int addr) +static void chrp_cmos_clock_write(unsigned long val, int addr) { if (nvram_as1 != 0) outb(addr>>8, nvram_as1); diff --git a/arch/powerpc/platforms/embedded6xx/hlwd-pic.c b/arch/powerpc/platforms/embedded6xx/hlwd-pic.c index 9b7975706bfc..9485f1024d46 100644 --- a/arch/powerpc/platforms/embedded6xx/hlwd-pic.c +++ b/arch/powerpc/platforms/embedded6xx/hlwd-pic.c @@ -35,6 +35,8 @@ */ #define HW_BROADWAY_ICR 0x00 #define HW_BROADWAY_IMR 0x04 +#define HW_STARLET_ICR 0x08 +#define HW_STARLET_IMR 0x0c /* @@ -74,6 +76,9 @@ static void hlwd_pic_unmask(struct irq_data *d) void __iomem *io_base = irq_data_get_irq_chip_data(d); setbits32(io_base + HW_BROADWAY_IMR, 1 << irq); + + /* Make sure the ARM (aka. Starlet) doesn't handle this interrupt. */ + clrbits32(io_base + HW_STARLET_IMR, 1 << irq); } diff --git a/arch/powerpc/platforms/powermac/bootx_init.c b/arch/powerpc/platforms/powermac/bootx_init.c index 76f5013c35e5..89237b84b096 100644 --- a/arch/powerpc/platforms/powermac/bootx_init.c +++ b/arch/powerpc/platforms/powermac/bootx_init.c @@ -467,7 +467,7 @@ void __init bootx_init(unsigned long r3, unsigned long r4) boot_infos_t *bi = (boot_infos_t *) r4; unsigned long hdr; unsigned long space; - unsigned long ptr, x; + unsigned long ptr; char *model; unsigned long offset = reloc_offset(); @@ -561,6 +561,8 @@ void __init bootx_init(unsigned long r3, unsigned long r4) * MMU switched OFF, so this should not be useful anymore. */ if (bi->version < 4) { + unsigned long x __maybe_unused; + bootx_printf("Touching pages...\n"); /* diff --git a/arch/powerpc/platforms/powermac/setup.c b/arch/powerpc/platforms/powermac/setup.c index 8dd78f4e1af4..32fc56cf6261 100644 --- a/arch/powerpc/platforms/powermac/setup.c +++ b/arch/powerpc/platforms/powermac/setup.c @@ -359,6 +359,7 @@ static int pmac_late_init(void) } machine_late_initcall(powermac, pmac_late_init); +void note_bootable_part(dev_t dev, int part, int goodness); /* * This is __init_refok because we check for "initializing" before * touching any of the __init sensitive things and "initializing" diff --git a/arch/s390/include/asm/cpu_mf.h b/arch/s390/include/asm/cpu_mf.h index 9dd04b9e9782..b2f8c52b3840 100644 --- a/arch/s390/include/asm/cpu_mf.h +++ b/arch/s390/include/asm/cpu_mf.h @@ -113,7 +113,7 @@ struct hws_basic_entry { struct hws_diag_entry { unsigned int def:16; /* 0-15 Data Entry Format */ - unsigned int R:14; /* 16-19 and 20-30 reserved */ + unsigned int R:15; /* 16-19 and 20-30 reserved */ unsigned int I:1; /* 31 entry valid or invalid */ u8 data[]; /* Machine-dependent sample data */ } __packed; @@ -129,7 +129,9 @@ struct hws_trailer_entry { unsigned int f:1; /* 0 - Block Full Indicator */ unsigned int a:1; /* 1 - Alert request control */ unsigned int t:1; /* 2 - Timestamp format */ - unsigned long long:61; /* 3 - 63: Reserved */ + unsigned int :29; /* 3 - 31: Reserved */ + unsigned int bsdes:16; /* 32-47: size of basic SDE */ + unsigned int dsdes:16; /* 48-63: size of diagnostic SDE */ }; unsigned long long flags; /* 0 - 63: All indicators */ }; diff --git a/arch/x86/configs/x86_64_cuttlefish_defconfig b/arch/x86/configs/x86_64_cuttlefish_defconfig index 71026930c04c..df9b6bd228f7 100644 --- a/arch/x86/configs/x86_64_cuttlefish_defconfig +++ b/arch/x86/configs/x86_64_cuttlefish_defconfig @@ -214,13 +214,17 @@ CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_SPI_ATTRS=y CONFIG_SCSI_VIRTIO=y CONFIG_MD=y +CONFIG_BLK_DEV_MD=y +CONFIG_MD_LINEAR=y CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=y CONFIG_DM_MIRROR=y CONFIG_DM_ZERO=y CONFIG_DM_UEVENT=y CONFIG_DM_VERITY=y +CONFIG_DM_VERITY_HASH_PREFETCH_MIN_SIZE=1 CONFIG_DM_VERITY_FEC=y +CONFIG_DM_ANDROID_VERITY=y CONFIG_NETDEVICES=y CONFIG_NETCONSOLE=y CONFIG_NETCONSOLE_DYNAMIC=y @@ -447,3 +451,8 @@ CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1 # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set CONFIG_CRYPTO_ECHAINIV=y CONFIG_CRYPTO_SHA512=y +CONFIG_ASYMMETRIC_KEY_TYPE=y +CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y +CONFIG_X509_CERTIFICATE_PARSER=y +CONFIG_SYSTEM_TRUSTED_KEYRING=y +CONFIG_SYSTEM_TRUSTED_KEYS="verity_dev_keys.x509" diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index d03bf0e28b8b..48c27c3fdfdb 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -79,24 +79,33 @@ ENTRY(entry_SYSENTER_compat) ASM_CLAC /* Clear AC after saving FLAGS */ pushq $__USER32_CS /* pt_regs->cs */ - xorq %r8,%r8 - pushq %r8 /* pt_regs->ip = 0 (placeholder) */ + pushq $0 /* pt_regs->ip = 0 (placeholder) */ pushq %rax /* pt_regs->orig_ax */ pushq %rdi /* pt_regs->di */ pushq %rsi /* pt_regs->si */ pushq %rdx /* pt_regs->dx */ pushq %rcx /* pt_regs->cx */ pushq $-ENOSYS /* pt_regs->ax */ - pushq %r8 /* pt_regs->r8 = 0 */ - pushq %r8 /* pt_regs->r9 = 0 */ - pushq %r8 /* pt_regs->r10 = 0 */ - pushq %r8 /* pt_regs->r11 = 0 */ + pushq $0 /* pt_regs->r8 = 0 */ + xorq %r8, %r8 /* nospec r8 */ + pushq $0 /* pt_regs->r9 = 0 */ + xorq %r9, %r9 /* nospec r9 */ + pushq $0 /* pt_regs->r10 = 0 */ + xorq %r10, %r10 /* nospec r10 */ + pushq $0 /* pt_regs->r11 = 0 */ + xorq %r11, %r11 /* nospec r11 */ pushq %rbx /* pt_regs->rbx */ + xorl %ebx, %ebx /* nospec rbx */ pushq %rbp /* pt_regs->rbp (will be overwritten) */ - pushq %r8 /* pt_regs->r12 = 0 */ - pushq %r8 /* pt_regs->r13 = 0 */ - pushq %r8 /* pt_regs->r14 = 0 */ - pushq %r8 /* pt_regs->r15 = 0 */ + xorl %ebp, %ebp /* nospec rbp */ + pushq $0 /* pt_regs->r12 = 0 */ + xorq %r12, %r12 /* nospec r12 */ + pushq $0 /* pt_regs->r13 = 0 */ + xorq %r13, %r13 /* nospec r13 */ + pushq $0 /* pt_regs->r14 = 0 */ + xorq %r14, %r14 /* nospec r14 */ + pushq $0 /* pt_regs->r15 = 0 */ + xorq %r15, %r15 /* nospec r15 */ cld /* @@ -185,17 +194,26 @@ ENTRY(entry_SYSCALL_compat) pushq %rdx /* pt_regs->dx */ pushq %rbp /* pt_regs->cx (stashed in bp) */ pushq $-ENOSYS /* pt_regs->ax */ - xorq %r8,%r8 - pushq %r8 /* pt_regs->r8 = 0 */ - pushq %r8 /* pt_regs->r9 = 0 */ - pushq %r8 /* pt_regs->r10 = 0 */ - pushq %r8 /* pt_regs->r11 = 0 */ + pushq $0 /* pt_regs->r8 = 0 */ + xorq %r8, %r8 /* nospec r8 */ + pushq $0 /* pt_regs->r9 = 0 */ + xorq %r9, %r9 /* nospec r9 */ + pushq $0 /* pt_regs->r10 = 0 */ + xorq %r10, %r10 /* nospec r10 */ + pushq $0 /* pt_regs->r11 = 0 */ + xorq %r11, %r11 /* nospec r11 */ pushq %rbx /* pt_regs->rbx */ + xorl %ebx, %ebx /* nospec rbx */ pushq %rbp /* pt_regs->rbp (will be overwritten) */ - pushq %r8 /* pt_regs->r12 = 0 */ - pushq %r8 /* pt_regs->r13 = 0 */ - pushq %r8 /* pt_regs->r14 = 0 */ - pushq %r8 /* pt_regs->r15 = 0 */ + xorl %ebp, %ebp /* nospec rbp */ + pushq $0 /* pt_regs->r12 = 0 */ + xorq %r12, %r12 /* nospec r12 */ + pushq $0 /* pt_regs->r13 = 0 */ + xorq %r13, %r13 /* nospec r13 */ + pushq $0 /* pt_regs->r14 = 0 */ + xorq %r14, %r14 /* nospec r14 */ + pushq $0 /* pt_regs->r15 = 0 */ + xorq %r15, %r15 /* nospec r15 */ /* * User mode is traced as though IRQs are on, and SYSENTER @@ -292,17 +310,26 @@ ENTRY(entry_INT80_compat) pushq %rdx /* pt_regs->dx */ pushq %rcx /* pt_regs->cx */ pushq $-ENOSYS /* pt_regs->ax */ - xorq %r8,%r8 - pushq %r8 /* pt_regs->r8 = 0 */ - pushq %r8 /* pt_regs->r9 = 0 */ - pushq %r8 /* pt_regs->r10 = 0 */ - pushq %r8 /* pt_regs->r11 = 0 */ + pushq $0 /* pt_regs->r8 = 0 */ + xorq %r8, %r8 /* nospec r8 */ + pushq $0 /* pt_regs->r9 = 0 */ + xorq %r9, %r9 /* nospec r9 */ + pushq $0 /* pt_regs->r10 = 0 */ + xorq %r10, %r10 /* nospec r10 */ + pushq $0 /* pt_regs->r11 = 0 */ + xorq %r11, %r11 /* nospec r11 */ pushq %rbx /* pt_regs->rbx */ + xorl %ebx, %ebx /* nospec rbx */ pushq %rbp /* pt_regs->rbp */ + xorl %ebp, %ebp /* nospec rbp */ pushq %r12 /* pt_regs->r12 */ + xorq %r12, %r12 /* nospec r12 */ pushq %r13 /* pt_regs->r13 */ + xorq %r13, %r13 /* nospec r13 */ pushq %r14 /* pt_regs->r14 */ + xorq %r14, %r14 /* nospec r14 */ pushq %r15 /* pt_regs->r15 */ + xorq %r15, %r15 /* nospec r15 */ cld /* diff --git a/arch/x86/include/asm/apm.h b/arch/x86/include/asm/apm.h index 20370c6db74b..3d1ec41ae09a 100644 --- a/arch/x86/include/asm/apm.h +++ b/arch/x86/include/asm/apm.h @@ -6,6 +6,8 @@ #ifndef _ASM_X86_MACH_DEFAULT_APM_H #define _ASM_X86_MACH_DEFAULT_APM_H +#include <asm/nospec-branch.h> + #ifdef APM_ZERO_SEGS # define APM_DO_ZERO_SEGS \ "pushl %%ds\n\t" \ @@ -31,6 +33,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in, * N.B. We do NOT need a cld after the BIOS call * because we always save and restore the flags. */ + firmware_restrict_branch_speculation_start(); __asm__ __volatile__(APM_DO_ZERO_SEGS "pushl %%edi\n\t" "pushl %%ebp\n\t" @@ -43,6 +46,7 @@ static inline void apm_bios_call_asm(u32 func, u32 ebx_in, u32 ecx_in, "=S" (*esi) : "a" (func), "b" (ebx_in), "c" (ecx_in) : "memory", "cc"); + firmware_restrict_branch_speculation_end(); } static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in, @@ -55,6 +59,7 @@ static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in, * N.B. We do NOT need a cld after the BIOS call * because we always save and restore the flags. */ + firmware_restrict_branch_speculation_start(); __asm__ __volatile__(APM_DO_ZERO_SEGS "pushl %%edi\n\t" "pushl %%ebp\n\t" @@ -67,6 +72,7 @@ static inline u8 apm_bios_call_simple_asm(u32 func, u32 ebx_in, "=S" (si) : "a" (func), "b" (ebx_in), "c" (ecx_in) : "memory", "cc"); + firmware_restrict_branch_speculation_end(); return error; } diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h index 858f8c354cea..f3d4f1edc947 100644 --- a/arch/x86/include/asm/asm.h +++ b/arch/x86/include/asm/asm.h @@ -45,6 +45,65 @@ #define _ASM_SI __ASM_REG(si) #define _ASM_DI __ASM_REG(di) +#ifndef __x86_64__ +/* 32 bit */ + +#define _ASM_ARG1 _ASM_AX +#define _ASM_ARG2 _ASM_DX +#define _ASM_ARG3 _ASM_CX + +#define _ASM_ARG1L eax +#define _ASM_ARG2L edx +#define _ASM_ARG3L ecx + +#define _ASM_ARG1W ax +#define _ASM_ARG2W dx +#define _ASM_ARG3W cx + +#define _ASM_ARG1B al +#define _ASM_ARG2B dl +#define _ASM_ARG3B cl + +#else +/* 64 bit */ + +#define _ASM_ARG1 _ASM_DI +#define _ASM_ARG2 _ASM_SI +#define _ASM_ARG3 _ASM_DX +#define _ASM_ARG4 _ASM_CX +#define _ASM_ARG5 r8 +#define _ASM_ARG6 r9 + +#define _ASM_ARG1Q rdi +#define _ASM_ARG2Q rsi +#define _ASM_ARG3Q rdx +#define _ASM_ARG4Q rcx +#define _ASM_ARG5Q r8 +#define _ASM_ARG6Q r9 + +#define _ASM_ARG1L edi +#define _ASM_ARG2L esi +#define _ASM_ARG3L edx +#define _ASM_ARG4L ecx +#define _ASM_ARG5L r8d +#define _ASM_ARG6L r9d + +#define _ASM_ARG1W di +#define _ASM_ARG2W si +#define _ASM_ARG3W dx +#define _ASM_ARG4W cx +#define _ASM_ARG5W r8w +#define _ASM_ARG6W r9w + +#define _ASM_ARG1B dil +#define _ASM_ARG2B sil +#define _ASM_ARG3B dl +#define _ASM_ARG4B cl +#define _ASM_ARG5B r8b +#define _ASM_ARG6B r9b + +#endif + /* Exception table entry */ #ifdef __ASSEMBLY__ # define _ASM_EXTABLE(from,to) \ diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h index e3a6f66d288c..7f5dcb64cedb 100644 --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -40,7 +40,7 @@ static inline unsigned long array_index_mask_nospec(unsigned long index, asm volatile ("cmp %1,%2; sbb %0,%0;" :"=r" (mask) - :"r"(size),"r" (index) + :"g"(size),"r" (index) :"cc"); return mask; } diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index dd0089841a0f..d72c1db64679 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -28,6 +28,7 @@ enum cpuid_leafs CPUID_8000_000A_EDX, CPUID_7_ECX, CPUID_8000_0007_EBX, + CPUID_7_EDX, }; #ifdef CONFIG_X86_FEATURE_NAMES @@ -78,8 +79,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 15, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) || \ REQUIRED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 18)) + BUILD_BUG_ON_ZERO(NCAPINTS != 19)) #define DISABLED_MASK_BIT_SET(feature_bit) \ ( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 0, feature_bit) || \ @@ -100,8 +102,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 15, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) || \ CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) || \ + CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) || \ DISABLED_MASK_CHECK || \ - BUILD_BUG_ON_ZERO(NCAPINTS != 18)) + BUILD_BUG_ON_ZERO(NCAPINTS != 19)) #define cpu_has(c, bit) \ (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \ diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 205ce70c1d6c..f4b175db70f4 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -12,7 +12,7 @@ /* * Defines x86 CPU feature bits */ -#define NCAPINTS 18 /* N 32-bit words worth of info */ +#define NCAPINTS 19 /* N 32-bit words worth of info */ #define NBUGINTS 1 /* N 32-bit bug flags */ /* @@ -194,13 +194,28 @@ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ -#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* Fill RSB on context switches */ +#define X86_FEATURE_RSB_CTXSW ( 7*32+19) /* "" Fill RSB on context switches */ + +#define X86_FEATURE_RETPOLINE ( 7*32+29) /* "" Generic Retpoline mitigation for Spectre variant 2 */ +#define X86_FEATURE_RETPOLINE_AMD ( 7*32+30) /* "" AMD Retpoline mitigation for Spectre variant 2 */ + +#define X86_FEATURE_MSR_SPEC_CTRL ( 7*32+16) /* "" MSR SPEC_CTRL is implemented */ +#define X86_FEATURE_SSBD ( 7*32+17) /* Speculative Store Bypass Disable */ -#define X86_FEATURE_RETPOLINE ( 7*32+29) /* Generic Retpoline mitigation for Spectre variant 2 */ -#define X86_FEATURE_RETPOLINE_AMD ( 7*32+30) /* AMD Retpoline mitigation for Spectre variant 2 */ /* Because the ALTERNATIVE scheme is for members of the X86_FEATURE club... */ #define X86_FEATURE_KAISER ( 7*32+31) /* CONFIG_PAGE_TABLE_ISOLATION w/o nokaiser */ +#define X86_FEATURE_USE_IBPB ( 7*32+21) /* "" Indirect Branch Prediction Barrier enabled*/ +#define X86_FEATURE_USE_IBRS_FW ( 7*32+22) /* "" Use IBRS during runtime firmware calls */ +#define X86_FEATURE_SPEC_STORE_BYPASS_DISABLE ( 7*32+23) /* "" Disable Speculative Store Bypass. */ +#define X86_FEATURE_LS_CFG_SSBD ( 7*32+24) /* "" AMD SSBD implementation */ + +#define X86_FEATURE_IBRS ( 7*32+25) /* Indirect Branch Restricted Speculation */ +#define X86_FEATURE_IBPB ( 7*32+26) /* Indirect Branch Prediction Barrier */ +#define X86_FEATURE_STIBP ( 7*32+27) /* Single Thread Indirect Branch Predictors */ +#define X86_FEATURE_ZEN ( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */ + + /* Virtualization flags: Linux defined, word 8 */ #define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ #define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */ @@ -251,6 +266,10 @@ /* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */ #define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ +#define X86_FEATURE_AMD_IBPB (13*32+12) /* Indirect Branch Prediction Barrier */ +#define X86_FEATURE_AMD_IBRS (13*32+14) /* Indirect Branch Restricted Speculation */ +#define X86_FEATURE_AMD_STIBP (13*32+15) /* Single Thread Indirect Branch Predictors */ +#define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ /* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ @@ -285,6 +304,15 @@ #define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ #define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */ + +/* Intel-defined CPU features, CPUID level 0x00000007:0 (EDX), word 18 */ +#define X86_FEATURE_AVX512_4VNNIW (18*32+ 2) /* AVX-512 Neural Network Instructions */ +#define X86_FEATURE_AVX512_4FMAPS (18*32+ 3) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_SPEC_CTRL (18*32+26) /* "" Speculation Control (IBRS + IBPB) */ +#define X86_FEATURE_INTEL_STIBP (18*32+27) /* "" Single Thread Indirect Branch Predictors */ +#define X86_FEATURE_ARCH_CAPABILITIES (18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */ +#define X86_FEATURE_SPEC_CTRL_SSBD (18*32+31) /* "" Speculative Store Bypass Disable */ + /* * BUG word(s) */ @@ -302,5 +330,6 @@ #define X86_BUG_CPU_MELTDOWN X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */ #define X86_BUG_SPECTRE_V1 X86_BUG(15) /* CPU is affected by Spectre variant 1 attack with conditional branches */ #define X86_BUG_SPECTRE_V2 X86_BUG(16) /* CPU is affected by Spectre variant 2 attack with indirect branches */ +#define X86_BUG_SPEC_STORE_BYPASS X86_BUG(17) /* CPU is affected by speculative store bypass attack */ #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 21c5ac15657b..1f8cca459c6c 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -59,6 +59,7 @@ #define DISABLED_MASK15 0 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE) #define DISABLED_MASK17 0 -#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18) +#define DISABLED_MASK18 0 +#define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) #endif /* _ASM_X86_DISABLED_FEATURES_H */ diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h index 08b1f2f6ea50..cfde088f8e95 100644 --- a/arch/x86/include/asm/efi.h +++ b/arch/x86/include/asm/efi.h @@ -3,6 +3,7 @@ #include <asm/fpu/api.h> #include <asm/pgtable.h> +#include <asm/nospec-branch.h> /* * We map the EFI regions needed for runtime services non-contiguously, @@ -41,8 +42,10 @@ extern unsigned long asmlinkage efi_call_phys(void *, ...); ({ \ efi_status_t __s; \ kernel_fpu_begin(); \ + firmware_restrict_branch_speculation_start(); \ __s = ((efi_##f##_t __attribute__((regparm(0)))*) \ efi.systab->runtime->f)(args); \ + firmware_restrict_branch_speculation_end(); \ kernel_fpu_end(); \ __s; \ }) @@ -51,8 +54,10 @@ extern unsigned long asmlinkage efi_call_phys(void *, ...); #define __efi_call_virt(f, args...) \ ({ \ kernel_fpu_begin(); \ + firmware_restrict_branch_speculation_start(); \ ((efi_##f##_t __attribute__((regparm(0)))*) \ efi.systab->runtime->f)(args); \ + firmware_restrict_branch_speculation_end(); \ kernel_fpu_end(); \ }) @@ -73,7 +78,9 @@ extern u64 asmlinkage efi_call(void *fp, ...); efi_sync_low_kernel_mappings(); \ preempt_disable(); \ __kernel_fpu_begin(); \ + firmware_restrict_branch_speculation_start(); \ __s = efi_call((void *)efi.systab->runtime->f, __VA_ARGS__); \ + firmware_restrict_branch_speculation_end(); \ __kernel_fpu_end(); \ preempt_enable(); \ __s; \ diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h index 6999f7d01a0d..e13ff5a14633 100644 --- a/arch/x86/include/asm/intel-family.h +++ b/arch/x86/include/asm/intel-family.h @@ -12,6 +12,7 @@ */ #define INTEL_FAM6_CORE_YONAH 0x0E + #define INTEL_FAM6_CORE2_MEROM 0x0F #define INTEL_FAM6_CORE2_MEROM_L 0x16 #define INTEL_FAM6_CORE2_PENRYN 0x17 @@ -20,6 +21,7 @@ #define INTEL_FAM6_NEHALEM 0x1E #define INTEL_FAM6_NEHALEM_EP 0x1A #define INTEL_FAM6_NEHALEM_EX 0x2E + #define INTEL_FAM6_WESTMERE 0x25 #define INTEL_FAM6_WESTMERE2 0x1F #define INTEL_FAM6_WESTMERE_EP 0x2C @@ -36,9 +38,9 @@ #define INTEL_FAM6_HASWELL_GT3E 0x46 #define INTEL_FAM6_BROADWELL_CORE 0x3D -#define INTEL_FAM6_BROADWELL_XEON_D 0x56 #define INTEL_FAM6_BROADWELL_GT3E 0x47 #define INTEL_FAM6_BROADWELL_X 0x4F +#define INTEL_FAM6_BROADWELL_XEON_D 0x56 #define INTEL_FAM6_SKYLAKE_MOBILE 0x4E #define INTEL_FAM6_SKYLAKE_DESKTOP 0x5E @@ -56,13 +58,15 @@ #define INTEL_FAM6_ATOM_SILVERMONT1 0x37 /* BayTrail/BYT / Valleyview */ #define INTEL_FAM6_ATOM_SILVERMONT2 0x4D /* Avaton/Rangely */ #define INTEL_FAM6_ATOM_AIRMONT 0x4C /* CherryTrail / Braswell */ -#define INTEL_FAM6_ATOM_MERRIFIELD1 0x4A /* Tangier */ -#define INTEL_FAM6_ATOM_MERRIFIELD2 0x5A /* Annidale */ +#define INTEL_FAM6_ATOM_MERRIFIELD 0x4A /* Tangier */ +#define INTEL_FAM6_ATOM_MOOREFIELD 0x5A /* Annidale */ #define INTEL_FAM6_ATOM_GOLDMONT 0x5C #define INTEL_FAM6_ATOM_DENVERTON 0x5F /* Goldmont Microserver */ +#define INTEL_FAM6_ATOM_GEMINI_LAKE 0x7A /* Xeon Phi */ #define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */ +#define INTEL_FAM6_XEON_PHI_KNM 0x85 /* Knights Mill */ #endif /* _ASM_X86_INTEL_FAMILY_H */ diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index b77f5edb03b0..0056bc945cd1 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -8,7 +8,7 @@ * Interrupt control: */ -static inline unsigned long native_save_fl(void) +extern inline unsigned long native_save_fl(void) { unsigned long flags; diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h index 7680b76adafc..3359dfedc7ee 100644 --- a/arch/x86/include/asm/mmu.h +++ b/arch/x86/include/asm/mmu.h @@ -3,12 +3,18 @@ #include <linux/spinlock.h> #include <linux/mutex.h> +#include <linux/atomic.h> /* - * The x86 doesn't have a mmu context, but - * we put the segment information here. + * x86 has arch-specific MMU state beyond what lives in mm_struct. */ typedef struct { + /* + * ctx_id uniquely identifies this mm_struct. A ctx_id will never + * be reused, and zero is not a valid ctx_id. + */ + u64 ctx_id; + #ifdef CONFIG_MODIFY_LDT_SYSCALL struct ldt_struct *ldt; #endif @@ -24,6 +30,11 @@ typedef struct { atomic_t perf_rdpmc_allowed; /* nonzero if rdpmc is allowed */ } mm_context_t; +#define INIT_MM_CONTEXT(mm) \ + .context = { \ + .ctx_id = 1, \ + } + void leave_mm(int cpu); #endif /* _ASM_X86_MMU_H */ diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 9bfc5fd77015..effc12767cbf 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -11,6 +11,9 @@ #include <asm/tlbflush.h> #include <asm/paravirt.h> #include <asm/mpx.h> + +extern atomic64_t last_mm_ctx_id; + #ifndef CONFIG_PARAVIRT static inline void paravirt_activate_mm(struct mm_struct *prev, struct mm_struct *next) @@ -52,15 +55,15 @@ struct ldt_struct { /* * Used for LDT copy/destruction. */ -int init_new_context(struct task_struct *tsk, struct mm_struct *mm); -void destroy_context(struct mm_struct *mm); +int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm); +void destroy_context_ldt(struct mm_struct *mm); #else /* CONFIG_MODIFY_LDT_SYSCALL */ -static inline int init_new_context(struct task_struct *tsk, - struct mm_struct *mm) +static inline int init_new_context_ldt(struct task_struct *tsk, + struct mm_struct *mm) { return 0; } -static inline void destroy_context(struct mm_struct *mm) {} +static inline void destroy_context_ldt(struct mm_struct *mm) {} #endif static inline void load_mm_ldt(struct mm_struct *mm) @@ -102,6 +105,18 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY); } +static inline int init_new_context(struct task_struct *tsk, + struct mm_struct *mm) +{ + mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id); + init_new_context_ldt(tsk, mm); + return 0; +} +static inline void destroy_context(struct mm_struct *mm) +{ + destroy_context_ldt(mm); +} + extern void switch_mm(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk); diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index b8911aecf035..caa00191e565 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -32,6 +32,15 @@ #define EFER_FFXSR (1<<_EFER_FFXSR) /* Intel MSRs. Some also available on other CPUs */ +#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */ +#define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */ +#define SPEC_CTRL_STIBP (1 << 1) /* Single Thread Indirect Branch Predictors */ +#define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */ +#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */ + +#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */ +#define PRED_CMD_IBPB (1 << 0) /* Indirect Branch Prediction Barrier */ + #define MSR_IA32_PERFCTR0 0x000000c1 #define MSR_IA32_PERFCTR1 0x000000c2 #define MSR_FSB_FREQ 0x000000cd @@ -45,6 +54,16 @@ #define SNB_C3_AUTO_UNDEMOTE (1UL << 28) #define MSR_MTRRcap 0x000000fe + +#define MSR_IA32_ARCH_CAPABILITIES 0x0000010a +#define ARCH_CAP_RDCL_NO (1 << 0) /* Not susceptible to Meltdown */ +#define ARCH_CAP_IBRS_ALL (1 << 1) /* Enhanced IBRS support */ +#define ARCH_CAP_SSB_NO (1 << 4) /* + * Not susceptible to Speculative Store Bypass + * attack, so no Speculative Store Bypass + * control required. + */ + #define MSR_IA32_BBL_CR_CTL 0x00000119 #define MSR_IA32_BBL_CR_CTL3 0x0000011e @@ -132,6 +151,7 @@ /* DEBUGCTLMSR bits (others vary by model): */ #define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */ +#define DEBUGCTLMSR_BTF_SHIFT 1 #define DEBUGCTLMSR_BTF (1UL << 1) /* single-step on branches */ #define DEBUGCTLMSR_TR (1UL << 6) #define DEBUGCTLMSR_BTS (1UL << 7) @@ -308,6 +328,8 @@ #define MSR_AMD64_IBSOPDATA4 0xc001103d #define MSR_AMD64_IBS_REG_COUNT_MAX 8 /* includes MSR_AMD64_IBSBRTARGET */ +#define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f + /* Fam 16h MSRs */ #define MSR_F16H_L2I_PERF_CTL 0xc0010230 #define MSR_F16H_L2I_PERF_CTR 0xc0010231 diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 8b910416243c..b4c74c24c890 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -6,6 +6,7 @@ #include <asm/alternative.h> #include <asm/alternative-asm.h> #include <asm/cpufeatures.h> +#include <asm/msr-index.h> /* * Fill the CPU return stack buffer. @@ -171,6 +172,14 @@ enum spectre_v2_mitigation { SPECTRE_V2_IBRS, }; +/* The Speculative Store Bypass disable variants */ +enum ssb_mitigation { + SPEC_STORE_BYPASS_NONE, + SPEC_STORE_BYPASS_DISABLE, + SPEC_STORE_BYPASS_PRCTL, + SPEC_STORE_BYPASS_SECCOMP, +}; + extern char __indirect_thunk_start[]; extern char __indirect_thunk_end[]; @@ -194,6 +203,51 @@ static inline void vmexit_fill_RSB(void) #endif } +static __always_inline +void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature) +{ + asm volatile(ALTERNATIVE("", "wrmsr", %c[feature]) + : : "c" (msr), + "a" ((u32)val), + "d" ((u32)(val >> 32)), + [feature] "i" (feature) + : "memory"); +} + +static inline void indirect_branch_prediction_barrier(void) +{ + u64 val = PRED_CMD_IBPB; + + alternative_msr_write(MSR_IA32_PRED_CMD, val, X86_FEATURE_USE_IBPB); +} + +/* The Intel SPEC CTRL MSR base value cache */ +extern u64 x86_spec_ctrl_base; + +/* + * With retpoline, we must use IBRS to restrict branch prediction + * before calling into firmware. + * + * (Implemented as CPP macros due to header hell.) + */ +#define firmware_restrict_branch_speculation_start() \ +do { \ + u64 val = x86_spec_ctrl_base | SPEC_CTRL_IBRS; \ + \ + preempt_disable(); \ + alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \ + X86_FEATURE_USE_IBRS_FW); \ +} while (0) + +#define firmware_restrict_branch_speculation_end() \ +do { \ + u64 val = x86_spec_ctrl_base; \ + \ + alternative_msr_write(MSR_IA32_SPEC_CTRL, val, \ + X86_FEATURE_USE_IBRS_FW); \ + preempt_enable(); \ +} while (0) + #endif /* __ASSEMBLY__ */ /* diff --git a/arch/x86/include/asm/required-features.h b/arch/x86/include/asm/required-features.h index fac9a5c0abe9..6847d85400a8 100644 --- a/arch/x86/include/asm/required-features.h +++ b/arch/x86/include/asm/required-features.h @@ -100,6 +100,7 @@ #define REQUIRED_MASK15 0 #define REQUIRED_MASK16 0 #define REQUIRED_MASK17 0 -#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 18) +#define REQUIRED_MASK18 0 +#define REQUIRED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19) #endif /* _ASM_X86_REQUIRED_FEATURES_H */ diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h new file mode 100644 index 000000000000..ae7c2c5cd7f0 --- /dev/null +++ b/arch/x86/include/asm/spec-ctrl.h @@ -0,0 +1,80 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_SPECCTRL_H_ +#define _ASM_X86_SPECCTRL_H_ + +#include <linux/thread_info.h> +#include <asm/nospec-branch.h> + +/* + * On VMENTER we must preserve whatever view of the SPEC_CTRL MSR + * the guest has, while on VMEXIT we restore the host view. This + * would be easier if SPEC_CTRL were architecturally maskable or + * shadowable for guests but this is not (currently) the case. + * Takes the guest view of SPEC_CTRL MSR as a parameter and also + * the guest's version of VIRT_SPEC_CTRL, if emulated. + */ +extern void x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool guest); + +/** + * x86_spec_ctrl_set_guest - Set speculation control registers for the guest + * @guest_spec_ctrl: The guest content of MSR_SPEC_CTRL + * @guest_virt_spec_ctrl: The guest controlled bits of MSR_VIRT_SPEC_CTRL + * (may get translated to MSR_AMD64_LS_CFG bits) + * + * Avoids writing to the MSR if the content/bits are the same + */ +static inline +void x86_spec_ctrl_set_guest(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl) +{ + x86_virt_spec_ctrl(guest_spec_ctrl, guest_virt_spec_ctrl, true); +} + +/** + * x86_spec_ctrl_restore_host - Restore host speculation control registers + * @guest_spec_ctrl: The guest content of MSR_SPEC_CTRL + * @guest_virt_spec_ctrl: The guest controlled bits of MSR_VIRT_SPEC_CTRL + * (may get translated to MSR_AMD64_LS_CFG bits) + * + * Avoids writing to the MSR if the content/bits are the same + */ +static inline +void x86_spec_ctrl_restore_host(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl) +{ + x86_virt_spec_ctrl(guest_spec_ctrl, guest_virt_spec_ctrl, false); +} + +/* AMD specific Speculative Store Bypass MSR data */ +extern u64 x86_amd_ls_cfg_base; +extern u64 x86_amd_ls_cfg_ssbd_mask; + +static inline u64 ssbd_tif_to_spec_ctrl(u64 tifn) +{ + BUILD_BUG_ON(TIF_SSBD < SPEC_CTRL_SSBD_SHIFT); + return (tifn & _TIF_SSBD) >> (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT); +} + +static inline unsigned long ssbd_spec_ctrl_to_tif(u64 spec_ctrl) +{ + BUILD_BUG_ON(TIF_SSBD < SPEC_CTRL_SSBD_SHIFT); + return (spec_ctrl & SPEC_CTRL_SSBD) << (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT); +} + +static inline u64 ssbd_tif_to_amd_ls_cfg(u64 tifn) +{ + return (tifn & _TIF_SSBD) ? x86_amd_ls_cfg_ssbd_mask : 0ULL; +} + +#ifdef CONFIG_SMP +extern void speculative_store_bypass_ht_init(void); +#else +static inline void speculative_store_bypass_ht_init(void) { } +#endif + +extern void speculative_store_bypass_update(unsigned long tif); + +static inline void speculative_store_bypass_update_current(void) +{ + speculative_store_bypass_update(current_thread_info()->flags); +} + +#endif diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 913468b17b9c..e8d453732823 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -92,6 +92,7 @@ struct thread_info { #define TIF_SIGPENDING 2 /* signal pending */ #define TIF_NEED_RESCHED 3 /* rescheduling necessary */ #define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/ +#define TIF_SSBD 5 /* Reduced data speculation */ #define TIF_SYSCALL_EMU 6 /* syscall emulation active */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */ @@ -114,8 +115,9 @@ struct thread_info { #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) #define _TIF_SIGPENDING (1 << TIF_SIGPENDING) -#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) +#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP) +#define _TIF_SSBD (1 << TIF_SSBD) #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) @@ -147,7 +149,7 @@ struct thread_info { /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \ - (_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP) + (_TIF_IO_BITMAP|_TIF_NOTSC|_TIF_BLOCKSTEP|_TIF_SSBD) #define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index e2a89d2577fb..72cfe3e53af1 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -68,6 +68,8 @@ static inline void invpcid_flush_all_nonglobals(void) struct tlb_state { struct mm_struct *active_mm; int state; + /* last user mm's ctx id */ + u64 last_ctx_id; /* * Access to this CR4 shadow and to H/W CR4 is protected by @@ -109,6 +111,16 @@ static inline void cr4_clear_bits(unsigned long mask) } } +static inline void cr4_toggle_bits(unsigned long mask) +{ + unsigned long cr4; + + cr4 = this_cpu_read(cpu_tlbstate.cr4); + cr4 ^= mask; + this_cpu_write(cpu_tlbstate.cr4, cr4); + __write_cr4(cr4); +} + /* Read the CR4 shadow. */ static inline unsigned long cr4_read_shadow(void) { diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 616ebd22ef9a..5e59af41d40e 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -53,6 +53,7 @@ obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o obj-y += tsc.o tsc_msr.o io_delay.o rtc.o obj-y += pci-iommu_table.o obj-y += resource.o +obj-y += irqflags.o obj-y += process.o obj-y += fpu/ diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index f4fb8f5b0be4..9f6151884249 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -9,6 +9,7 @@ #include <asm/processor.h> #include <asm/apic.h> #include <asm/cpu.h> +#include <asm/spec-ctrl.h> #include <asm/smp.h> #include <asm/pci-direct.h> #include <asm/delay.h> @@ -519,6 +520,26 @@ static void bsp_init_amd(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_MWAITX)) use_mwaitx_delay(); + + if (c->x86 >= 0x15 && c->x86 <= 0x17) { + unsigned int bit; + + switch (c->x86) { + case 0x15: bit = 54; break; + case 0x16: bit = 33; break; + case 0x17: bit = 10; break; + default: return; + } + /* + * Try to cache the base value so further operations can + * avoid RMW. If that faults, do not enable SSBD. + */ + if (!rdmsrl_safe(MSR_AMD64_LS_CFG, &x86_amd_ls_cfg_base)) { + setup_force_cpu_cap(X86_FEATURE_LS_CFG_SSBD); + setup_force_cpu_cap(X86_FEATURE_SSBD); + x86_amd_ls_cfg_ssbd_mask = 1ULL << bit; + } + } } static void early_init_amd(struct cpuinfo_x86 *c) @@ -692,6 +713,17 @@ static void init_amd_bd(struct cpuinfo_x86 *c) } } +static void init_amd_zn(struct cpuinfo_x86 *c) +{ + set_cpu_cap(c, X86_FEATURE_ZEN); + /* + * Fix erratum 1076: CPB feature bit not being set in CPUID. It affects + * all up to and including B1. + */ + if (c->x86_model <= 1 && c->x86_mask <= 1) + set_cpu_cap(c, X86_FEATURE_CPB); +} + static void init_amd(struct cpuinfo_x86 *c) { u32 dummy; @@ -722,6 +754,7 @@ static void init_amd(struct cpuinfo_x86 *c) case 0x10: init_amd_gh(c); break; case 0x12: init_amd_ln(c); break; case 0x15: init_amd_bd(c); break; + case 0x17: init_amd_zn(c); break; } /* Enable workaround for FXSAVE leak */ @@ -791,8 +824,9 @@ static void init_amd(struct cpuinfo_x86 *c) if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM)) set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH); - /* AMD CPUs don't reset SS attributes on SYSRET */ - set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS); + /* AMD CPUs don't reset SS attributes on SYSRET, Xen does. */ + if (!cpu_has(c, X86_FEATURE_XENPV)) + set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS); } #ifdef CONFIG_X86_32 diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 2bbc74f8a4a8..12a8867071f3 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -11,8 +11,10 @@ #include <linux/utsname.h> #include <linux/cpu.h> #include <linux/module.h> +#include <linux/nospec.h> +#include <linux/prctl.h> -#include <asm/nospec-branch.h> +#include <asm/spec-ctrl.h> #include <asm/cmdline.h> #include <asm/bugs.h> #include <asm/processor.h> @@ -26,6 +28,27 @@ #include <asm/intel-family.h> static void __init spectre_v2_select_mitigation(void); +static void __init ssb_select_mitigation(void); + +/* + * Our boot-time value of the SPEC_CTRL MSR. We read it once so that any + * writes to SPEC_CTRL contain whatever reserved bits have been set. + */ +u64 x86_spec_ctrl_base; +EXPORT_SYMBOL_GPL(x86_spec_ctrl_base); + +/* + * The vendor and possibly platform specific bits which can be modified in + * x86_spec_ctrl_base. + */ +static u64 x86_spec_ctrl_mask = SPEC_CTRL_IBRS; + +/* + * AMD specific MSR info for Speculative Store Bypass control. + * x86_amd_ls_cfg_ssbd_mask is initialized in identify_boot_cpu(). + */ +u64 x86_amd_ls_cfg_base; +u64 x86_amd_ls_cfg_ssbd_mask; void __init check_bugs(void) { @@ -36,9 +59,27 @@ void __init check_bugs(void) print_cpu_info(&boot_cpu_data); } + /* + * Read the SPEC_CTRL MSR to account for reserved bits which may + * have unknown values. AMD64_LS_CFG MSR is cached in the early AMD + * init code as it is not enumerated and depends on the family. + */ + if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) + rdmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); + + /* Allow STIBP in MSR_SPEC_CTRL if supported */ + if (boot_cpu_has(X86_FEATURE_STIBP)) + x86_spec_ctrl_mask |= SPEC_CTRL_STIBP; + /* Select the proper spectre mitigation before patching alternatives */ spectre_v2_select_mitigation(); + /* + * Select proper mitigation for any exposure to the Speculative Store + * Bypass vulnerability. + */ + ssb_select_mitigation(); + #ifdef CONFIG_X86_32 /* * Check whether we are able to run this kernel safely on SMP. @@ -94,6 +135,73 @@ static const char *spectre_v2_strings[] = { static enum spectre_v2_mitigation spectre_v2_enabled = SPECTRE_V2_NONE; +void +x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest) +{ + u64 msrval, guestval, hostval = x86_spec_ctrl_base; + struct thread_info *ti = current_thread_info(); + + /* Is MSR_SPEC_CTRL implemented ? */ + if (static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) { + /* + * Restrict guest_spec_ctrl to supported values. Clear the + * modifiable bits in the host base value and or the + * modifiable bits from the guest value. + */ + guestval = hostval & ~x86_spec_ctrl_mask; + guestval |= guest_spec_ctrl & x86_spec_ctrl_mask; + + /* SSBD controlled in MSR_SPEC_CTRL */ + if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD)) + hostval |= ssbd_tif_to_spec_ctrl(ti->flags); + + if (hostval != guestval) { + msrval = setguest ? guestval : hostval; + wrmsrl(MSR_IA32_SPEC_CTRL, msrval); + } + } + + /* + * If SSBD is not handled in MSR_SPEC_CTRL on AMD, update + * MSR_AMD64_L2_CFG or MSR_VIRT_SPEC_CTRL if supported. + */ + if (!static_cpu_has(X86_FEATURE_LS_CFG_SSBD) && + !static_cpu_has(X86_FEATURE_VIRT_SSBD)) + return; + + /* + * If the host has SSBD mitigation enabled, force it in the host's + * virtual MSR value. If its not permanently enabled, evaluate + * current's TIF_SSBD thread flag. + */ + if (static_cpu_has(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE)) + hostval = SPEC_CTRL_SSBD; + else + hostval = ssbd_tif_to_spec_ctrl(ti->flags); + + /* Sanitize the guest value */ + guestval = guest_virt_spec_ctrl & SPEC_CTRL_SSBD; + + if (hostval != guestval) { + unsigned long tif; + + tif = setguest ? ssbd_spec_ctrl_to_tif(guestval) : + ssbd_spec_ctrl_to_tif(hostval); + + speculative_store_bypass_update(tif); + } +} +EXPORT_SYMBOL_GPL(x86_virt_spec_ctrl); + +static void x86_amd_ssb_disable(void) +{ + u64 msrval = x86_amd_ls_cfg_base | x86_amd_ls_cfg_ssbd_mask; + + if (boot_cpu_has(X86_FEATURE_VIRT_SSBD)) + wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, SPEC_CTRL_SSBD); + else if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD)) + wrmsrl(MSR_AMD64_LS_CFG, msrval); +} #ifdef RETPOLINE static bool spectre_v2_bad_module; @@ -162,8 +270,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) if (cmdline_find_option_bool(boot_command_line, "nospectre_v2")) return SPECTRE_V2_CMD_NONE; else { - ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, - sizeof(arg)); + ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg)); if (ret < 0) return SPECTRE_V2_CMD_AUTO; @@ -184,8 +291,7 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) cmd == SPECTRE_V2_CMD_RETPOLINE_AMD || cmd == SPECTRE_V2_CMD_RETPOLINE_GENERIC) && !IS_ENABLED(CONFIG_RETPOLINE)) { - pr_err("%s selected but not compiled in. Switching to AUTO select\n", - mitigation_options[i].option); + pr_err("%s selected but not compiled in. Switching to AUTO select\n", mitigation_options[i].option); return SPECTRE_V2_CMD_AUTO; } @@ -255,14 +361,14 @@ static void __init spectre_v2_select_mitigation(void) goto retpoline_auto; break; } - pr_err("kernel not compiled with retpoline; no mitigation available!"); + pr_err("Spectre mitigation: kernel not compiled with retpoline; no mitigation available!"); return; retpoline_auto: if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) { retpoline_amd: if (!boot_cpu_has(X86_FEATURE_LFENCE_RDTSC)) { - pr_err("LFENCE not serializing. Switching to generic retpoline\n"); + pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n"); goto retpoline_generic; } mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD : @@ -280,7 +386,7 @@ retpoline_auto: pr_info("%s\n", spectre_v2_strings[mode]); /* - * If neither SMEP or KPTI are available, there is a risk of + * If neither SMEP nor PTI are available, there is a risk of * hitting userspace addresses in the RSB after a context switch * from a shallow call stack to a deeper one. To prevent this fill * the entire RSB, even when using IBRS. @@ -294,38 +400,309 @@ retpoline_auto: if ((!boot_cpu_has(X86_FEATURE_KAISER) && !boot_cpu_has(X86_FEATURE_SMEP)) || is_skylake_era()) { setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); - pr_info("Filling RSB on context switch\n"); + pr_info("Spectre v2 mitigation: Filling RSB on context switch\n"); + } + + /* Initialize Indirect Branch Prediction Barrier if supported */ + if (boot_cpu_has(X86_FEATURE_IBPB)) { + setup_force_cpu_cap(X86_FEATURE_USE_IBPB); + pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n"); + } + + /* + * Retpoline means the kernel is safe because it has no indirect + * branches. But firmware isn't, so use IBRS to protect that. + */ + if (boot_cpu_has(X86_FEATURE_IBRS)) { + setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW); + pr_info("Enabling Restricted Speculation for firmware calls\n"); + } +} + +#undef pr_fmt +#define pr_fmt(fmt) "Speculative Store Bypass: " fmt + +static enum ssb_mitigation ssb_mode = SPEC_STORE_BYPASS_NONE; + +/* The kernel command line selection */ +enum ssb_mitigation_cmd { + SPEC_STORE_BYPASS_CMD_NONE, + SPEC_STORE_BYPASS_CMD_AUTO, + SPEC_STORE_BYPASS_CMD_ON, + SPEC_STORE_BYPASS_CMD_PRCTL, + SPEC_STORE_BYPASS_CMD_SECCOMP, +}; + +static const char *ssb_strings[] = { + [SPEC_STORE_BYPASS_NONE] = "Vulnerable", + [SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled", + [SPEC_STORE_BYPASS_PRCTL] = "Mitigation: Speculative Store Bypass disabled via prctl", + [SPEC_STORE_BYPASS_SECCOMP] = "Mitigation: Speculative Store Bypass disabled via prctl and seccomp", +}; + +static const struct { + const char *option; + enum ssb_mitigation_cmd cmd; +} ssb_mitigation_options[] = { + { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */ + { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */ + { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */ + { "prctl", SPEC_STORE_BYPASS_CMD_PRCTL }, /* Disable Speculative Store Bypass via prctl */ + { "seccomp", SPEC_STORE_BYPASS_CMD_SECCOMP }, /* Disable Speculative Store Bypass via prctl and seccomp */ +}; + +static enum ssb_mitigation_cmd __init ssb_parse_cmdline(void) +{ + enum ssb_mitigation_cmd cmd = SPEC_STORE_BYPASS_CMD_AUTO; + char arg[20]; + int ret, i; + + if (cmdline_find_option_bool(boot_command_line, "nospec_store_bypass_disable")) { + return SPEC_STORE_BYPASS_CMD_NONE; + } else { + ret = cmdline_find_option(boot_command_line, "spec_store_bypass_disable", + arg, sizeof(arg)); + if (ret < 0) + return SPEC_STORE_BYPASS_CMD_AUTO; + + for (i = 0; i < ARRAY_SIZE(ssb_mitigation_options); i++) { + if (!match_option(arg, ret, ssb_mitigation_options[i].option)) + continue; + + cmd = ssb_mitigation_options[i].cmd; + break; + } + + if (i >= ARRAY_SIZE(ssb_mitigation_options)) { + pr_err("unknown option (%s). Switching to AUTO select\n", arg); + return SPEC_STORE_BYPASS_CMD_AUTO; + } + } + + return cmd; +} + +static enum ssb_mitigation __init __ssb_select_mitigation(void) +{ + enum ssb_mitigation mode = SPEC_STORE_BYPASS_NONE; + enum ssb_mitigation_cmd cmd; + + if (!boot_cpu_has(X86_FEATURE_SSBD)) + return mode; + + cmd = ssb_parse_cmdline(); + if (!boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS) && + (cmd == SPEC_STORE_BYPASS_CMD_NONE || + cmd == SPEC_STORE_BYPASS_CMD_AUTO)) + return mode; + + switch (cmd) { + case SPEC_STORE_BYPASS_CMD_AUTO: + case SPEC_STORE_BYPASS_CMD_SECCOMP: + /* + * Choose prctl+seccomp as the default mode if seccomp is + * enabled. + */ + if (IS_ENABLED(CONFIG_SECCOMP)) + mode = SPEC_STORE_BYPASS_SECCOMP; + else + mode = SPEC_STORE_BYPASS_PRCTL; + break; + case SPEC_STORE_BYPASS_CMD_ON: + mode = SPEC_STORE_BYPASS_DISABLE; + break; + case SPEC_STORE_BYPASS_CMD_PRCTL: + mode = SPEC_STORE_BYPASS_PRCTL; + break; + case SPEC_STORE_BYPASS_CMD_NONE: + break; + } + + /* + * We have three CPU feature flags that are in play here: + * - X86_BUG_SPEC_STORE_BYPASS - CPU is susceptible. + * - X86_FEATURE_SSBD - CPU is able to turn off speculative store bypass + * - X86_FEATURE_SPEC_STORE_BYPASS_DISABLE - engage the mitigation + */ + if (mode == SPEC_STORE_BYPASS_DISABLE) { + setup_force_cpu_cap(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE); + /* + * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD uses + * a completely different MSR and bit dependent on family. + */ + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_INTEL: + x86_spec_ctrl_base |= SPEC_CTRL_SSBD; + x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); + break; + case X86_VENDOR_AMD: + x86_amd_ssb_disable(); + break; + } } + + return mode; +} + +static void ssb_select_mitigation(void) +{ + ssb_mode = __ssb_select_mitigation(); + + if (boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS)) + pr_info("%s\n", ssb_strings[ssb_mode]); } #undef pr_fmt +#define pr_fmt(fmt) "Speculation prctl: " fmt + +static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) +{ + bool update; + + if (ssb_mode != SPEC_STORE_BYPASS_PRCTL && + ssb_mode != SPEC_STORE_BYPASS_SECCOMP) + return -ENXIO; + + switch (ctrl) { + case PR_SPEC_ENABLE: + /* If speculation is force disabled, enable is not allowed */ + if (task_spec_ssb_force_disable(task)) + return -EPERM; + task_clear_spec_ssb_disable(task); + update = test_and_clear_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_DISABLE: + task_set_spec_ssb_disable(task); + update = !test_and_set_tsk_thread_flag(task, TIF_SSBD); + break; + case PR_SPEC_FORCE_DISABLE: + task_set_spec_ssb_disable(task); + task_set_spec_ssb_force_disable(task); + update = !test_and_set_tsk_thread_flag(task, TIF_SSBD); + break; + default: + return -ERANGE; + } + + /* + * If being set on non-current task, delay setting the CPU + * mitigation until it is next scheduled. + */ + if (task == current && update) + speculative_store_bypass_update_current(); + + return 0; +} + +int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which, + unsigned long ctrl) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssb_prctl_set(task, ctrl); + default: + return -ENODEV; + } +} + +#ifdef CONFIG_SECCOMP +void arch_seccomp_spec_mitigate(struct task_struct *task) +{ + if (ssb_mode == SPEC_STORE_BYPASS_SECCOMP) + ssb_prctl_set(task, PR_SPEC_FORCE_DISABLE); +} +#endif + +static int ssb_prctl_get(struct task_struct *task) +{ + switch (ssb_mode) { + case SPEC_STORE_BYPASS_DISABLE: + return PR_SPEC_DISABLE; + case SPEC_STORE_BYPASS_SECCOMP: + case SPEC_STORE_BYPASS_PRCTL: + if (task_spec_ssb_force_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; + if (task_spec_ssb_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_DISABLE; + return PR_SPEC_PRCTL | PR_SPEC_ENABLE; + default: + if (boot_cpu_has_bug(X86_BUG_SPEC_STORE_BYPASS)) + return PR_SPEC_ENABLE; + return PR_SPEC_NOT_AFFECTED; + } +} + +int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which) +{ + switch (which) { + case PR_SPEC_STORE_BYPASS: + return ssb_prctl_get(task); + default: + return -ENODEV; + } +} + +void x86_spec_ctrl_setup_ap(void) +{ + if (boot_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); + + if (ssb_mode == SPEC_STORE_BYPASS_DISABLE) + x86_amd_ssb_disable(); +} #ifdef CONFIG_SYSFS -ssize_t cpu_show_meltdown(struct device *dev, - struct device_attribute *attr, char *buf) + +static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, + char *buf, unsigned int bug) { - if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN)) + if (!boot_cpu_has_bug(bug)) return sprintf(buf, "Not affected\n"); - if (boot_cpu_has(X86_FEATURE_KAISER)) - return sprintf(buf, "Mitigation: PTI\n"); + + switch (bug) { + case X86_BUG_CPU_MELTDOWN: + if (boot_cpu_has(X86_FEATURE_KAISER)) + return sprintf(buf, "Mitigation: PTI\n"); + + break; + + case X86_BUG_SPECTRE_V1: + return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + + case X86_BUG_SPECTRE_V2: + return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], + boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", + boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", + spectre_v2_module_string()); + + case X86_BUG_SPEC_STORE_BYPASS: + return sprintf(buf, "%s\n", ssb_strings[ssb_mode]); + + default: + break; + } + return sprintf(buf, "Vulnerable\n"); } -ssize_t cpu_show_spectre_v1(struct device *dev, - struct device_attribute *attr, char *buf) +ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) { - if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V1)) - return sprintf(buf, "Not affected\n"); - return sprintf(buf, "Mitigation: __user pointer sanitization\n"); + return cpu_show_common(dev, attr, buf, X86_BUG_CPU_MELTDOWN); } -ssize_t cpu_show_spectre_v2(struct device *dev, - struct device_attribute *attr, char *buf) +ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf) { - if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) - return sprintf(buf, "Not affected\n"); + return cpu_show_common(dev, attr, buf, X86_BUG_SPECTRE_V1); +} - return sprintf(buf, "%s%s\n", spectre_v2_strings[spectre_v2_enabled], - spectre_v2_module_string()); +ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf) +{ + return cpu_show_common(dev, attr, buf, X86_BUG_SPECTRE_V2); +} + +ssize_t cpu_show_spec_store_bypass(struct device *dev, struct device_attribute *attr, char *buf) +{ + return cpu_show_common(dev, attr, buf, X86_BUG_SPEC_STORE_BYPASS); } #endif diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 814276d0eed1..3d21b28f9826 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -43,6 +43,8 @@ #include <asm/pat.h> #include <asm/microcode.h> #include <asm/microcode_intel.h> +#include <asm/intel-family.h> +#include <asm/cpu_device_id.h> #ifdef CONFIG_X86_LOCAL_APIC #include <asm/uv/uv.h> @@ -674,6 +676,40 @@ static void apply_forced_caps(struct cpuinfo_x86 *c) } } +static void init_speculation_control(struct cpuinfo_x86 *c) +{ + /* + * The Intel SPEC_CTRL CPUID bit implies IBRS and IBPB support, + * and they also have a different bit for STIBP support. Also, + * a hypervisor might have set the individual AMD bits even on + * Intel CPUs, for finer-grained selection of what's available. + */ + if (cpu_has(c, X86_FEATURE_SPEC_CTRL)) { + set_cpu_cap(c, X86_FEATURE_IBRS); + set_cpu_cap(c, X86_FEATURE_IBPB); + set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL); + } + + if (cpu_has(c, X86_FEATURE_INTEL_STIBP)) + set_cpu_cap(c, X86_FEATURE_STIBP); + + if (cpu_has(c, X86_FEATURE_SPEC_CTRL_SSBD)) + set_cpu_cap(c, X86_FEATURE_SSBD); + + if (cpu_has(c, X86_FEATURE_AMD_IBRS)) { + set_cpu_cap(c, X86_FEATURE_IBRS); + set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL); + } + + if (cpu_has(c, X86_FEATURE_AMD_IBPB)) + set_cpu_cap(c, X86_FEATURE_IBPB); + + if (cpu_has(c, X86_FEATURE_AMD_STIBP)) { + set_cpu_cap(c, X86_FEATURE_STIBP); + set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL); + } +} + void get_cpu_cap(struct cpuinfo_x86 *c) { u32 eax, ebx, ecx, edx; @@ -686,14 +722,16 @@ void get_cpu_cap(struct cpuinfo_x86 *c) c->x86_capability[CPUID_1_EDX] = edx; } + /* Thermal and Power Management Leaf: level 0x00000006 (eax) */ + if (c->cpuid_level >= 0x00000006) + c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x00000006); + /* Additional Intel-defined flags: level 0x00000007 */ if (c->cpuid_level >= 0x00000007) { cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx); - c->x86_capability[CPUID_7_0_EBX] = ebx; - - c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x00000006); c->x86_capability[CPUID_7_ECX] = ecx; + c->x86_capability[CPUID_7_EDX] = edx; } /* Extended state features: level 0x0000000d */ @@ -764,6 +802,14 @@ void get_cpu_cap(struct cpuinfo_x86 *c) c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a); init_scattered_cpuid_features(c); + init_speculation_control(c); + + /* + * Clear/Set all flags overridden by options, after probe. + * This needs to happen each time we re-probe, which may happen + * several times during CPU initialization. + */ + apply_forced_caps(c); } static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c) @@ -792,6 +838,75 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c) #endif } +static const __initconst struct x86_cpu_id cpu_no_speculation[] = { + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PENWELL, X86_FEATURE_ANY }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PINEVIEW, X86_FEATURE_ANY }, + { X86_VENDOR_CENTAUR, 5 }, + { X86_VENDOR_INTEL, 5 }, + { X86_VENDOR_NSC, 5 }, + { X86_VENDOR_ANY, 4 }, + {} +}; + +static const __initconst struct x86_cpu_id cpu_no_meltdown[] = { + { X86_VENDOR_AMD }, + {} +}; + +static const __initconst struct x86_cpu_id cpu_no_spec_store_bypass[] = { + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PINEVIEW }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_LINCROFT }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_PENWELL }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CLOVERVIEW }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_CEDARVIEW }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT1 }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_AIRMONT }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_SILVERMONT2 }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_MERRIFIELD }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_CORE_YONAH }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNL }, + { X86_VENDOR_INTEL, 6, INTEL_FAM6_XEON_PHI_KNM }, + { X86_VENDOR_CENTAUR, 5, }, + { X86_VENDOR_INTEL, 5, }, + { X86_VENDOR_NSC, 5, }, + { X86_VENDOR_AMD, 0x12, }, + { X86_VENDOR_AMD, 0x11, }, + { X86_VENDOR_AMD, 0x10, }, + { X86_VENDOR_AMD, 0xf, }, + { X86_VENDOR_ANY, 4, }, + {} +}; + +static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c) +{ + u64 ia32_cap = 0; + + if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap); + + if (!x86_match_cpu(cpu_no_spec_store_bypass) && + !(ia32_cap & ARCH_CAP_SSB_NO)) + setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS); + + if (x86_match_cpu(cpu_no_speculation)) + return; + + setup_force_cpu_bug(X86_BUG_SPECTRE_V1); + setup_force_cpu_bug(X86_BUG_SPECTRE_V2); + + if (x86_match_cpu(cpu_no_meltdown)) + return; + + /* Rogue Data Cache Load? No! */ + if (ia32_cap & ARCH_CAP_RDCL_NO) + return; + + setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN); +} + /* * Do minimum CPU detection early. * Fields really needed: vendor, cpuid_level, family, model, mask, @@ -838,11 +953,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) setup_force_cpu_cap(X86_FEATURE_ALWAYS); - if (c->x86_vendor != X86_VENDOR_AMD) - setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN); - - setup_force_cpu_bug(X86_BUG_SPECTRE_V1); - setup_force_cpu_bug(X86_BUG_SPECTRE_V2); + cpu_set_bug_bits(c); fpu__init_system(c); @@ -1131,6 +1242,7 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c) enable_sep_cpu(); #endif mtrr_ap_init(); + x86_spec_ctrl_setup_ap(); } struct msr_range { diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index 2584265d4745..3b19d82f7932 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -46,4 +46,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], extern void get_cpu_cap(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); + +extern void x86_spec_ctrl_setup_ap(void); + #endif /* ARCH_X86_CPU_H */ diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 9299e3bdfad6..4dce22d3cb06 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -13,6 +13,7 @@ #include <asm/msr.h> #include <asm/bugs.h> #include <asm/cpu.h> +#include <asm/intel-family.h> #ifdef CONFIG_X86_64 #include <linux/topology.h> @@ -25,6 +26,62 @@ #include <asm/apic.h> #endif +/* + * Early microcode releases for the Spectre v2 mitigation were broken. + * Information taken from; + * - https://newsroom.intel.com/wp-content/uploads/sites/11/2018/03/microcode-update-guidance.pdf + * - https://kb.vmware.com/s/article/52345 + * - Microcode revisions observed in the wild + * - Release note from 20180108 microcode release + */ +struct sku_microcode { + u8 model; + u8 stepping; + u32 microcode; +}; +static const struct sku_microcode spectre_bad_microcodes[] = { + { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0B, 0x80 }, + { INTEL_FAM6_KABYLAKE_DESKTOP, 0x0A, 0x80 }, + { INTEL_FAM6_KABYLAKE_DESKTOP, 0x09, 0x80 }, + { INTEL_FAM6_KABYLAKE_MOBILE, 0x0A, 0x80 }, + { INTEL_FAM6_KABYLAKE_MOBILE, 0x09, 0x80 }, + { INTEL_FAM6_SKYLAKE_X, 0x03, 0x0100013e }, + { INTEL_FAM6_SKYLAKE_X, 0x04, 0x0200003c }, + { INTEL_FAM6_BROADWELL_CORE, 0x04, 0x28 }, + { INTEL_FAM6_BROADWELL_GT3E, 0x01, 0x1b }, + { INTEL_FAM6_BROADWELL_XEON_D, 0x02, 0x14 }, + { INTEL_FAM6_BROADWELL_XEON_D, 0x03, 0x07000011 }, + { INTEL_FAM6_BROADWELL_X, 0x01, 0x0b000025 }, + { INTEL_FAM6_HASWELL_ULT, 0x01, 0x21 }, + { INTEL_FAM6_HASWELL_GT3E, 0x01, 0x18 }, + { INTEL_FAM6_HASWELL_CORE, 0x03, 0x23 }, + { INTEL_FAM6_HASWELL_X, 0x02, 0x3b }, + { INTEL_FAM6_HASWELL_X, 0x04, 0x10 }, + { INTEL_FAM6_IVYBRIDGE_X, 0x04, 0x42a }, + /* Observed in the wild */ + { INTEL_FAM6_SANDYBRIDGE_X, 0x06, 0x61b }, + { INTEL_FAM6_SANDYBRIDGE_X, 0x07, 0x712 }, +}; + +static bool bad_spectre_microcode(struct cpuinfo_x86 *c) +{ + int i; + + /* + * We know that the hypervisor lie to us on the microcode version so + * we may as well hope that it is running the correct version. + */ + if (cpu_has(c, X86_FEATURE_HYPERVISOR)) + return false; + + for (i = 0; i < ARRAY_SIZE(spectre_bad_microcodes); i++) { + if (c->x86_model == spectre_bad_microcodes[i].model && + c->x86_mask == spectre_bad_microcodes[i].stepping) + return (c->microcode <= spectre_bad_microcodes[i].microcode); + } + return false; +} + static void early_init_intel(struct cpuinfo_x86 *c) { u64 misc_enable; @@ -51,6 +108,22 @@ static void early_init_intel(struct cpuinfo_x86 *c) rdmsr(MSR_IA32_UCODE_REV, lower_word, c->microcode); } + /* Now if any of them are set, check the blacklist and clear the lot */ + if ((cpu_has(c, X86_FEATURE_SPEC_CTRL) || + cpu_has(c, X86_FEATURE_INTEL_STIBP) || + cpu_has(c, X86_FEATURE_IBRS) || cpu_has(c, X86_FEATURE_IBPB) || + cpu_has(c, X86_FEATURE_STIBP)) && bad_spectre_microcode(c)) { + pr_warn("Intel Spectre v2 broken microcode detected; disabling Speculation Control\n"); + setup_clear_cpu_cap(X86_FEATURE_IBRS); + setup_clear_cpu_cap(X86_FEATURE_IBPB); + setup_clear_cpu_cap(X86_FEATURE_STIBP); + setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL); + setup_clear_cpu_cap(X86_FEATURE_MSR_SPEC_CTRL); + setup_clear_cpu_cap(X86_FEATURE_INTEL_STIBP); + setup_clear_cpu_cap(X86_FEATURE_SSBD); + setup_clear_cpu_cap(X86_FEATURE_SPEC_CTRL_SSBD); + } + /* * Atom erratum AAE44/AAF40/AAG38/AAH41: * diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index ddc9b8125918..7b8c8c838191 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2294,9 +2294,6 @@ static ssize_t store_int_with_restart(struct device *s, if (check_interval == old_check_interval) return ret; - if (check_interval < 1) - check_interval = 1; - mutex_lock(&mce_sysfs_mutex); mce_restart(); mutex_unlock(&mce_sysfs_mutex); diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c index 61215a69b03d..b22e9c4dd111 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c @@ -229,7 +229,7 @@ void uncore_perf_event_update(struct intel_uncore_box *box, struct perf_event *e u64 prev_count, new_count, delta; int shift; - if (event->hw.idx >= UNCORE_PMC_IDX_FIXED) + if (event->hw.idx == UNCORE_PMC_IDX_FIXED) shift = 64 - uncore_fixed_ctr_bits(box); else shift = 64 - uncore_perf_ctr_bits(box); diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c b/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c index 2749965afed0..83cadc2605a7 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_nhmex.c @@ -240,7 +240,7 @@ static void nhmex_uncore_msr_enable_event(struct intel_uncore_box *box, struct p { struct hw_perf_event *hwc = &event->hw; - if (hwc->idx >= UNCORE_PMC_IDX_FIXED) + if (hwc->idx == UNCORE_PMC_IDX_FIXED) wrmsrl(hwc->config_base, NHMEX_PMON_CTL_EN_BIT0); else if (box->pmu->type->event_mask & NHMEX_PMON_CTL_EN_BIT0) wrmsrl(hwc->config_base, hwc->config | NHMEX_PMON_CTL_EN_BIT22); diff --git a/arch/x86/kernel/irqflags.S b/arch/x86/kernel/irqflags.S new file mode 100644 index 000000000000..3817eb748eb4 --- /dev/null +++ b/arch/x86/kernel/irqflags.S @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#include <asm/asm.h> +#include <asm-generic/export.h> +#include <linux/linkage.h> + +/* + * unsigned long native_save_fl(void) + */ +ENTRY(native_save_fl) + pushf + pop %_ASM_AX + ret +ENDPROC(native_save_fl) +EXPORT_SYMBOL(native_save_fl) + +/* + * void native_restore_fl(unsigned long flags) + * %eax/%rdi: flags + */ +ENTRY(native_restore_fl) + push %_ASM_ARG1 + popf + ret +ENDPROC(native_restore_fl) +EXPORT_SYMBOL(native_restore_fl) diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index bc429365b72a..8bc68cfc0d33 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -119,7 +119,7 @@ static void free_ldt_struct(struct ldt_struct *ldt) * we do not have to muck with descriptors here, that is * done in switch_mm() as needed. */ -int init_new_context(struct task_struct *tsk, struct mm_struct *mm) +int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm) { struct ldt_struct *new_ldt; struct mm_struct *old_mm; @@ -160,7 +160,7 @@ out_unlock: * * 64bit: Don't touch the LDT register - we're already in the next thread. */ -void destroy_context(struct mm_struct *mm) +void destroy_context_ldt(struct mm_struct *mm) { free_ldt_struct(mm->context.ldt); mm->context.ldt = NULL; diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 2754dad26839..911ca11eb489 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -31,6 +31,7 @@ #include <asm/tlbflush.h> #include <asm/mce.h> #include <asm/vm86.h> +#include <asm/spec-ctrl.h> /* * per-CPU TSS segments. Threads are completely 'soft' on Linux, @@ -116,11 +117,6 @@ void flush_thread(void) fpu__clear(&tsk->thread.fpu); } -static void hard_disable_TSC(void) -{ - cr4_set_bits(X86_CR4_TSD); -} - void disable_TSC(void) { preempt_disable(); @@ -129,15 +125,10 @@ void disable_TSC(void) * Must flip the CPU state synchronously with * TIF_NOTSC in the current running context. */ - hard_disable_TSC(); + cr4_set_bits(X86_CR4_TSD); preempt_enable(); } -static void hard_enable_TSC(void) -{ - cr4_clear_bits(X86_CR4_TSD); -} - static void enable_TSC(void) { preempt_disable(); @@ -146,7 +137,7 @@ static void enable_TSC(void) * Must flip the CPU state synchronously with * TIF_NOTSC in the current running context. */ - hard_enable_TSC(); + cr4_clear_bits(X86_CR4_TSD); preempt_enable(); } @@ -174,48 +165,199 @@ int set_tsc_mode(unsigned int val) return 0; } -void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, - struct tss_struct *tss) +static inline void switch_to_bitmap(struct tss_struct *tss, + struct thread_struct *prev, + struct thread_struct *next, + unsigned long tifp, unsigned long tifn) { - struct thread_struct *prev, *next; - - prev = &prev_p->thread; - next = &next_p->thread; - - if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^ - test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) { - unsigned long debugctl = get_debugctlmsr(); - - debugctl &= ~DEBUGCTLMSR_BTF; - if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) - debugctl |= DEBUGCTLMSR_BTF; - - update_debugctlmsr(debugctl); - } - - if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^ - test_tsk_thread_flag(next_p, TIF_NOTSC)) { - /* prev and next are different */ - if (test_tsk_thread_flag(next_p, TIF_NOTSC)) - hard_disable_TSC(); - else - hard_enable_TSC(); - } - - if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) { + if (tifn & _TIF_IO_BITMAP) { /* * Copy the relevant range of the IO bitmap. * Normally this is 128 bytes or less: */ memcpy(tss->io_bitmap, next->io_bitmap_ptr, max(prev->io_bitmap_max, next->io_bitmap_max)); - } else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) { + } else if (tifp & _TIF_IO_BITMAP) { /* * Clear any possible leftover bits: */ memset(tss->io_bitmap, 0xff, prev->io_bitmap_max); } +} + +#ifdef CONFIG_SMP + +struct ssb_state { + struct ssb_state *shared_state; + raw_spinlock_t lock; + unsigned int disable_state; + unsigned long local_state; +}; + +#define LSTATE_SSB 0 + +static DEFINE_PER_CPU(struct ssb_state, ssb_state); + +void speculative_store_bypass_ht_init(void) +{ + struct ssb_state *st = this_cpu_ptr(&ssb_state); + unsigned int this_cpu = smp_processor_id(); + unsigned int cpu; + + st->local_state = 0; + + /* + * Shared state setup happens once on the first bringup + * of the CPU. It's not destroyed on CPU hotunplug. + */ + if (st->shared_state) + return; + + raw_spin_lock_init(&st->lock); + + /* + * Go over HT siblings and check whether one of them has set up the + * shared state pointer already. + */ + for_each_cpu(cpu, topology_sibling_cpumask(this_cpu)) { + if (cpu == this_cpu) + continue; + + if (!per_cpu(ssb_state, cpu).shared_state) + continue; + + /* Link it to the state of the sibling: */ + st->shared_state = per_cpu(ssb_state, cpu).shared_state; + return; + } + + /* + * First HT sibling to come up on the core. Link shared state of + * the first HT sibling to itself. The siblings on the same core + * which come up later will see the shared state pointer and link + * themself to the state of this CPU. + */ + st->shared_state = st; +} + +/* + * Logic is: First HT sibling enables SSBD for both siblings in the core + * and last sibling to disable it, disables it for the whole core. This how + * MSR_SPEC_CTRL works in "hardware": + * + * CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL + */ +static __always_inline void amd_set_core_ssb_state(unsigned long tifn) +{ + struct ssb_state *st = this_cpu_ptr(&ssb_state); + u64 msr = x86_amd_ls_cfg_base; + + if (!static_cpu_has(X86_FEATURE_ZEN)) { + msr |= ssbd_tif_to_amd_ls_cfg(tifn); + wrmsrl(MSR_AMD64_LS_CFG, msr); + return; + } + + if (tifn & _TIF_SSBD) { + /* + * Since this can race with prctl(), block reentry on the + * same CPU. + */ + if (__test_and_set_bit(LSTATE_SSB, &st->local_state)) + return; + + msr |= x86_amd_ls_cfg_ssbd_mask; + + raw_spin_lock(&st->shared_state->lock); + /* First sibling enables SSBD: */ + if (!st->shared_state->disable_state) + wrmsrl(MSR_AMD64_LS_CFG, msr); + st->shared_state->disable_state++; + raw_spin_unlock(&st->shared_state->lock); + } else { + if (!__test_and_clear_bit(LSTATE_SSB, &st->local_state)) + return; + + raw_spin_lock(&st->shared_state->lock); + st->shared_state->disable_state--; + if (!st->shared_state->disable_state) + wrmsrl(MSR_AMD64_LS_CFG, msr); + raw_spin_unlock(&st->shared_state->lock); + } +} +#else +static __always_inline void amd_set_core_ssb_state(unsigned long tifn) +{ + u64 msr = x86_amd_ls_cfg_base | ssbd_tif_to_amd_ls_cfg(tifn); + + wrmsrl(MSR_AMD64_LS_CFG, msr); +} +#endif + +static __always_inline void amd_set_ssb_virt_state(unsigned long tifn) +{ + /* + * SSBD has the same definition in SPEC_CTRL and VIRT_SPEC_CTRL, + * so ssbd_tif_to_spec_ctrl() just works. + */ + wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn)); +} + +static __always_inline void intel_set_ssb_state(unsigned long tifn) +{ + u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn); + + wrmsrl(MSR_IA32_SPEC_CTRL, msr); +} + +static __always_inline void __speculative_store_bypass_update(unsigned long tifn) +{ + if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) + amd_set_ssb_virt_state(tifn); + else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) + amd_set_core_ssb_state(tifn); + else + intel_set_ssb_state(tifn); +} + +void speculative_store_bypass_update(unsigned long tif) +{ + preempt_disable(); + __speculative_store_bypass_update(tif); + preempt_enable(); +} + +void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, + struct tss_struct *tss) +{ + struct thread_struct *prev, *next; + unsigned long tifp, tifn; + + prev = &prev_p->thread; + next = &next_p->thread; + + tifn = READ_ONCE(task_thread_info(next_p)->flags); + tifp = READ_ONCE(task_thread_info(prev_p)->flags); + switch_to_bitmap(tss, prev, next, tifp, tifn); + propagate_user_return_notify(prev_p, next_p); + + if ((tifp & _TIF_BLOCKSTEP || tifn & _TIF_BLOCKSTEP) && + arch_has_block_step()) { + unsigned long debugctl, msk; + + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); + debugctl &= ~DEBUGCTLMSR_BTF; + msk = tifn & _TIF_BLOCKSTEP; + debugctl |= (msk >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BTF_SHIFT; + wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl); + } + + if ((tifp ^ tifn) & _TIF_NOTSC) + cr4_toggle_bits(X86_CR4_TSD); + + if ((tifp ^ tifn) & _TIF_SSBD) + __speculative_store_bypass_update(tifn); } /* diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 1f7aefc7b0b4..c017f1c71560 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -75,6 +75,7 @@ #include <asm/i8259.h> #include <asm/realmode.h> #include <asm/misc.h> +#include <asm/spec-ctrl.h> /* Number of siblings per CPU package */ int smp_num_siblings = 1; @@ -217,6 +218,8 @@ static void notrace start_secondary(void *unused) */ check_tsc_sync_target(); + speculative_store_bypass_ht_init(); + /* * Lock vector_lock and initialize the vectors on this cpu * before setting the cpu online. We must set it online with @@ -1209,6 +1212,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) set_mtrr_aps_delayed_init(); smp_quirk_init_udelay(); + + speculative_store_bypass_ht_init(); } void arch_enable_nonboot_cpus_begin(void) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 42654375b73f..df7827a981dd 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -37,7 +37,7 @@ #include <asm/desc.h> #include <asm/debugreg.h> #include <asm/kvm_para.h> -#include <asm/nospec-branch.h> +#include <asm/spec-ctrl.h> #include <asm/virtext.h> #include "trace.h" diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 63c44a9bf6bb..c5a4b1978cbf 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -48,7 +48,7 @@ #include <asm/kexec.h> #include <asm/apic.h> #include <asm/irq_remapping.h> -#include <asm/nospec-branch.h> +#include <asm/spec-ctrl.h> #include "trace.h" #include "pmu.h" @@ -6843,6 +6843,8 @@ static int handle_vmon(struct kvm_vcpu *vcpu) HRTIMER_MODE_REL); vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; + vmx->nested.vpid02 = allocate_vpid(); + vmx->nested.vmxon = true; skip_emulated_instruction(vcpu); @@ -8887,10 +8889,8 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) goto free_vmcs; } - if (nested) { + if (nested) nested_vmx_setup_ctls_msrs(vmx); - vmx->nested.vpid02 = allocate_vpid(); - } vmx->nested.posted_intr_nv = -1; vmx->nested.current_vmptr = -1ull; @@ -8899,7 +8899,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id) return &vmx->vcpu; free_vmcs: - free_vpid(vmx->nested.vpid02); free_loaded_vmcs(vmx->loaded_vmcs); free_msrs: kfree(vmx->guest_msrs); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 7cad01af6dcd..6d683bbb3502 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -10,6 +10,7 @@ #include <asm/tlbflush.h> #include <asm/mmu_context.h> +#include <asm/nospec-branch.h> #include <asm/cache.h> #include <asm/apic.h> #include <asm/uv/uv.h> @@ -29,6 +30,8 @@ * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi */ +atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1); + struct flush_tlb_info { struct mm_struct *flush_mm; unsigned long flush_start; @@ -104,6 +107,36 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, unsigned cpu = smp_processor_id(); if (likely(prev != next)) { + u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id); + + /* + * Avoid user/user BTB poisoning by flushing the branch + * predictor when switching between processes. This stops + * one process from doing Spectre-v2 attacks on another. + * + * As an optimization, flush indirect branches only when + * switching into processes that disable dumping. This + * protects high value processes like gpg, without having + * too high performance overhead. IBPB is *expensive*! + * + * This will not flush branches when switching into kernel + * threads. It will also not flush if we switch to idle + * thread and back to the same process. It will flush if we + * switch to a different non-dumpable process. + */ + if (tsk && tsk->mm && + tsk->mm->context.ctx_id != last_ctx_id && + get_dumpable(tsk->mm) != SUID_DUMP_USER) + indirect_branch_prediction_barrier(); + + /* + * Record last user mm's context id, so we can avoid + * flushing branch buffer with IBPB if we switch back + * to the same user. + */ + if (next != &init_mm) + this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id); + this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK); this_cpu_write(cpu_tlbstate.active_mm, next); cpumask_set_cpu(cpu, mm_cpumask(next)); diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index a0ac0f9c307f..f5a8cd96bae4 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -40,6 +40,7 @@ #include <asm/fixmap.h> #include <asm/realmode.h> #include <asm/time.h> +#include <asm/nospec-branch.h> /* * We allocate runtime services regions bottom-up, starting from -4G, i.e. @@ -347,6 +348,7 @@ extern efi_status_t efi64_thunk(u32, ...); \ efi_sync_low_kernel_mappings(); \ local_irq_save(flags); \ + firmware_restrict_branch_speculation_start(); \ \ efi_scratch.prev_cr3 = read_cr3(); \ write_cr3((unsigned long)efi_scratch.efi_pgt); \ @@ -357,6 +359,7 @@ extern efi_status_t efi64_thunk(u32, ...); \ write_cr3(efi_scratch.prev_cr3); \ __flush_tlb_all(); \ + firmware_restrict_branch_speculation_end(); \ local_irq_restore(flags); \ \ __s; \ diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index cbef64b508e1..82fd84d5e1aa 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -460,6 +460,12 @@ static void __init xen_init_cpuid_mask(void) cpuid_leaf1_ecx_set_mask = (1 << (X86_FEATURE_MWAIT % 32)); } +static void __init xen_init_capabilities(void) +{ + if (xen_pv_domain()) + setup_force_cpu_cap(X86_FEATURE_XENPV); +} + static void xen_set_debugreg(int reg, unsigned long val) { HYPERVISOR_set_debugreg(reg, val); @@ -1587,6 +1593,7 @@ asmlinkage __visible void __init xen_start_kernel(void) xen_init_irq_ops(); xen_init_cpuid_mask(); + xen_init_capabilities(); #ifdef CONFIG_X86_LOCAL_APIC /* @@ -1883,14 +1890,6 @@ bool xen_hvm_need_lapic(void) } EXPORT_SYMBOL_GPL(xen_hvm_need_lapic); -static void xen_set_cpu_features(struct cpuinfo_x86 *c) -{ - if (xen_pv_domain()) { - clear_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS); - set_cpu_cap(c, X86_FEATURE_XENPV); - } -} - const struct hypervisor_x86 x86_hyper_xen = { .name = "Xen", .detect = xen_platform, @@ -1898,7 +1897,6 @@ const struct hypervisor_x86 x86_hyper_xen = { .init_platform = xen_hvm_guest_init, #endif .x2apic_available = xen_x2apic_para_available, - .set_cpu_features = xen_set_cpu_features, }; EXPORT_SYMBOL(x86_hyper_xen); diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index 3f4ebf0261f2..29e50d1229bc 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -28,6 +28,7 @@ #include <xen/interface/vcpu.h> #include <xen/interface/xenpmu.h> +#include <asm/spec-ctrl.h> #include <asm/xen/interface.h> #include <asm/xen/hypercall.h> @@ -87,6 +88,8 @@ static void cpu_bringup(void) cpu_data(cpu).x86_max_cores = 1; set_cpu_sibling_map(cpu); + speculative_store_bypass_ht_init(); + xen_setup_cpu_clockevents(); notify_cpu_starting(cpu); @@ -357,6 +360,8 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus) } set_cpu_sibling_map(0); + speculative_store_bypass_ht_init(); + xen_pmu_init(0); if (xen_smp_intr_init(0)) diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c index 7f664c416faf..4ecd0de08557 100644 --- a/arch/x86/xen/suspend.c +++ b/arch/x86/xen/suspend.c @@ -1,11 +1,14 @@ #include <linux/types.h> #include <linux/tick.h> +#include <linux/percpu-defs.h> #include <xen/xen.h> #include <xen/interface/xen.h> #include <xen/grant_table.h> #include <xen/events.h> +#include <asm/cpufeatures.h> +#include <asm/msr-index.h> #include <asm/xen/hypercall.h> #include <asm/xen/page.h> #include <asm/fixmap.h> @@ -68,6 +71,8 @@ static void xen_pv_post_suspend(int suspend_cancelled) xen_mm_unpin_all(); } +static DEFINE_PER_CPU(u64, spec_ctrl); + void xen_arch_pre_suspend(void) { if (xen_pv_domain()) @@ -84,6 +89,9 @@ void xen_arch_post_suspend(int cancelled) static void xen_vcpu_notify_restore(void *data) { + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) + wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(spec_ctrl)); + /* Boot processor notified via generic timekeeping_resume() */ if (smp_processor_id() == 0) return; @@ -93,7 +101,15 @@ static void xen_vcpu_notify_restore(void *data) static void xen_vcpu_notify_suspend(void *data) { + u64 tmp; + tick_suspend_local(); + + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) { + rdmsrl(MSR_IA32_SPEC_CTRL, tmp); + this_cpu_write(spec_ctrl, tmp); + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + } } void xen_arch_resume(void) diff --git a/block/blk-core.c b/block/blk-core.c index 3b2d557ff7c7..e5f8a1482eb7 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -661,21 +661,17 @@ EXPORT_SYMBOL(blk_alloc_queue); int blk_queue_enter(struct request_queue *q, gfp_t gfp) { while (true) { - int ret; - if (percpu_ref_tryget_live(&q->q_usage_counter)) return 0; if (!gfpflags_allow_blocking(gfp)) return -EBUSY; - ret = wait_event_interruptible(q->mq_freeze_wq, - !atomic_read(&q->mq_freeze_depth) || - blk_queue_dying(q)); + wait_event(q->mq_freeze_wq, + !atomic_read(&q->mq_freeze_depth) || + blk_queue_dying(q)); if (blk_queue_dying(q)) return -ENODEV; - if (ret) - return ret; } } diff --git a/crypto/algapi.c b/crypto/algapi.c index eb58b73ca925..ac70fd5cd404 100644 --- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -1001,6 +1001,21 @@ unsigned int crypto_alg_extsize(struct crypto_alg *alg) } EXPORT_SYMBOL_GPL(crypto_alg_extsize); +int crypto_type_has_alg(const char *name, const struct crypto_type *frontend, + u32 type, u32 mask) +{ + int ret = 0; + struct crypto_alg *alg = crypto_find_alg(name, frontend, type, mask); + + if (!IS_ERR(alg)) { + crypto_mod_put(alg); + ret = 1; + } + + return ret; +} +EXPORT_SYMBOL_GPL(crypto_type_has_alg); + static int __init crypto_algapi_init(void) { crypto_init_proc(); diff --git a/crypto/authenc.c b/crypto/authenc.c index 55a354d57251..b7290c5b1eaa 100644 --- a/crypto/authenc.c +++ b/crypto/authenc.c @@ -108,6 +108,7 @@ static int crypto_authenc_setkey(struct crypto_aead *authenc, const u8 *key, CRYPTO_TFM_RES_MASK); out: + memzero_explicit(&keys, sizeof(keys)); return err; badkey: diff --git a/crypto/authencesn.c b/crypto/authencesn.c index 52154ef21b5e..fa0c4567f697 100644 --- a/crypto/authencesn.c +++ b/crypto/authencesn.c @@ -90,6 +90,7 @@ static int crypto_authenc_esn_setkey(struct crypto_aead *authenc_esn, const u8 * CRYPTO_TFM_RES_MASK); out: + memzero_explicit(&keys, sizeof(keys)); return err; badkey: diff --git a/crypto/internal.h b/crypto/internal.h index 00e42a3ed814..7eefcdb00227 100644 --- a/crypto/internal.h +++ b/crypto/internal.h @@ -104,6 +104,9 @@ int crypto_probing_notify(unsigned long val, void *v); unsigned int crypto_alg_extsize(struct crypto_alg *alg); +int crypto_type_has_alg(const char *name, const struct crypto_type *frontend, + u32 type, u32 mask); + static inline struct crypto_alg *crypto_alg_get(struct crypto_alg *alg) { atomic_inc(&alg->cra_refcnt); diff --git a/crypto/skcipher.c b/crypto/skcipher.c index d199c0b1751c..88349cf02220 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -16,7 +16,11 @@ #include <crypto/internal/skcipher.h> #include <linux/bug.h> +#include <linux/cryptouser.h> #include <linux/module.h> +#include <linux/rtnetlink.h> +#include <linux/seq_file.h> +#include <net/netlink.h> #include "internal.h" @@ -25,10 +29,11 @@ static unsigned int crypto_skcipher_extsize(struct crypto_alg *alg) if (alg->cra_type == &crypto_blkcipher_type) return sizeof(struct crypto_blkcipher *); - BUG_ON(alg->cra_type != &crypto_ablkcipher_type && - alg->cra_type != &crypto_givcipher_type); + if (alg->cra_type == &crypto_ablkcipher_type || + alg->cra_type == &crypto_givcipher_type) + return sizeof(struct crypto_ablkcipher *); - return sizeof(struct crypto_ablkcipher *); + return crypto_alg_extsize(alg); } static int skcipher_setkey_blkcipher(struct crypto_skcipher *tfm, @@ -118,7 +123,7 @@ static int crypto_init_skcipher_ops_blkcipher(struct crypto_tfm *tfm) skcipher->decrypt = skcipher_decrypt_blkcipher; skcipher->ivsize = crypto_blkcipher_ivsize(blkcipher); - skcipher->has_setkey = calg->cra_blkcipher.max_keysize; + skcipher->keysize = calg->cra_blkcipher.max_keysize; return 0; } @@ -211,31 +216,123 @@ static int crypto_init_skcipher_ops_ablkcipher(struct crypto_tfm *tfm) skcipher->ivsize = crypto_ablkcipher_ivsize(ablkcipher); skcipher->reqsize = crypto_ablkcipher_reqsize(ablkcipher) + sizeof(struct ablkcipher_request); - skcipher->has_setkey = calg->cra_ablkcipher.max_keysize; + skcipher->keysize = calg->cra_ablkcipher.max_keysize; return 0; } +static void crypto_skcipher_exit_tfm(struct crypto_tfm *tfm) +{ + struct crypto_skcipher *skcipher = __crypto_skcipher_cast(tfm); + struct skcipher_alg *alg = crypto_skcipher_alg(skcipher); + + alg->exit(skcipher); +} + static int crypto_skcipher_init_tfm(struct crypto_tfm *tfm) { + struct crypto_skcipher *skcipher = __crypto_skcipher_cast(tfm); + struct skcipher_alg *alg = crypto_skcipher_alg(skcipher); + if (tfm->__crt_alg->cra_type == &crypto_blkcipher_type) return crypto_init_skcipher_ops_blkcipher(tfm); - BUG_ON(tfm->__crt_alg->cra_type != &crypto_ablkcipher_type && - tfm->__crt_alg->cra_type != &crypto_givcipher_type); + if (tfm->__crt_alg->cra_type == &crypto_ablkcipher_type || + tfm->__crt_alg->cra_type == &crypto_givcipher_type) + return crypto_init_skcipher_ops_ablkcipher(tfm); + + skcipher->setkey = alg->setkey; + skcipher->encrypt = alg->encrypt; + skcipher->decrypt = alg->decrypt; + skcipher->ivsize = alg->ivsize; + skcipher->keysize = alg->max_keysize; + + if (alg->exit) + skcipher->base.exit = crypto_skcipher_exit_tfm; - return crypto_init_skcipher_ops_ablkcipher(tfm); + if (alg->init) + return alg->init(skcipher); + + return 0; +} + +static void crypto_skcipher_free_instance(struct crypto_instance *inst) +{ + struct skcipher_instance *skcipher = + container_of(inst, struct skcipher_instance, s.base); + + skcipher->free(skcipher); +} + +static void crypto_skcipher_show(struct seq_file *m, struct crypto_alg *alg) + __attribute__ ((unused)); +static void crypto_skcipher_show(struct seq_file *m, struct crypto_alg *alg) +{ + struct skcipher_alg *skcipher = container_of(alg, struct skcipher_alg, + base); + + seq_printf(m, "type : skcipher\n"); + seq_printf(m, "async : %s\n", + alg->cra_flags & CRYPTO_ALG_ASYNC ? "yes" : "no"); + seq_printf(m, "blocksize : %u\n", alg->cra_blocksize); + seq_printf(m, "min keysize : %u\n", skcipher->min_keysize); + seq_printf(m, "max keysize : %u\n", skcipher->max_keysize); + seq_printf(m, "ivsize : %u\n", skcipher->ivsize); + seq_printf(m, "chunksize : %u\n", skcipher->chunksize); } +#ifdef CONFIG_NET +static int crypto_skcipher_report(struct sk_buff *skb, struct crypto_alg *alg) +{ + struct crypto_report_blkcipher rblkcipher; + struct skcipher_alg *skcipher = container_of(alg, struct skcipher_alg, + base); + + strlcpy(rblkcipher.type, "skcipher", sizeof(rblkcipher.type)); + strlcpy(rblkcipher.geniv, "<none>", sizeof(rblkcipher.geniv)); + + rblkcipher.blocksize = alg->cra_blocksize; + rblkcipher.min_keysize = skcipher->min_keysize; + rblkcipher.max_keysize = skcipher->max_keysize; + rblkcipher.ivsize = skcipher->ivsize; + + if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER, + sizeof(struct crypto_report_blkcipher), &rblkcipher)) + goto nla_put_failure; + return 0; + +nla_put_failure: + return -EMSGSIZE; +} +#else +static int crypto_skcipher_report(struct sk_buff *skb, struct crypto_alg *alg) +{ + return -ENOSYS; +} +#endif + static const struct crypto_type crypto_skcipher_type2 = { .extsize = crypto_skcipher_extsize, .init_tfm = crypto_skcipher_init_tfm, + .free = crypto_skcipher_free_instance, +#ifdef CONFIG_PROC_FS + .show = crypto_skcipher_show, +#endif + .report = crypto_skcipher_report, .maskclear = ~CRYPTO_ALG_TYPE_MASK, .maskset = CRYPTO_ALG_TYPE_BLKCIPHER_MASK, - .type = CRYPTO_ALG_TYPE_BLKCIPHER, + .type = CRYPTO_ALG_TYPE_SKCIPHER, .tfmsize = offsetof(struct crypto_skcipher, base), }; +int crypto_grab_skcipher2(struct crypto_skcipher_spawn *spawn, + const char *name, u32 type, u32 mask) +{ + spawn->base.frontend = &crypto_skcipher_type2; + return crypto_grab_spawn(&spawn->base, name, type, mask); +} +EXPORT_SYMBOL_GPL(crypto_grab_skcipher2); + struct crypto_skcipher *crypto_alloc_skcipher(const char *alg_name, u32 type, u32 mask) { @@ -243,5 +340,90 @@ struct crypto_skcipher *crypto_alloc_skcipher(const char *alg_name, } EXPORT_SYMBOL_GPL(crypto_alloc_skcipher); +int crypto_has_skcipher2(const char *alg_name, u32 type, u32 mask) +{ + return crypto_type_has_alg(alg_name, &crypto_skcipher_type2, + type, mask); +} +EXPORT_SYMBOL_GPL(crypto_has_skcipher2); + +static int skcipher_prepare_alg(struct skcipher_alg *alg) +{ + struct crypto_alg *base = &alg->base; + + if (alg->ivsize > PAGE_SIZE / 8 || alg->chunksize > PAGE_SIZE / 8) + return -EINVAL; + + if (!alg->chunksize) + alg->chunksize = base->cra_blocksize; + + base->cra_type = &crypto_skcipher_type2; + base->cra_flags &= ~CRYPTO_ALG_TYPE_MASK; + base->cra_flags |= CRYPTO_ALG_TYPE_SKCIPHER; + + return 0; +} + +int crypto_register_skcipher(struct skcipher_alg *alg) +{ + struct crypto_alg *base = &alg->base; + int err; + + err = skcipher_prepare_alg(alg); + if (err) + return err; + + return crypto_register_alg(base); +} +EXPORT_SYMBOL_GPL(crypto_register_skcipher); + +void crypto_unregister_skcipher(struct skcipher_alg *alg) +{ + crypto_unregister_alg(&alg->base); +} +EXPORT_SYMBOL_GPL(crypto_unregister_skcipher); + +int crypto_register_skciphers(struct skcipher_alg *algs, int count) +{ + int i, ret; + + for (i = 0; i < count; i++) { + ret = crypto_register_skcipher(&algs[i]); + if (ret) + goto err; + } + + return 0; + +err: + for (--i; i >= 0; --i) + crypto_unregister_skcipher(&algs[i]); + + return ret; +} +EXPORT_SYMBOL_GPL(crypto_register_skciphers); + +void crypto_unregister_skciphers(struct skcipher_alg *algs, int count) +{ + int i; + + for (i = count - 1; i >= 0; --i) + crypto_unregister_skcipher(&algs[i]); +} +EXPORT_SYMBOL_GPL(crypto_unregister_skciphers); + +int skcipher_register_instance(struct crypto_template *tmpl, + struct skcipher_instance *inst) +{ + int err; + + err = skcipher_prepare_alg(&inst->alg); + if (err) + return err; + + return crypto_register_instance(tmpl, skcipher_crypto_instance(inst)); +} +EXPORT_SYMBOL_GPL(skcipher_register_instance); + MODULE_LICENSE("GPL"); MODULE_DESCRIPTION("Symmetric key cipher type"); diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index ae3fe4e64203..3b0b4bd67b71 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -472,9 +472,11 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm) } control = OSC_PCI_EXPRESS_CAPABILITY_CONTROL - | OSC_PCI_EXPRESS_NATIVE_HP_CONTROL | OSC_PCI_EXPRESS_PME_CONTROL; + if (IS_ENABLED(CONFIG_HOTPLUG_PCI_PCIE)) + control |= OSC_PCI_EXPRESS_NATIVE_HP_CONTROL; + if (pci_aer_available()) { if (aer_acpi_firmware_first()) dev_info(&device->dev, diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig index 01de42c8b74b..63ed9ceebf7b 100644 --- a/drivers/android/Kconfig +++ b/drivers/android/Kconfig @@ -9,7 +9,7 @@ if ANDROID config ANDROID_BINDER_IPC bool "Android Binder IPC Driver" - depends on MMU + depends on MMU && !M68K default n ---help--- Binder is used in Android for both communication between processes, @@ -31,19 +31,6 @@ config ANDROID_BINDER_DEVICES created. Each binder device has its own context manager, and is therefore logically separated from the other devices. -config ANDROID_BINDER_IPC_32BIT - bool - depends on !64BIT && ANDROID_BINDER_IPC - default y - ---help--- - The Binder API has been changed to support both 32 and 64bit - applications in a mixed environment. - - Enable this to support an old 32-bit Android user-space (v4.4 and - earlier). - - Note that enabling this will break newer Android user-space. - config ANDROID_BINDER_IPC_SELFTEST bool "Android Binder IPC Driver Selftest" depends on ANDROID_BINDER_IPC diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 2299c661b324..623094457788 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -71,10 +71,6 @@ #include <linux/security.h> #include <linux/spinlock.h> -#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT -#define BINDER_IPC_32BIT 1 -#endif - #include <uapi/linux/android/binder.h> #include "binder_alloc.h" #include "binder_trace.h" @@ -143,7 +139,7 @@ enum { }; static uint32_t binder_debug_mask = BINDER_DEBUG_USER_ERROR | BINDER_DEBUG_FAILED_TRANSACTION | BINDER_DEBUG_DEAD_TRANSACTION; -module_param_named(debug_mask, binder_debug_mask, uint, S_IWUSR | S_IRUGO); +module_param_named(debug_mask, binder_debug_mask, uint, 0644); static char *binder_devices_param = CONFIG_ANDROID_BINDER_DEVICES; module_param_named(devices, binder_devices_param, charp, S_IRUGO); @@ -162,7 +158,7 @@ static int binder_set_stop_on_user_error(const char *val, return ret; } module_param_call(stop_on_user_error, binder_set_stop_on_user_error, - param_get_int, &binder_stop_on_user_error, S_IWUSR | S_IRUGO); + param_get_int, &binder_stop_on_user_error, 0644); #define binder_debug(mask, x...) \ do { \ @@ -251,7 +247,7 @@ static struct binder_transaction_log_entry *binder_transaction_log_add( unsigned int cur = atomic_inc_return(&log->cur); if (cur >= ARRAY_SIZE(log->entry)) - log->full = 1; + log->full = true; e = &log->entry[cur % ARRAY_SIZE(log->entry)]; WRITE_ONCE(e->debug_id_done, 0); /* @@ -466,8 +462,9 @@ struct binder_ref { }; enum binder_deferred_state { - BINDER_DEFERRED_FLUSH = 0x01, - BINDER_DEFERRED_RELEASE = 0x02, + BINDER_DEFERRED_PUT_FILES = 0x01, + BINDER_DEFERRED_FLUSH = 0x02, + BINDER_DEFERRED_RELEASE = 0x04, }; /** @@ -504,6 +501,9 @@ struct binder_priority { * (invariant after initialized) * @tsk task_struct for group_leader of process * (invariant after initialized) + * @files files_struct for process + * (protected by @files_lock) + * @files_lock mutex to protect @files * @deferred_work_node: element for binder_deferred_list * (protected by binder_deferred_lock) * @deferred_work: bitmap of deferred work to perform @@ -548,6 +548,8 @@ struct binder_proc { struct list_head waiting_threads; int pid; struct task_struct *tsk; + struct files_struct *files; + struct mutex files_lock; struct hlist_node deferred_work_node; int deferred_work; bool is_dead; @@ -942,33 +944,27 @@ static void binder_free_thread(struct binder_thread *thread); static void binder_free_proc(struct binder_proc *proc); static void binder_inc_node_tmpref_ilocked(struct binder_node *node); -struct files_struct *binder_get_files_struct(struct binder_proc *proc) -{ - return get_files_struct(proc->tsk); -} - static int task_get_unused_fd_flags(struct binder_proc *proc, int flags) { - struct files_struct *files; unsigned long rlim_cur; unsigned long irqs; int ret; - files = binder_get_files_struct(proc); - if (files == NULL) - return -ESRCH; - + mutex_lock(&proc->files_lock); + if (proc->files == NULL) { + ret = -ESRCH; + goto err; + } if (!lock_task_sighand(proc->tsk, &irqs)) { ret = -EMFILE; goto err; } - rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE); unlock_task_sighand(proc->tsk, &irqs); - ret = __alloc_fd(files, 0, rlim_cur, flags); + ret = __alloc_fd(proc->files, 0, rlim_cur, flags); err: - put_files_struct(files); + mutex_unlock(&proc->files_lock); return ret; } @@ -978,12 +974,10 @@ err: static void task_fd_install( struct binder_proc *proc, unsigned int fd, struct file *file) { - struct files_struct *files = binder_get_files_struct(proc); - - if (files) { - __fd_install(files, fd, file); - put_files_struct(files); - } + mutex_lock(&proc->files_lock); + if (proc->files) + __fd_install(proc->files, fd, file); + mutex_unlock(&proc->files_lock); } /* @@ -991,21 +985,22 @@ static void task_fd_install( */ static long task_close_fd(struct binder_proc *proc, unsigned int fd) { - struct files_struct *files = binder_get_files_struct(proc); int retval; - if (files == NULL) - return -ESRCH; - - retval = __close_fd(files, fd); + mutex_lock(&proc->files_lock); + if (proc->files == NULL) { + retval = -ESRCH; + goto err; + } + retval = __close_fd(proc->files, fd); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || retval == -ERESTARTNOINTR || retval == -ERESTARTNOHAND || retval == -ERESTART_RESTARTBLOCK)) retval = -EINTR; - put_files_struct(files); - +err: + mutex_unlock(&proc->files_lock); return retval; } @@ -2215,8 +2210,8 @@ static size_t binder_validate_object(struct binder_buffer *buffer, u64 offset) struct binder_object_header *hdr; size_t object_size = 0; - if (offset > buffer->data_size - sizeof(*hdr) || - buffer->data_size < sizeof(*hdr) || + if (buffer->data_size < sizeof(*hdr) || + offset > buffer->data_size - sizeof(*hdr) || !IS_ALIGNED(offset, sizeof(u32))) return 0; @@ -2356,7 +2351,7 @@ static void binder_transaction_buffer_release(struct binder_proc *proc, int debug_id = buffer->debug_id; binder_debug(BINDER_DEBUG_TRANSACTION, - "%d buffer release %d, size %zd-%zd, failed at %p\n", + "%d buffer release %d, size %zd-%zd, failed at %pK\n", proc->pid, buffer->debug_id, buffer->data_size, buffer->offsets_size, failed_at); @@ -2805,7 +2800,7 @@ static bool binder_proc_transaction(struct binder_transaction *t, if (node->has_async_transaction) { pending_async = true; } else { - node->has_async_transaction = 1; + node->has_async_transaction = true; } } @@ -3670,7 +3665,7 @@ static int binder_thread_write(struct binder_proc *proc, w = binder_dequeue_work_head_ilocked( &buf_node->async_todo); if (!w) { - buf_node->has_async_transaction = 0; + buf_node->has_async_transaction = false; } else { binder_enqueue_work_ilocked( w, &proc->todo); @@ -3892,7 +3887,7 @@ static int binder_thread_write(struct binder_proc *proc, } } binder_debug(BINDER_DEBUG_DEAD_BINDER, - "%d:%d BC_DEAD_BINDER_DONE %016llx found %p\n", + "%d:%d BC_DEAD_BINDER_DONE %016llx found %pK\n", proc->pid, thread->pid, (u64)cookie, death); if (death == NULL) { @@ -4098,6 +4093,7 @@ retry: binder_inner_proc_unlock(proc); if (put_user(e->cmd, (uint32_t __user *)ptr)) return -EFAULT; + cmd = e->cmd; e->cmd = BR_OK; ptr += sizeof(uint32_t); @@ -4866,6 +4862,7 @@ static void binder_vma_close(struct vm_area_struct *vma) (vma->vm_end - vma->vm_start) / SZ_1K, vma->vm_flags, (unsigned long)pgprot_val(vma->vm_page_prot)); binder_alloc_vma_close(&proc->alloc); + binder_defer_work(proc, BINDER_DEFERRED_PUT_FILES); } static int binder_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf) @@ -4902,16 +4899,22 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) failure_string = "bad vm_flags"; goto err_bad_arg; } - vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE; + vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP; + vma->vm_flags &= ~VM_MAYWRITE; + vma->vm_ops = &binder_vm_ops; vma->vm_private_data = proc; ret = binder_alloc_mmap_handler(&proc->alloc, vma); - - return ret; + if (ret) + return ret; + mutex_lock(&proc->files_lock); + proc->files = get_files_struct(current); + mutex_unlock(&proc->files_lock); + return 0; err_bad_arg: - pr_err("binder_mmap: %d %lx-%lx %s failed %d\n", + pr_err("%s: %d %lx-%lx %s failed %d\n", __func__, proc->pid, vma->vm_start, vma->vm_end, failure_string, ret); return ret; } @@ -4921,7 +4924,7 @@ static int binder_open(struct inode *nodp, struct file *filp) struct binder_proc *proc; struct binder_device *binder_dev; - binder_debug(BINDER_DEBUG_OPEN_CLOSE, "binder_open: %d:%d\n", + binder_debug(BINDER_DEBUG_OPEN_CLOSE, "%s: %d:%d\n", __func__, current->group_leader->pid, current->pid); proc = kzalloc(sizeof(*proc), GFP_KERNEL); @@ -4931,6 +4934,7 @@ static int binder_open(struct inode *nodp, struct file *filp) spin_lock_init(&proc->outer_lock); get_task_struct(current->group_leader); proc->tsk = current->group_leader; + mutex_init(&proc->files_lock); INIT_LIST_HEAD(&proc->todo); if (binder_supported_policy(current->policy)) { proc->default_priority.sched_policy = current->policy; @@ -4966,7 +4970,7 @@ static int binder_open(struct inode *nodp, struct file *filp) * anyway print all contexts that a given PID has, so this * is not a problem. */ - proc->debugfs_entry = debugfs_create_file(strbuf, S_IRUGO, + proc->debugfs_entry = debugfs_create_file(strbuf, 0444, binder_debugfs_dir_entry_proc, (void *)(unsigned long)proc->pid, &binder_proc_fops); @@ -5087,6 +5091,8 @@ static void binder_deferred_release(struct binder_proc *proc) struct rb_node *n; int threads, nodes, incoming_refs, outgoing_refs, active_transactions; + BUG_ON(proc->files); + mutex_lock(&binder_procs_lock); hlist_del(&proc->proc_node); mutex_unlock(&binder_procs_lock); @@ -5168,6 +5174,8 @@ static void binder_deferred_release(struct binder_proc *proc) static void binder_deferred_func(struct work_struct *work) { struct binder_proc *proc; + struct files_struct *files; + int defer; do { @@ -5184,11 +5192,23 @@ static void binder_deferred_func(struct work_struct *work) } mutex_unlock(&binder_deferred_lock); + files = NULL; + if (defer & BINDER_DEFERRED_PUT_FILES) { + mutex_lock(&proc->files_lock); + files = proc->files; + if (files) + proc->files = NULL; + mutex_unlock(&proc->files_lock); + } + if (defer & BINDER_DEFERRED_FLUSH) binder_deferred_flush(proc); if (defer & BINDER_DEFERRED_RELEASE) binder_deferred_release(proc); /* frees proc */ + + if (files) + put_files_struct(files); } while (proc); } static DECLARE_WORK(binder_deferred_work, binder_deferred_func); @@ -5217,7 +5237,7 @@ static void print_binder_transaction_ilocked(struct seq_file *m, spin_lock(&t->lock); to_proc = t->to_proc; seq_printf(m, - "%s %d: %p from %d:%d to %d:%d code %x flags %x pri %d:%d r%d", + "%s %d: %pK from %d:%d to %d:%d code %x flags %x pri %d:%d r%d", prefix, t->debug_id, t, t->from ? t->from->proc->pid : 0, t->from ? t->from->pid : 0, @@ -5242,7 +5262,7 @@ static void print_binder_transaction_ilocked(struct seq_file *m, } if (buffer->target_node) seq_printf(m, " node %d", buffer->target_node->debug_id); - seq_printf(m, " size %zd:%zd data %p\n", + seq_printf(m, " size %zd:%zd data %pK\n", buffer->data_size, buffer->offsets_size, buffer->data); } @@ -5777,11 +5797,13 @@ static int __init init_binder_device(const char *name) static int __init binder_init(void) { int ret; - char *device_name, *device_names; + char *device_name, *device_names, *device_tmp; struct binder_device *device; struct hlist_node *tmp; - binder_alloc_shrinker_init(); + ret = binder_alloc_shrinker_init(); + if (ret) + return ret; atomic_set(&binder_transaction_log.cur, ~0U); atomic_set(&binder_transaction_log_failed.cur, ~0U); @@ -5796,27 +5818,27 @@ static int __init binder_init(void) if (binder_debugfs_dir_entry_root) { debugfs_create_file("state", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, NULL, &binder_state_fops); debugfs_create_file("stats", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, NULL, &binder_stats_fops); debugfs_create_file("transactions", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, NULL, &binder_transactions_fops); debugfs_create_file("transaction_log", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, &binder_transaction_log, &binder_transaction_log_fops); debugfs_create_file("failed_transaction_log", - S_IRUGO, + 0444, binder_debugfs_dir_entry_root, &binder_transaction_log_failed, &binder_transaction_log_fops); @@ -5833,7 +5855,8 @@ static int __init binder_init(void) } strcpy(device_names, binder_devices_param); - while ((device_name = strsep(&device_names, ","))) { + device_tmp = device_names; + while ((device_name = strsep(&device_tmp, ","))) { ret = init_binder_device(device_name); if (ret) goto err_init_binder_device_failed; @@ -5847,6 +5870,9 @@ err_init_binder_device_failed: hlist_del(&device->hlist); kfree(device); } + + kfree(device_names); + err_alloc_device_names_failed: debugfs_remove_recursive(binder_debugfs_dir_entry_root); diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 3a4279d219f7..1d9db2ef26bd 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -220,7 +220,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, mm = alloc->vma_vm_mm; if (mm) { - down_write(&mm->mmap_sem); + down_read(&mm->mmap_sem); vma = alloc->vma; } @@ -289,7 +289,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, /* vm_insert_page does not seem to increment the refcount */ } if (mm) { - up_write(&mm->mmap_sem); + up_read(&mm->mmap_sem); mmput(mm); } return 0; @@ -322,17 +322,18 @@ err_page_ptr_cleared: } err_no_vma: if (mm) { - up_write(&mm->mmap_sem); + up_read(&mm->mmap_sem); mmput(mm); } return vma ? -ENOMEM : -ESRCH; } -struct binder_buffer *binder_alloc_new_buf_locked(struct binder_alloc *alloc, - size_t data_size, - size_t offsets_size, - size_t extra_buffers_size, - int is_async) +static struct binder_buffer *binder_alloc_new_buf_locked( + struct binder_alloc *alloc, + size_t data_size, + size_t offsets_size, + size_t extra_buffers_size, + int is_async) { struct rb_node *n = alloc->free_buffers.rb_node; struct binder_buffer *buffer; @@ -670,7 +671,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc, goto err_already_mapped; } - area = get_vm_area(vma->vm_end - vma->vm_start, VM_IOREMAP); + area = get_vm_area(vma->vm_end - vma->vm_start, VM_ALLOC); if (area == NULL) { ret = -ENOMEM; failure_string = "get_vm_area"; @@ -1010,8 +1011,14 @@ void binder_alloc_init(struct binder_alloc *alloc) INIT_LIST_HEAD(&alloc->buffers); } -void binder_alloc_shrinker_init(void) +int binder_alloc_shrinker_init(void) { - list_lru_init(&binder_alloc_lru); - register_shrinker(&binder_shrinker); + int ret = list_lru_init(&binder_alloc_lru); + + if (ret == 0) { + ret = register_shrinker(&binder_shrinker); + if (ret) + list_lru_destroy(&binder_alloc_lru); + } + return ret; } diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h index 0b145307f1fd..9ef64e563856 100644 --- a/drivers/android/binder_alloc.h +++ b/drivers/android/binder_alloc.h @@ -130,7 +130,7 @@ extern struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc, size_t extra_buffers_size, int is_async); extern void binder_alloc_init(struct binder_alloc *alloc); -void binder_alloc_shrinker_init(void); +extern int binder_alloc_shrinker_init(void); extern void binder_alloc_vma_close(struct binder_alloc *alloc); extern struct binder_buffer * binder_alloc_prepare_to_free(struct binder_alloc *alloc, diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c index 75cced210b2a..7db76b5c7ada 100644 --- a/drivers/ata/libata-eh.c +++ b/drivers/ata/libata-eh.c @@ -2198,12 +2198,16 @@ static void ata_eh_link_autopsy(struct ata_link *link) if (qc->err_mask & ~AC_ERR_OTHER) qc->err_mask &= ~AC_ERR_OTHER; - /* SENSE_VALID trumps dev/unknown error and revalidation */ + /* + * SENSE_VALID trumps dev/unknown error and revalidation. Upper + * layers will determine whether the command is worth retrying + * based on the sense data and device class/type. Otherwise, + * determine directly if the command is worth retrying using its + * error mask and flags. + */ if (qc->flags & ATA_QCFLAG_SENSE_VALID) qc->err_mask &= ~(AC_ERR_DEV | AC_ERR_OTHER); - - /* determine whether the command is worth retrying */ - if (ata_eh_worth_retry(qc)) + else if (ata_eh_worth_retry(qc)) qc->flags |= ATA_QCFLAG_RETRY; /* accumulate error info */ diff --git a/drivers/atm/zatm.c b/drivers/atm/zatm.c index c302f47f6323..94712e1c5cf9 100644 --- a/drivers/atm/zatm.c +++ b/drivers/atm/zatm.c @@ -1481,6 +1481,8 @@ static int zatm_ioctl(struct atm_dev *dev,unsigned int cmd,void __user *arg) return -EFAULT; if (pool < 0 || pool > ZATM_LAST_POOL) return -EINVAL; + pool = array_index_nospec(pool, + ZATM_LAST_POOL + 1); if (copy_from_user(&info, &((struct zatm_pool_req __user *) arg)->info, sizeof(info))) return -EFAULT; diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c index cb5c54e258eb..21044b1f29bc 100644 --- a/drivers/base/cpu.c +++ b/drivers/base/cpu.c @@ -693,14 +693,22 @@ ssize_t __weak cpu_show_spectre_v2(struct device *dev, return sprintf(buf, "Not affected\n"); } +ssize_t __weak cpu_show_spec_store_bypass(struct device *dev, + struct device_attribute *attr, char *buf) +{ + return sprintf(buf, "Not affected\n"); +} + static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL); static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL); static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL); +static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL); static struct attribute *cpu_root_vulnerabilities_attrs[] = { &dev_attr_meltdown.attr, &dev_attr_spectre_v1.attr, &dev_attr_spectre_v2.attr, + &dev_attr_spec_store_bypass.attr, NULL }; diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 7e00d10d5a94..0ab3a8d36ad4 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -330,14 +330,6 @@ static int really_probe(struct device *dev, struct device_driver *drv) goto probe_failed; } - /* - * Ensure devices are listed in devices_kset in correct order - * It's important to move Dev to the end of devices_kset before - * calling .probe, because it could be recursive and parent Dev - * should always go first - */ - devices_kset_move_last(dev); - if (dev->bus->probe) { ret = dev->bus->probe(dev); if (ret) diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 91676535a1a3..4a899b41145e 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -339,6 +339,9 @@ static const struct usb_device_id blacklist_table[] = { /* Additional Realtek 8723BU Bluetooth devices */ { USB_DEVICE(0x7392, 0xa611), .driver_info = BTUSB_REALTEK }, + /* Additional Realtek 8723DE Bluetooth devices */ + { USB_DEVICE(0x2ff8, 0xb011), .driver_info = BTUSB_REALTEK }, + /* Additional Realtek 8821AE Bluetooth devices */ { USB_DEVICE(0x0b05, 0x17dc), .driver_info = BTUSB_REALTEK }, { USB_DEVICE(0x13d3, 0x3414), .driver_info = BTUSB_REALTEK }, diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 476d39c7ba20..ecfb9ed2cff6 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -884,7 +884,7 @@ static int qca_set_baudrate(struct hci_dev *hdev, uint8_t baudrate) */ set_current_state(TASK_UNINTERRUPTIBLE); schedule_timeout(msecs_to_jiffies(BAUDRATE_SETTLE_TIMEOUT_MS)); - set_current_state(TASK_INTERRUPTIBLE); + set_current_state(TASK_RUNNING); return 0; } diff --git a/drivers/char/diag/diag_masks.c b/drivers/char/diag/diag_masks.c index 94613a47eafa..2a5d1aa3a190 100644 --- a/drivers/char/diag/diag_masks.c +++ b/drivers/char/diag/diag_masks.c @@ -171,6 +171,9 @@ static void diag_send_log_mask_update(uint8_t peripheral, int equip_id) mutex_lock(&mask_info->lock); for (i = 0; i < MAX_EQUIP_ID; i++, mask++) { + if (!mask->ptr) + continue; + if (equip_id != i && equip_id != ALL_EQUIP_ID) continue; @@ -400,6 +403,8 @@ static void diag_send_msg_mask_update(uint8_t peripheral, int first, int last) } for (i = 0; i < msg_mask_tbl_count_local; i++, mask++) { + if (!mask->ptr) + continue; mutex_lock(&driver->msg_mask_lock); if (((mask->ssid_first > first) || (mask->ssid_last_tools < last)) && first != ALL_SSID) { @@ -644,6 +649,8 @@ static int diag_cmd_get_build_mask(unsigned char *src_buf, int src_len, rsp.padding = 0; build_mask = (struct diag_msg_mask_t *)msg_bt_mask.ptr; for (i = 0; i < driver->bt_msg_mask_tbl_count; i++, build_mask++) { + if (!build_mask->ptr) + continue; if (build_mask->ssid_first != req->ssid_first) continue; num_entries = req->ssid_last - req->ssid_first + 1; @@ -720,6 +727,8 @@ static int diag_cmd_get_msg_mask(unsigned char *src_buf, int src_len, return -EINVAL; } for (i = 0; i < driver->msg_mask_tbl_count; i++, mask++) { + if (!mask->ptr) + continue; if ((req->ssid_first < mask->ssid_first) || (req->ssid_first > mask->ssid_last_tools)) { continue; @@ -789,6 +798,8 @@ static int diag_cmd_set_msg_mask(unsigned char *src_buf, int src_len, return -EINVAL; } for (i = 0; i < driver->msg_mask_tbl_count; i++, mask++) { + if (!mask->ptr) + continue; if (i < (driver->msg_mask_tbl_count - 1)) { mask_next = mask; mask_next++; @@ -1526,7 +1537,8 @@ static int diag_create_msg_mask_table(void) mutex_lock(&msg_mask.lock); mutex_lock(&driver->msg_mask_lock); driver->msg_mask_tbl_count = MSG_MASK_TBL_CNT; - for (i = 0; i < driver->msg_mask_tbl_count; i++, mask++) { + for (i = 0; (i < driver->msg_mask_tbl_count) && mask; + i++, mask++) { range.ssid_first = msg_mask_tbl[i].ssid_first; range.ssid_last = msg_mask_tbl[i].ssid_last; err = diag_create_msg_mask_table_entry(mask, &range); @@ -1551,7 +1563,8 @@ static int diag_create_build_time_mask(void) mutex_lock(&driver->msg_mask_lock); driver->bt_msg_mask_tbl_count = MSG_MASK_TBL_CNT; build_mask = (struct diag_msg_mask_t *)msg_bt_mask.ptr; - for (i = 0; i < driver->bt_msg_mask_tbl_count; i++, build_mask++) { + for (i = 0; (i < driver->bt_msg_mask_tbl_count) && build_mask; + i++, build_mask++) { range.ssid_first = msg_mask_tbl[i].ssid_first; range.ssid_last = msg_mask_tbl[i].ssid_last; err = diag_create_msg_mask_table_entry(build_mask, &range); @@ -1674,7 +1687,7 @@ static int diag_create_log_mask_table(void) mutex_lock(&log_mask.lock); mask = (struct diag_log_mask_t *)(log_mask.ptr); - for (i = 0; i < MAX_EQUIP_ID; i++, mask++) { + for (i = 0; (i < MAX_EQUIP_ID) && mask; i++, mask++) { mask->equip_id = i; mask->num_items = LOG_GET_ITEM_NUM(log_code_last_tbl[i]); mask->num_items_tools = mask->num_items; @@ -2069,6 +2082,8 @@ int diag_copy_to_user_msg_mask(char __user *buf, size_t count, return -EINVAL; } for (i = 0; i < driver->msg_mask_tbl_count; i++, mask++) { + if (!mask->ptr) + continue; ptr = mask_info->update_buf; len = 0; mutex_lock(&mask->lock); @@ -2143,6 +2158,8 @@ int diag_copy_to_user_log_mask(char __user *buf, size_t count, return -EINVAL; } for (i = 0; i < MAX_EQUIP_ID; i++, mask++) { + if (!mask->ptr) + continue; ptr = mask_info->update_buf; len = 0; mutex_lock(&mask->lock); diff --git a/drivers/char/diag/diagfwd_cntl.c b/drivers/char/diag/diagfwd_cntl.c index 907bf6ee8511..8fb9227d8cb8 100644 --- a/drivers/char/diag/diagfwd_cntl.c +++ b/drivers/char/diag/diagfwd_cntl.c @@ -662,7 +662,7 @@ static void process_ssid_range_report(uint8_t *buf, uint32_t len, mask_ptr = (struct diag_msg_mask_t *)msg_mask.ptr; found = 0; for (j = 0; j < driver->msg_mask_tbl_count; j++, mask_ptr++) { - if (!mask_ptr || !ssid_range) { + if (!mask_ptr->ptr || !ssid_range) { found = 1; break; } @@ -741,7 +741,7 @@ static void diag_build_time_mask_update(uint8_t *buf, num_items = range->ssid_last - range->ssid_first + 1; for (i = 0; i < driver->bt_msg_mask_tbl_count; i++, build_mask++) { - if (!build_mask) { + if (!build_mask->ptr) { found = 1; break; } diff --git a/drivers/char/random.c b/drivers/char/random.c index dffd06a3bb76..2916d08ee30e 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1503,14 +1503,22 @@ static int write_pool(struct entropy_store *r, const char __user *buffer, size_t count) { size_t bytes; - __u32 buf[16]; + __u32 t, buf[16]; const char __user *p = buffer; while (count > 0) { + int b, i = 0; + bytes = min(count, sizeof(buf)); if (copy_from_user(&buf, p, bytes)) return -EFAULT; + for (b = bytes ; b > 0 ; b -= sizeof(__u32), i++) { + if (!arch_get_random_int(&t)) + break; + buf[i] ^= t; + } + count -= bytes; p += bytes; diff --git a/drivers/clk/tegra/clk-tegra30.c b/drivers/clk/tegra/clk-tegra30.c index 8c41c6fcb9ee..acf83569f86f 100644 --- a/drivers/clk/tegra/clk-tegra30.c +++ b/drivers/clk/tegra/clk-tegra30.c @@ -333,11 +333,11 @@ static struct pdiv_map pllu_p[] = { }; static struct tegra_clk_pll_freq_table pll_u_freq_table[] = { - { 12000000, 480000000, 960, 12, 0, 12}, - { 13000000, 480000000, 960, 13, 0, 12}, - { 16800000, 480000000, 400, 7, 0, 5}, - { 19200000, 480000000, 200, 4, 0, 3}, - { 26000000, 480000000, 960, 26, 0, 12}, + { 12000000, 480000000, 960, 12, 2, 12 }, + { 13000000, 480000000, 960, 13, 2, 12 }, + { 16800000, 480000000, 400, 7, 2, 5 }, + { 19200000, 480000000, 200, 4, 2, 3 }, + { 26000000, 480000000, 960, 26, 2, 12 }, { 0, 0, 0, 0, 0, 0 }, }; @@ -1372,6 +1372,7 @@ static struct tegra_clk_init_table init_table[] __initdata = { {TEGRA30_CLK_GR2D, TEGRA30_CLK_PLL_C, 300000000, 0}, {TEGRA30_CLK_GR3D, TEGRA30_CLK_PLL_C, 300000000, 0}, {TEGRA30_CLK_GR3D2, TEGRA30_CLK_PLL_C, 300000000, 0}, + { TEGRA30_CLK_PLL_U, TEGRA30_CLK_CLK_MAX, 480000000, 0 }, {TEGRA30_CLK_CLK_MAX, TEGRA30_CLK_CLK_MAX, 0, 0}, /* This MUST be the last entry. */ }; diff --git a/drivers/cpufreq/cpufreq_times.c b/drivers/cpufreq/cpufreq_times.c index e5df7a47cc16..e7a8b636a5f4 100644 --- a/drivers/cpufreq/cpufreq_times.c +++ b/drivers/cpufreq/cpufreq_times.c @@ -234,16 +234,19 @@ static int uid_time_in_state_seq_show(struct seq_file *m, void *v) void cpufreq_task_times_init(struct task_struct *p) { - void *temp; unsigned long flags; - unsigned int max_state; spin_lock_irqsave(&task_time_in_state_lock, flags); p->time_in_state = NULL; spin_unlock_irqrestore(&task_time_in_state_lock, flags); p->max_state = 0; +} - max_state = READ_ONCE(next_offset); +void cpufreq_task_times_alloc(struct task_struct *p) +{ + void *temp; + unsigned long flags; + unsigned int max_state = READ_ONCE(next_offset); /* We use one array to avoid multiple allocs per task */ temp = kcalloc(max_state, sizeof(p->time_in_state[0]), GFP_ATOMIC); diff --git a/drivers/crypto/amcc/crypto4xx_core.c b/drivers/crypto/amcc/crypto4xx_core.c index 58a630e55d5d..78d0722feacb 100644 --- a/drivers/crypto/amcc/crypto4xx_core.c +++ b/drivers/crypto/amcc/crypto4xx_core.c @@ -207,7 +207,7 @@ static u32 crypto4xx_build_pdr(struct crypto4xx_device *dev) dev->pdr_pa); return -ENOMEM; } - memset(dev->pdr, 0, sizeof(struct ce_pd) * PPC4XX_NUM_PD); + memset(dev->pdr, 0, sizeof(struct ce_pd) * PPC4XX_NUM_PD); dev->shadow_sa_pool = dma_alloc_coherent(dev->core_dev->device, 256 * PPC4XX_NUM_PD, &dev->shadow_sa_pool_pa, @@ -240,13 +240,15 @@ static u32 crypto4xx_build_pdr(struct crypto4xx_device *dev) static void crypto4xx_destroy_pdr(struct crypto4xx_device *dev) { - if (dev->pdr != NULL) + if (dev->pdr) dma_free_coherent(dev->core_dev->device, sizeof(struct ce_pd) * PPC4XX_NUM_PD, dev->pdr, dev->pdr_pa); + if (dev->shadow_sa_pool) dma_free_coherent(dev->core_dev->device, 256 * PPC4XX_NUM_PD, dev->shadow_sa_pool, dev->shadow_sa_pool_pa); + if (dev->shadow_sr_pool) dma_free_coherent(dev->core_dev->device, sizeof(struct sa_state_record) * PPC4XX_NUM_PD, @@ -416,12 +418,12 @@ static u32 crypto4xx_build_sdr(struct crypto4xx_device *dev) static void crypto4xx_destroy_sdr(struct crypto4xx_device *dev) { - if (dev->sdr != NULL) + if (dev->sdr) dma_free_coherent(dev->core_dev->device, sizeof(struct ce_sd) * PPC4XX_NUM_SD, dev->sdr, dev->sdr_pa); - if (dev->scatter_buffer_va != NULL) + if (dev->scatter_buffer_va) dma_free_coherent(dev->core_dev->device, dev->scatter_buffer_size * PPC4XX_NUM_SD, dev->scatter_buffer_va, @@ -1029,12 +1031,10 @@ int crypto4xx_register_alg(struct crypto4xx_device *sec_dev, break; } - if (rc) { - list_del(&alg->entry); + if (rc) kfree(alg); - } else { + else list_add_tail(&alg->entry, &sec_dev->alg_list); - } } return 0; @@ -1188,7 +1188,7 @@ static int crypto4xx_probe(struct platform_device *ofdev) rc = crypto4xx_build_gdr(core_dev->dev); if (rc) - goto err_build_gdr; + goto err_build_pdr; rc = crypto4xx_build_sdr(core_dev->dev); if (rc) @@ -1230,12 +1230,11 @@ err_iomap: err_request_irq: irq_dispose_mapping(core_dev->irq); tasklet_kill(&core_dev->tasklet); - crypto4xx_destroy_sdr(core_dev->dev); err_build_sdr: + crypto4xx_destroy_sdr(core_dev->dev); crypto4xx_destroy_gdr(core_dev->dev); -err_build_gdr: - crypto4xx_destroy_pdr(core_dev->dev); err_build_pdr: + crypto4xx_destroy_pdr(core_dev->dev); kfree(core_dev->dev); err_alloc_dev: kfree(core_dev); diff --git a/drivers/crypto/padlock-aes.c b/drivers/crypto/padlock-aes.c index 97a364694bfc..047ef69b7e65 100644 --- a/drivers/crypto/padlock-aes.c +++ b/drivers/crypto/padlock-aes.c @@ -266,6 +266,8 @@ static inline void padlock_xcrypt_ecb(const u8 *input, u8 *output, void *key, return; } + count -= initial; + if (initial) asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */ : "+S"(input), "+D"(output) @@ -273,7 +275,7 @@ static inline void padlock_xcrypt_ecb(const u8 *input, u8 *output, void *key, asm volatile (".byte 0xf3,0x0f,0xa7,0xc8" /* rep xcryptecb */ : "+S"(input), "+D"(output) - : "d"(control_word), "b"(key), "c"(count - initial)); + : "d"(control_word), "b"(key), "c"(count)); } static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key, @@ -284,6 +286,8 @@ static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key, if (count < cbc_fetch_blocks) return cbc_crypt(input, output, key, iv, control_word, count); + count -= initial; + if (initial) asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */ : "+S" (input), "+D" (output), "+a" (iv) @@ -291,7 +295,7 @@ static inline u8 *padlock_xcrypt_cbc(const u8 *input, u8 *output, void *key, asm volatile (".byte 0xf3,0x0f,0xa7,0xd0" /* rep xcryptcbc */ : "+S" (input), "+D" (output), "+a" (iv) - : "d" (control_word), "b" (key), "c" (count-initial)); + : "d" (control_word), "b" (key), "c" (count)); return iv; } diff --git a/drivers/dma/pxa_dma.c b/drivers/dma/pxa_dma.c index 55f5d33f6dc7..4251e9ac0373 100644 --- a/drivers/dma/pxa_dma.c +++ b/drivers/dma/pxa_dma.c @@ -1321,7 +1321,7 @@ static int pxad_init_phys(struct platform_device *op, return 0; } -static const struct of_device_id const pxad_dt_ids[] = { +static const struct of_device_id pxad_dt_ids[] = { { .compatible = "marvell,pdma-1.0", }, {} }; diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 50d74e5ce41b..355ad1b97df6 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -960,7 +960,9 @@ drm_atomic_set_crtc_for_plane(struct drm_plane_state *plane_state, { struct drm_plane *plane = plane_state->plane; struct drm_crtc_state *crtc_state; - + /* Nothing to do for same crtc*/ + if (plane_state->crtc == crtc) + return 0; if (plane_state->crtc) { crtc_state = drm_atomic_get_crtc_state(plane_state->state, plane_state->crtc); diff --git a/drivers/gpu/drm/gma500/psb_intel_drv.h b/drivers/gpu/drm/gma500/psb_intel_drv.h index 860dd2177ca1..283570080d47 100644 --- a/drivers/gpu/drm/gma500/psb_intel_drv.h +++ b/drivers/gpu/drm/gma500/psb_intel_drv.h @@ -252,7 +252,7 @@ extern int intelfb_remove(struct drm_device *dev, extern bool psb_intel_lvds_mode_fixup(struct drm_encoder *encoder, const struct drm_display_mode *mode, struct drm_display_mode *adjusted_mode); -extern int psb_intel_lvds_mode_valid(struct drm_connector *connector, +extern enum drm_mode_status psb_intel_lvds_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode); extern int psb_intel_lvds_set_property(struct drm_connector *connector, struct drm_property *property, diff --git a/drivers/gpu/drm/gma500/psb_intel_lvds.c b/drivers/gpu/drm/gma500/psb_intel_lvds.c index 61e3a097a478..ccd1b8bf0fd5 100644 --- a/drivers/gpu/drm/gma500/psb_intel_lvds.c +++ b/drivers/gpu/drm/gma500/psb_intel_lvds.c @@ -343,7 +343,7 @@ static void psb_intel_lvds_restore(struct drm_connector *connector) } } -int psb_intel_lvds_mode_valid(struct drm_connector *connector, +enum drm_mode_status psb_intel_lvds_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_psb_private *dev_priv = connector->dev->dev_private; diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c index f1bd9967ba81..c2f5621ddf8b 100644 --- a/drivers/gpu/drm/msm/msm_drv.c +++ b/drivers/gpu/drm/msm/msm_drv.c @@ -15,6 +15,27 @@ * You should have received a copy of the GNU General Public License along with * this program. If not, see <http://www.gnu.org/licenses/>. */ +/* + * Copyright (c) 2016 Intel Corporation + * + * Permission to use, copy, modify, distribute, and sell this software and its + * documentation for any purpose is hereby granted without fee, provided that + * the above copyright notice appear in all copies and that both that copyright + * notice and this permission notice appear in supporting documentation, and + * that the name of the copyright holders not be used in advertising or + * publicity pertaining to distribution of the software without specific, + * written prior permission. The copyright holders make no representations + * about the suitability of this software for any purpose. It is provided "as + * is" without express or implied warranty. + * + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, + * INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO + * EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR + * CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, + * DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER + * TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE + * OF THIS SOFTWARE. + */ #include <linux/of_address.h> #include "msm_drv.h" @@ -1845,6 +1866,54 @@ int msm_release(struct inode *inode, struct file *filp) return drm_release(inode, filp); } +/** + * msm_ioctl_rmfb2 - remove an FB from the configuration + * @dev: drm device for the ioctl + * @data: data pointer for the ioctl + * @file_priv: drm file for the ioctl call + * + * Remove the FB specified by the user. + * + * Called by the user via ioctl. + * + * Returns: + * Zero on success, negative errno on failure. + */ +static int msm_ioctl_rmfb2(struct drm_device *dev, void *data, + struct drm_file *file_priv) +{ + struct drm_framebuffer *fb = NULL; + struct drm_framebuffer *fbl = NULL; + uint32_t *id = data; + int found = 0; + + if (!drm_core_check_feature(dev, DRIVER_MODESET)) + return -EINVAL; + + fb = drm_framebuffer_lookup(dev, *id); + if (!fb) + return -ENOENT; + + /* drop extra ref from traversing drm_framebuffer_lookup */ + drm_framebuffer_unreference(fb); + + mutex_lock(&file_priv->fbs_lock); + list_for_each_entry(fbl, &file_priv->fbs, filp_head) + if (fb == fbl) + found = 1; + if (!found) { + mutex_unlock(&file_priv->fbs_lock); + return -ENOENT; + } + + list_del_init(&fb->filp_head); + mutex_unlock(&file_priv->fbs_lock); + + drm_framebuffer_unreference(fb); + + return 0; +} + static const struct drm_ioctl_desc msm_ioctls[] = { DRM_IOCTL_DEF_DRV(MSM_GET_PARAM, msm_ioctl_get_param, DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(MSM_GEM_NEW, msm_ioctl_gem_new, DRM_AUTH|DRM_RENDER_ALLOW), @@ -1874,6 +1943,8 @@ static const struct drm_ioctl_desc msm_ioctls[] = { DRM_AUTH|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(MSM_SUBMITQUEUE_QUERY, msm_ioctl_submitqueue_query, DRM_AUTH|DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(MSM_RMFB2, msm_ioctl_rmfb2, + DRM_CONTROL_ALLOW|DRM_UNLOCKED), }; static const struct vm_operations_struct vm_ops = { diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c index 1a2a7365d0b5..c6bf378534f8 100644 --- a/drivers/gpu/drm/radeon/radeon_connectors.c +++ b/drivers/gpu/drm/radeon/radeon_connectors.c @@ -844,7 +844,7 @@ static int radeon_lvds_get_modes(struct drm_connector *connector) return ret; } -static int radeon_lvds_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_lvds_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_encoder *encoder = radeon_best_single_encoder(connector); @@ -993,7 +993,7 @@ static int radeon_vga_get_modes(struct drm_connector *connector) return ret; } -static int radeon_vga_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_vga_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_device *dev = connector->dev; @@ -1136,7 +1136,7 @@ static int radeon_tv_get_modes(struct drm_connector *connector) return 1; } -static int radeon_tv_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_tv_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { if ((mode->hdisplay > 1024) || (mode->vdisplay > 768)) @@ -1477,7 +1477,7 @@ static void radeon_dvi_force(struct drm_connector *connector) radeon_connector->use_digital = true; } -static int radeon_dvi_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_dvi_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_device *dev = connector->dev; @@ -1778,7 +1778,7 @@ out: return ret; } -static int radeon_dp_mode_valid(struct drm_connector *connector, +static enum drm_mode_status radeon_dp_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { struct drm_device *dev = connector->dev; diff --git a/drivers/hid/hid-plantronics.c b/drivers/hid/hid-plantronics.c index febb21ee190e..584b10d3fc3d 100644 --- a/drivers/hid/hid-plantronics.c +++ b/drivers/hid/hid-plantronics.c @@ -2,7 +2,7 @@ * Plantronics USB HID Driver * * Copyright (c) 2014 JD Cole <jd.cole@plantronics.com> - * Copyright (c) 2015 Terry Junge <terry.junge@plantronics.com> + * Copyright (c) 2015-2018 Terry Junge <terry.junge@plantronics.com> */ /* @@ -48,6 +48,10 @@ static int plantronics_input_mapping(struct hid_device *hdev, unsigned short mapped_key; unsigned long plt_type = (unsigned long)hid_get_drvdata(hdev); + /* special case for PTT products */ + if (field->application == HID_GD_JOYSTICK) + goto defaulted; + /* handle volume up/down mapping */ /* non-standard types or multi-HID interfaces - plt_type is PID */ if (!(plt_type & HID_USAGE_PAGE)) { diff --git a/drivers/hid/i2c-hid/i2c-hid.c b/drivers/hid/i2c-hid/i2c-hid.c index a5fed668fde1..4248d253c32a 100644 --- a/drivers/hid/i2c-hid/i2c-hid.c +++ b/drivers/hid/i2c-hid/i2c-hid.c @@ -1017,6 +1017,14 @@ static int i2c_hid_probe(struct i2c_client *client, pm_runtime_set_active(&client->dev); pm_runtime_enable(&client->dev); + /* Make sure there is something at this address */ + ret = i2c_smbus_read_byte(client); + if (ret < 0) { + dev_dbg(&client->dev, "nothing at this address: %d\n", ret); + ret = -ENXIO; + goto err_pm; + } + ret = i2c_hid_fetch_hid_descriptor(ihid); if (ret < 0) goto err_pm; diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c index 8d84c563ba75..616173b7a5e8 100644 --- a/drivers/infiniband/core/mad.c +++ b/drivers/infiniband/core/mad.c @@ -1548,7 +1548,8 @@ static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req, mad_reg_req->oui, 3)) { method = &(*vendor_table)->vendor_class[ vclass]->method_table[i]; - BUG_ON(!*method); + if (!*method) + goto error3; goto check_in_use; } } @@ -1558,10 +1559,12 @@ static int add_oui_reg_req(struct ib_mad_reg_req *mad_reg_req, vclass]->oui[i])) { method = &(*vendor_table)->vendor_class[ vclass]->method_table[i]; - BUG_ON(*method); /* Allocate method table for this OUI */ - if ((ret = allocate_method_table(method))) - goto error3; + if (!*method) { + ret = allocate_method_table(method); + if (ret) + goto error3; + } memcpy((*vendor_table)->vendor_class[vclass]->oui[i], mad_reg_req->oui, 3); goto check_in_use; diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c index 795938edce3f..55aa8d3d752f 100644 --- a/drivers/infiniband/core/ucma.c +++ b/drivers/infiniband/core/ucma.c @@ -217,7 +217,7 @@ static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx) return NULL; mutex_lock(&mut); - mc->id = idr_alloc(&multicast_idr, mc, 0, 0, GFP_KERNEL); + mc->id = idr_alloc(&multicast_idr, NULL, 0, 0, GFP_KERNEL); mutex_unlock(&mut); if (mc->id < 0) goto error; @@ -1375,6 +1375,10 @@ static ssize_t ucma_process_join(struct ucma_file *file, goto err3; } + mutex_lock(&mut); + idr_replace(&multicast_idr, mc, mc->id); + mutex_unlock(&mut); + mutex_unlock(&file->mut); ucma_put_ctx(ctx); return 0; diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c index 97f6e05cffce..a716482774db 100644 --- a/drivers/input/mouse/elan_i2c_core.c +++ b/drivers/input/mouse/elan_i2c_core.c @@ -1251,6 +1251,8 @@ static const struct acpi_device_id elan_acpi_id[] = { { "ELAN0611", 0 }, { "ELAN0612", 0 }, { "ELAN0618", 0 }, + { "ELAN061D", 0 }, + { "ELAN0622", 0 }, { "ELAN1000", 0 }, { } }; diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h index e484ea2dc787..34be09651ee8 100644 --- a/drivers/input/serio/i8042-x86ia64io.h +++ b/drivers/input/serio/i8042-x86ia64io.h @@ -527,6 +527,13 @@ static const struct dmi_system_id __initconst i8042_dmi_nomux_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "N24_25BU"), }, }, + { + /* Lenovo LaVie Z */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_VERSION, "Lenovo LaVie Z"), + }, + }, { } }; diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index 9411deaaddf9..48b09dedd0fb 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -508,14 +508,14 @@ static int adjoin(struct dm_table *table, struct dm_target *ti) * On the other hand, dm-switch needs to process bulk data using messages and * excessive use of GFP_NOIO could cause trouble. */ -static char **realloc_argv(unsigned *array_size, char **old_argv) +static char **realloc_argv(unsigned *size, char **old_argv) { char **argv; unsigned new_size; gfp_t gfp; - if (*array_size) { - new_size = *array_size * 2; + if (*size) { + new_size = *size * 2; gfp = GFP_KERNEL; } else { new_size = 8; @@ -523,8 +523,8 @@ static char **realloc_argv(unsigned *array_size, char **old_argv) } argv = kmalloc(new_size * sizeof(*argv), gfp); if (argv) { - memcpy(argv, old_argv, *array_size * sizeof(*argv)); - *array_size = new_size; + memcpy(argv, old_argv, *size * sizeof(*argv)); + *size = new_size; } kfree(old_argv); diff --git a/drivers/md/md.c b/drivers/md/md.c index c25534e10f43..e60d855b82e0 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6145,6 +6145,9 @@ static int hot_remove_disk(struct mddev *mddev, dev_t dev) struct md_rdev *rdev; int ret = -1; + if (!mddev->pers) + return -ENODEV; + rdev = find_rdev(mddev, dev); if (!rdev) return -ENXIO; diff --git a/drivers/media/common/siano/smsendian.c b/drivers/media/common/siano/smsendian.c index bfe831c10b1c..b95a631f23f9 100644 --- a/drivers/media/common/siano/smsendian.c +++ b/drivers/media/common/siano/smsendian.c @@ -35,7 +35,7 @@ void smsendian_handle_tx_message(void *buffer) switch (msg->x_msg_header.msg_type) { case MSG_SMS_DATA_DOWNLOAD_REQ: { - msg->msg_data[0] = le32_to_cpu(msg->msg_data[0]); + msg->msg_data[0] = le32_to_cpu((__force __le32)(msg->msg_data[0])); break; } @@ -44,7 +44,7 @@ void smsendian_handle_tx_message(void *buffer) sizeof(struct sms_msg_hdr))/4; for (i = 0; i < msg_words; i++) - msg->msg_data[i] = le32_to_cpu(msg->msg_data[i]); + msg->msg_data[i] = le32_to_cpu((__force __le32)msg->msg_data[i]); break; } @@ -64,7 +64,7 @@ void smsendian_handle_rx_message(void *buffer) { struct sms_version_res *ver = (struct sms_version_res *) msg; - ver->chip_model = le16_to_cpu(ver->chip_model); + ver->chip_model = le16_to_cpu((__force __le16)ver->chip_model); break; } @@ -81,7 +81,7 @@ void smsendian_handle_rx_message(void *buffer) sizeof(struct sms_msg_hdr))/4; for (i = 0; i < msg_words; i++) - msg->msg_data[i] = le32_to_cpu(msg->msg_data[i]); + msg->msg_data[i] = le32_to_cpu((__force __le32)msg->msg_data[i]); break; } @@ -95,9 +95,9 @@ void smsendian_handle_message_header(void *msg) #ifdef __BIG_ENDIAN struct sms_msg_hdr *phdr = (struct sms_msg_hdr *)msg; - phdr->msg_type = le16_to_cpu(phdr->msg_type); - phdr->msg_length = le16_to_cpu(phdr->msg_length); - phdr->msg_flags = le16_to_cpu(phdr->msg_flags); + phdr->msg_type = le16_to_cpu((__force __le16)phdr->msg_type); + phdr->msg_length = le16_to_cpu((__force __le16)phdr->msg_length); + phdr->msg_flags = le16_to_cpu((__force __le16)phdr->msg_flags); #endif /* __BIG_ENDIAN */ } EXPORT_SYMBOL_GPL(smsendian_handle_message_header); diff --git a/drivers/media/i2c/smiapp/smiapp-core.c b/drivers/media/i2c/smiapp/smiapp-core.c index fb39dfd55e75..46a052c5be2e 100644 --- a/drivers/media/i2c/smiapp/smiapp-core.c +++ b/drivers/media/i2c/smiapp/smiapp-core.c @@ -981,7 +981,7 @@ static int smiapp_read_nvm(struct smiapp_sensor *sensor, if (rval) goto out; - for (i = 0; i < 1000; i++) { + for (i = 1000; i > 0; i--) { rval = smiapp_read( sensor, SMIAPP_REG_U8_DATA_TRANSFER_IF_1_STATUS, &s); @@ -992,11 +992,10 @@ static int smiapp_read_nvm(struct smiapp_sensor *sensor, if (s & SMIAPP_DATA_TRANSFER_IF_1_STATUS_RD_READY) break; - if (--i == 0) { - rval = -ETIMEDOUT; - goto out; - } - + } + if (!i) { + rval = -ETIMEDOUT; + goto out; } for (i = 0; i < SMIAPP_NVM_PAGE_SIZE; i++) { diff --git a/drivers/media/pci/saa7164/saa7164-fw.c b/drivers/media/pci/saa7164/saa7164-fw.c index 269e0782c7b6..93d53195e8ca 100644 --- a/drivers/media/pci/saa7164/saa7164-fw.c +++ b/drivers/media/pci/saa7164/saa7164-fw.c @@ -430,7 +430,8 @@ int saa7164_downloadfirmware(struct saa7164_dev *dev) __func__, fw->size); if (fw->size != fwlength) { - printk(KERN_ERR "xc5000: firmware incorrect size\n"); + printk(KERN_ERR "saa7164: firmware incorrect size %zu != %u\n", + fw->size, fwlength); ret = -ENOMEM; goto out; } diff --git a/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.c b/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.c index 79231fb314ad..ae3f7e89a8b4 100644 --- a/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.c +++ b/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2016-2017, The Linux Foundation. All rights reserved. +/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 and @@ -25,8 +25,8 @@ #define MSM_CAMERA_TZ_BOOT_PROTECTED (false) /* Update version major number in case the HLOS-TA interface is changed*/ -#define TA_IF_VERSION_MAJ 1 -#define TA_IF_VERSION_MIN 2 +#define TA_IF_VERSION_MAJ 2 +#define TA_IF_VERSION_MIN 1 #undef CDBG #ifdef MSM_CAMERA_TZ_UTIL_VERBOSE diff --git a/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.h b/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.h index 146b7d029984..eaa77db3957c 100644 --- a/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.h +++ b/drivers/media/platform/msm/camera_v2/common/msm_camera_tz_util.h @@ -1,4 +1,4 @@ -/* Copyright (c) 2016, The Linux Foundation. All rights reserved. +/* Copyright (c) 2016-2018, The Linux Foundation. All rights reserved. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 and @@ -29,7 +29,7 @@ #define MSM_CAMERA_TZ_HW_BLOCK_CPP 0x0000000010 enum msm_camera_tz_cmd_id_t { - MSM_CAMERA_TZ_CMD_NONE, + MSM_CAMERA_TZ_CMD_NONE = 56000, MSM_CAMERA_TZ_CMD_GET_IF_VERSION, MSM_CAMERA_TZ_CMD_POWER_UP, MSM_CAMERA_TZ_CMD_POWER_DOWN, diff --git a/drivers/media/platform/msm/camera_v2/pproc/cpp/msm_cpp.c b/drivers/media/platform/msm/camera_v2/pproc/cpp/msm_cpp.c index 53a01aff4bdd..4b5671cd9c28 100644 --- a/drivers/media/platform/msm/camera_v2/pproc/cpp/msm_cpp.c +++ b/drivers/media/platform/msm/camera_v2/pproc/cpp/msm_cpp.c @@ -958,9 +958,14 @@ static irqreturn_t msm_cpp_irq(int irq_num, void *data) if (irq_status & 0x8) { tx_level = msm_camera_io_r(cpp_dev->base + MSM_CPP_MICRO_FIFO_TX_STAT) >> 2; - for (i = 0; i < tx_level; i++) { - tx_fifo[i] = msm_camera_io_r(cpp_dev->base + - MSM_CPP_MICRO_FIFO_TX_DATA); + if (tx_level < MSM_CPP_TX_FIFO_LEVEL) { + for (i = 0; i < tx_level; i++) { + tx_fifo[i] = msm_camera_io_r(cpp_dev->base + + MSM_CPP_MICRO_FIFO_TX_DATA); + } + } else { + pr_err("Fatal invalid tx level %d", tx_level); + goto err; } spin_lock_irqsave(&cpp_dev->tasklet_lock, flags); queue_cmd = &cpp_dev->tasklet_queue_cmd[cpp_dev->taskletq_idx]; @@ -1015,6 +1020,7 @@ static irqreturn_t msm_cpp_irq(int irq_num, void *data) pr_debug("DEBUG_R1: 0x%x\n", msm_camera_io_r(cpp_dev->base + 0x8C)); } +err: msm_camera_io_w(irq_status, cpp_dev->base + MSM_CPP_MICRO_IRQGEN_CLR); return IRQ_HANDLED; } diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c index 56e683b19a73..91e02c1ff392 100644 --- a/drivers/media/platform/omap3isp/isp.c +++ b/drivers/media/platform/omap3isp/isp.c @@ -2077,6 +2077,7 @@ error_csiphy: static void isp_detach_iommu(struct isp_device *isp) { + arm_iommu_detach_device(isp->dev); arm_iommu_release_mapping(isp->mapping); isp->mapping = NULL; iommu_group_remove_device(isp->dev); @@ -2110,8 +2111,7 @@ static int isp_attach_iommu(struct isp_device *isp) mapping = arm_iommu_create_mapping(&platform_bus_type, SZ_1G, SZ_2G); if (IS_ERR(mapping)) { dev_err(isp->dev, "failed to create ARM IOMMU mapping\n"); - ret = PTR_ERR(mapping); - goto error; + return PTR_ERR(mapping); } isp->mapping = mapping; @@ -2126,7 +2126,8 @@ static int isp_attach_iommu(struct isp_device *isp) return 0; error: - isp_detach_iommu(isp); + arm_iommu_release_mapping(isp->mapping); + isp->mapping = NULL; return ret; } diff --git a/drivers/media/platform/rcar_jpu.c b/drivers/media/platform/rcar_jpu.c index f8e3e83c52a2..20de5e9fc217 100644 --- a/drivers/media/platform/rcar_jpu.c +++ b/drivers/media/platform/rcar_jpu.c @@ -1278,7 +1278,7 @@ static int jpu_open(struct file *file) /* ...issue software reset */ ret = jpu_reset(jpu); if (ret) - goto device_prepare_rollback; + goto jpu_reset_rollback; } jpu->ref_count++; @@ -1286,6 +1286,8 @@ static int jpu_open(struct file *file) mutex_unlock(&jpu->mutex); return 0; +jpu_reset_rollback: + clk_disable_unprepare(jpu->clk); device_prepare_rollback: mutex_unlock(&jpu->mutex); v4l_prepare_rollback: diff --git a/drivers/media/radio/si470x/radio-si470x-i2c.c b/drivers/media/radio/si470x/radio-si470x-i2c.c index 471d6a8ae8a4..9326439bc49c 100644 --- a/drivers/media/radio/si470x/radio-si470x-i2c.c +++ b/drivers/media/radio/si470x/radio-si470x-i2c.c @@ -96,7 +96,7 @@ MODULE_PARM_DESC(max_rds_errors, "RDS maximum block errors: *1*"); */ int si470x_get_register(struct si470x_device *radio, int regnr) { - u16 buf[READ_REG_NUM]; + __be16 buf[READ_REG_NUM]; struct i2c_msg msgs[1] = { { .addr = radio->client->addr, @@ -121,7 +121,7 @@ int si470x_get_register(struct si470x_device *radio, int regnr) int si470x_set_register(struct si470x_device *radio, int regnr) { int i; - u16 buf[WRITE_REG_NUM]; + __be16 buf[WRITE_REG_NUM]; struct i2c_msg msgs[1] = { { .addr = radio->client->addr, @@ -151,7 +151,7 @@ int si470x_set_register(struct si470x_device *radio, int regnr) static int si470x_get_all_registers(struct si470x_device *radio) { int i; - u16 buf[READ_REG_NUM]; + __be16 buf[READ_REG_NUM]; struct i2c_msg msgs[1] = { { .addr = radio->client->addr, diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c index bb1e19f7ed5a..0c1a42bf27fd 100644 --- a/drivers/media/v4l2-core/videobuf2-core.c +++ b/drivers/media/v4l2-core/videobuf2-core.c @@ -870,9 +870,12 @@ void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state) dprintk(4, "done processing on buffer %d, state: %d\n", vb->index, state); - /* sync buffers */ - for (plane = 0; plane < vb->num_planes; ++plane) - call_void_memop(vb, finish, vb->planes[plane].mem_priv); + if (state != VB2_BUF_STATE_QUEUED && + state != VB2_BUF_STATE_REQUEUEING) { + /* sync buffers */ + for (plane = 0; plane < vb->num_planes; ++plane) + call_void_memop(vb, finish, vb->planes[plane].mem_priv); + } spin_lock_irqsave(&q->done_lock, flags); if (state == VB2_BUF_STATE_QUEUED || diff --git a/drivers/memory/tegra/mc.c b/drivers/memory/tegra/mc.c index a1ae0cc2b86d..6ab481ee8ece 100644 --- a/drivers/memory/tegra/mc.c +++ b/drivers/memory/tegra/mc.c @@ -20,14 +20,6 @@ #include "mc.h" #define MC_INTSTATUS 0x000 -#define MC_INT_DECERR_MTS (1 << 16) -#define MC_INT_SECERR_SEC (1 << 13) -#define MC_INT_DECERR_VPR (1 << 12) -#define MC_INT_INVALID_APB_ASID_UPDATE (1 << 11) -#define MC_INT_INVALID_SMMU_PAGE (1 << 10) -#define MC_INT_ARBITRATION_EMEM (1 << 9) -#define MC_INT_SECURITY_VIOLATION (1 << 8) -#define MC_INT_DECERR_EMEM (1 << 6) #define MC_INTMASK 0x004 @@ -248,12 +240,13 @@ static const char *const error_names[8] = { static irqreturn_t tegra_mc_irq(int irq, void *data) { struct tegra_mc *mc = data; - unsigned long status, mask; + unsigned long status; unsigned int bit; /* mask all interrupts to avoid flooding */ - status = mc_readl(mc, MC_INTSTATUS); - mask = mc_readl(mc, MC_INTMASK); + status = mc_readl(mc, MC_INTSTATUS) & mc->soc->intmask; + if (!status) + return IRQ_NONE; for_each_set_bit(bit, &status, 32) { const char *error = status_names[bit] ?: "unknown"; @@ -346,7 +339,6 @@ static int tegra_mc_probe(struct platform_device *pdev) const struct of_device_id *match; struct resource *res; struct tegra_mc *mc; - u32 value; int err; match = of_match_node(tegra_mc_of_match, pdev->dev.of_node); @@ -414,11 +406,7 @@ static int tegra_mc_probe(struct platform_device *pdev) WARN(!mc->soc->client_id_mask, "Missing client ID mask for this SoC\n"); - value = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | - MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | - MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM; - - mc_writel(mc, value, MC_INTMASK); + mc_writel(mc, mc->soc->intmask, MC_INTMASK); return 0; } diff --git a/drivers/memory/tegra/mc.h b/drivers/memory/tegra/mc.h index ddb16676c3af..24e020b4609b 100644 --- a/drivers/memory/tegra/mc.h +++ b/drivers/memory/tegra/mc.h @@ -14,6 +14,15 @@ #include <soc/tegra/mc.h> +#define MC_INT_DECERR_MTS (1 << 16) +#define MC_INT_SECERR_SEC (1 << 13) +#define MC_INT_DECERR_VPR (1 << 12) +#define MC_INT_INVALID_APB_ASID_UPDATE (1 << 11) +#define MC_INT_INVALID_SMMU_PAGE (1 << 10) +#define MC_INT_ARBITRATION_EMEM (1 << 9) +#define MC_INT_SECURITY_VIOLATION (1 << 8) +#define MC_INT_DECERR_EMEM (1 << 6) + static inline u32 mc_readl(struct tegra_mc *mc, unsigned long offset) { return readl(mc->regs + offset); diff --git a/drivers/memory/tegra/tegra114.c b/drivers/memory/tegra/tegra114.c index ba8fff3d66a6..6d2a5a849d92 100644 --- a/drivers/memory/tegra/tegra114.c +++ b/drivers/memory/tegra/tegra114.c @@ -930,4 +930,6 @@ const struct tegra_mc_soc tegra114_mc_soc = { .atom_size = 32, .client_id_mask = 0x7f, .smmu = &tegra114_smmu_soc, + .intmask = MC_INT_INVALID_SMMU_PAGE | MC_INT_SECURITY_VIOLATION | + MC_INT_DECERR_EMEM, }; diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c index 21e7255e3d96..234e74f97a4b 100644 --- a/drivers/memory/tegra/tegra124.c +++ b/drivers/memory/tegra/tegra124.c @@ -1019,6 +1019,9 @@ const struct tegra_mc_soc tegra124_mc_soc = { .smmu = &tegra124_smmu_soc, .emem_regs = tegra124_mc_emem_regs, .num_emem_regs = ARRAY_SIZE(tegra124_mc_emem_regs), + .intmask = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | + MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | + MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM, }; #endif /* CONFIG_ARCH_TEGRA_124_SOC */ @@ -1041,5 +1044,8 @@ const struct tegra_mc_soc tegra132_mc_soc = { .atom_size = 32, .client_id_mask = 0x7f, .smmu = &tegra132_smmu_soc, + .intmask = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | + MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | + MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM, }; #endif /* CONFIG_ARCH_TEGRA_132_SOC */ diff --git a/drivers/memory/tegra/tegra210.c b/drivers/memory/tegra/tegra210.c index 5e144abe4c18..47c78a6d8f00 100644 --- a/drivers/memory/tegra/tegra210.c +++ b/drivers/memory/tegra/tegra210.c @@ -1077,4 +1077,7 @@ const struct tegra_mc_soc tegra210_mc_soc = { .atom_size = 64, .client_id_mask = 0xff, .smmu = &tegra210_smmu_soc, + .intmask = MC_INT_DECERR_MTS | MC_INT_SECERR_SEC | MC_INT_DECERR_VPR | + MC_INT_INVALID_APB_ASID_UPDATE | MC_INT_INVALID_SMMU_PAGE | + MC_INT_SECURITY_VIOLATION | MC_INT_DECERR_EMEM, }; diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c index b44737840e70..d0689428ea1a 100644 --- a/drivers/memory/tegra/tegra30.c +++ b/drivers/memory/tegra/tegra30.c @@ -952,4 +952,6 @@ const struct tegra_mc_soc tegra30_mc_soc = { .atom_size = 16, .client_id_mask = 0x7f, .smmu = &tegra30_smmu_soc, + .intmask = MC_INT_INVALID_SMMU_PAGE | MC_INT_SECURITY_VIOLATION | + MC_INT_DECERR_EMEM, }; diff --git a/drivers/mfd/cros_ec.c b/drivers/mfd/cros_ec.c index 0eee63542038..115a6f67ab51 100644 --- a/drivers/mfd/cros_ec.c +++ b/drivers/mfd/cros_ec.c @@ -68,7 +68,11 @@ int cros_ec_register(struct cros_ec_device *ec_dev) mutex_init(&ec_dev->lock); - cros_ec_query_all(ec_dev); + err = cros_ec_query_all(ec_dev); + if (err) { + dev_err(dev, "Cannot identify the EC: error %d\n", err); + return err; + } err = mfd_add_devices(ec_dev->dev, PLATFORM_DEVID_AUTO, &ec_cell, 1, NULL, ec_dev->irq, NULL); diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c index 5e3fa5861039..2c0bbaed3609 100644 --- a/drivers/mtd/nand/fsl_ifc_nand.c +++ b/drivers/mtd/nand/fsl_ifc_nand.c @@ -449,9 +449,16 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command, case NAND_CMD_READID: case NAND_CMD_PARAM: { + /* + * For READID, read 8 bytes that are currently used. + * For PARAM, read all 3 copies of 256-bytes pages. + */ + int len = 8; int timing = IFC_FIR_OP_RB; - if (command == NAND_CMD_PARAM) + if (command == NAND_CMD_PARAM) { timing = IFC_FIR_OP_RBCD; + len = 256 * 3; + } ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) | (IFC_FIR_OP_UA << IFC_NAND_FIR0_OP1_SHIFT) | @@ -461,12 +468,8 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command, &ifc->ifc_nand.nand_fcr0); ifc_out32(column, &ifc->ifc_nand.row3); - /* - * although currently it's 8 bytes for READID, we always read - * the maximum 256 bytes(for PARAM) - */ - ifc_out32(256, &ifc->ifc_nand.nand_fbcr); - ifc_nand_ctrl->read_bytes = 256; + ifc_out32(len, &ifc->ifc_nand.nand_fbcr); + ifc_nand_ctrl->read_bytes = len; set_addr(mtd, 0, 0, 0); fsl_ifc_run_command(mtd); diff --git a/drivers/mtd/ubi/attach.c b/drivers/mtd/ubi/attach.c index c1aaf0336cf2..5cde3ad1665e 100644 --- a/drivers/mtd/ubi/attach.c +++ b/drivers/mtd/ubi/attach.c @@ -175,6 +175,40 @@ static int add_corrupted(struct ubi_attach_info *ai, int pnum, int ec) } /** + * add_fastmap - add a Fastmap related physical eraseblock. + * @ai: attaching information + * @pnum: physical eraseblock number the VID header came from + * @vid_hdr: the volume identifier header + * @ec: erase counter of the physical eraseblock + * + * This function allocates a 'struct ubi_ainf_peb' object for a Fastamp + * physical eraseblock @pnum and adds it to the 'fastmap' list. + * Such blocks can be Fastmap super and data blocks from both the most + * recent Fastmap we're attaching from or from old Fastmaps which will + * be erased. + */ +static int add_fastmap(struct ubi_attach_info *ai, int pnum, + struct ubi_vid_hdr *vid_hdr, int ec) +{ + struct ubi_ainf_peb *aeb; + + aeb = kmem_cache_alloc(ai->aeb_slab_cache, GFP_KERNEL); + if (!aeb) + return -ENOMEM; + + aeb->pnum = pnum; + aeb->vol_id = be32_to_cpu(vidh->vol_id); + aeb->sqnum = be64_to_cpu(vidh->sqnum); + aeb->ec = ec; + list_add(&aeb->u.list, &ai->fastmap); + + dbg_bld("add to fastmap list: PEB %d, vol_id %d, sqnum: %llu", pnum, + aeb->vol_id, aeb->sqnum); + + return 0; +} + +/** * validate_vid_hdr - check volume identifier header. * @ubi: UBI device description object * @vid_hdr: the volume identifier header to check @@ -803,13 +837,26 @@ out_unlock: return err; } +static bool vol_ignored(int vol_id) +{ + switch (vol_id) { + case UBI_LAYOUT_VOLUME_ID: + return true; + } + +#ifdef CONFIG_MTD_UBI_FASTMAP + return ubi_is_fm_vol(vol_id); +#else + return false; +#endif +} + /** * scan_peb - scan and process UBI headers of a PEB. * @ubi: UBI device description object * @ai: attaching information * @pnum: the physical eraseblock number - * @vid: The volume ID of the found volume will be stored in this pointer - * @sqnum: The sqnum of the found volume will be stored in this pointer + * @fast: true if we're scanning for a Fastmap * * This function reads UBI headers of PEB @pnum, checks them, and adds * information about this PEB to the corresponding list or RB-tree in the @@ -817,9 +864,9 @@ out_unlock: * successfully handled and a negative error code in case of failure. */ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai, - int pnum, int *vid, unsigned long long *sqnum) + int pnum, bool fast) { - long long uninitialized_var(ec); + long long ec; int err, bitflips = 0, vol_id = -1, ec_err = 0; dbg_bld("scan PEB %d", pnum); @@ -935,6 +982,20 @@ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai, */ ai->maybe_bad_peb_count += 1; case UBI_IO_BAD_HDR: + /* + * If we're facing a bad VID header we have to drop *all* + * Fastmap data structures we find. The most recent Fastmap + * could be bad and therefore there is a chance that we attach + * from an old one. On a fine MTD stack a PEB must not render + * bad all of a sudden, but the reality is different. + * So, let's be paranoid and help finding the root cause by + * falling back to scanning mode instead of attaching with a + * bad EBA table and cause data corruption which is hard to + * analyze. + */ + if (fast) + ai->force_full_scan = 1; + if (ec_err) /* * Both headers are corrupted. There is a possibility @@ -991,21 +1052,15 @@ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai, } vol_id = be32_to_cpu(vidh->vol_id); - if (vid) - *vid = vol_id; - if (sqnum) - *sqnum = be64_to_cpu(vidh->sqnum); - if (vol_id > UBI_MAX_VOLUMES && vol_id != UBI_LAYOUT_VOLUME_ID) { + if (vol_id > UBI_MAX_VOLUMES && !vol_ignored(vol_id)) { int lnum = be32_to_cpu(vidh->lnum); /* Unsupported internal volume */ switch (vidh->compat) { case UBI_COMPAT_DELETE: - if (vol_id != UBI_FM_SB_VOLUME_ID - && vol_id != UBI_FM_DATA_VOLUME_ID) { - ubi_msg(ubi, "\"delete\" compatible internal volume %d:%d found, will remove it", - vol_id, lnum); - } + ubi_msg(ubi, "\"delete\" compatible internal volume %d:%d found, will remove it", + vol_id, lnum); + err = add_to_list(ai, pnum, vol_id, lnum, ec, 1, &ai->erase); if (err) @@ -1037,7 +1092,12 @@ static int scan_peb(struct ubi_device *ubi, struct ubi_attach_info *ai, if (ec_err) ubi_warn(ubi, "valid VID header but corrupted EC header at PEB %d", pnum); - err = ubi_add_to_av(ubi, ai, pnum, ec, vidh, bitflips); + + if (ubi_is_fm_vol(vol_id)) + err = add_fastmap(ai, pnum, vidh, ec); + else + err = ubi_add_to_av(ubi, ai, pnum, ec, vidh, bitflips); + if (err) return err; @@ -1186,6 +1246,10 @@ static void destroy_ai(struct ubi_attach_info *ai) list_del(&aeb->u.list); kmem_cache_free(ai->aeb_slab_cache, aeb); } + list_for_each_entry_safe(aeb, aeb_tmp, &ai->fastmap, u.list) { + list_del(&aeb->u.list); + kmem_cache_free(ai->aeb_slab_cache, aeb); + } /* Destroy the volume RB-tree */ rb = ai->volumes.rb_node; @@ -1245,7 +1309,7 @@ static int scan_all(struct ubi_device *ubi, struct ubi_attach_info *ai, cond_resched(); dbg_gen("process PEB %d", pnum); - err = scan_peb(ubi, ai, pnum, NULL, NULL); + err = scan_peb(ubi, ai, pnum, false); if (err < 0) goto out_vidh; } @@ -1311,6 +1375,7 @@ static struct ubi_attach_info *alloc_ai(void) INIT_LIST_HEAD(&ai->free); INIT_LIST_HEAD(&ai->erase); INIT_LIST_HEAD(&ai->alien); + INIT_LIST_HEAD(&ai->fastmap); ai->volumes = RB_ROOT; ai->aeb_slab_cache = kmem_cache_create("ubi_aeb_slab_cache", sizeof(struct ubi_ainf_peb), @@ -1337,52 +1402,58 @@ static struct ubi_attach_info *alloc_ai(void) */ static int scan_fast(struct ubi_device *ubi, struct ubi_attach_info **ai) { - int err, pnum, fm_anchor = -1; - unsigned long long max_sqnum = 0; + int err, pnum; + struct ubi_attach_info *scan_ai; err = -ENOMEM; + scan_ai = alloc_ai(); + if (!scan_ai) + goto out; + ech = kzalloc(ubi->ec_hdr_alsize, GFP_KERNEL); if (!ech) - goto out; + goto out_ai; vidh = ubi_zalloc_vid_hdr(ubi, GFP_KERNEL); if (!vidh) goto out_ech; for (pnum = 0; pnum < UBI_FM_MAX_START; pnum++) { - int vol_id = -1; - unsigned long long sqnum = -1; cond_resched(); dbg_gen("process PEB %d", pnum); - err = scan_peb(ubi, *ai, pnum, &vol_id, &sqnum); + err = scan_peb(ubi, scan_ai, pnum, true); if (err < 0) goto out_vidh; - - if (vol_id == UBI_FM_SB_VOLUME_ID && sqnum > max_sqnum) { - max_sqnum = sqnum; - fm_anchor = pnum; - } } ubi_free_vid_hdr(ubi, vidh); kfree(ech); - if (fm_anchor < 0) - return UBI_NO_FASTMAP; + if (scan_ai->force_full_scan) + err = UBI_NO_FASTMAP; + else + err = ubi_scan_fastmap(ubi, *ai, scan_ai); - destroy_ai(*ai); - *ai = alloc_ai(); - if (!*ai) - return -ENOMEM; + if (err) { + /* + * Didn't attach via fastmap, do a full scan but reuse what + * we've aready scanned. + */ + destroy_ai(*ai); + *ai = scan_ai; + } else + destroy_ai(scan_ai); - return ubi_scan_fastmap(ubi, *ai, fm_anchor); + return err; out_vidh: ubi_free_vid_hdr(ubi, vidh); out_ech: kfree(ech); +out_ai: + destroy_ai(scan_ai); out: return err; } diff --git a/drivers/mtd/ubi/eba.c b/drivers/mtd/ubi/eba.c index c4a25c858c07..03cf0553ec1b 100644 --- a/drivers/mtd/ubi/eba.c +++ b/drivers/mtd/ubi/eba.c @@ -1178,6 +1178,8 @@ int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to, struct ubi_volume *vol; uint32_t crc; + ubi_assert(rwsem_is_locked(&ubi->fm_eba_sem)); + vol_id = be32_to_cpu(vid_hdr->vol_id); lnum = be32_to_cpu(vid_hdr->lnum); @@ -1346,9 +1348,7 @@ int ubi_eba_copy_leb(struct ubi_device *ubi, int from, int to, } ubi_assert(vol->eba_tbl[lnum] == from); - down_read(&ubi->fm_eba_sem); vol->eba_tbl[lnum] = to; - up_read(&ubi->fm_eba_sem); out_unlock_buf: mutex_unlock(&ubi->buf_mutex); diff --git a/drivers/mtd/ubi/fastmap-wl.c b/drivers/mtd/ubi/fastmap-wl.c index ed62f1efe6eb..69dd21679a30 100644 --- a/drivers/mtd/ubi/fastmap-wl.c +++ b/drivers/mtd/ubi/fastmap-wl.c @@ -262,6 +262,8 @@ static struct ubi_wl_entry *get_peb_for_wl(struct ubi_device *ubi) struct ubi_fm_pool *pool = &ubi->fm_wl_pool; int pnum; + ubi_assert(rwsem_is_locked(&ubi->fm_eba_sem)); + if (pool->used == pool->size) { /* We cannot update the fastmap here because this * function is called in atomic context. @@ -303,7 +305,7 @@ int ubi_ensure_anchor_pebs(struct ubi_device *ubi) wrk->anchor = 1; wrk->func = &wear_leveling_worker; - schedule_ubi_work(ubi, wrk); + __schedule_ubi_work(ubi, wrk); return 0; } @@ -344,7 +346,7 @@ int ubi_wl_put_fm_peb(struct ubi_device *ubi, struct ubi_wl_entry *fm_e, spin_unlock(&ubi->wl_lock); vol_id = lnum ? UBI_FM_DATA_VOLUME_ID : UBI_FM_SB_VOLUME_ID; - return schedule_erase(ubi, e, vol_id, lnum, torture); + return schedule_erase(ubi, e, vol_id, lnum, torture, true); } /** diff --git a/drivers/mtd/ubi/fastmap.c b/drivers/mtd/ubi/fastmap.c index bba7dd1b5ebf..72e89b352034 100644 --- a/drivers/mtd/ubi/fastmap.c +++ b/drivers/mtd/ubi/fastmap.c @@ -326,6 +326,7 @@ static int update_vol(struct ubi_device *ubi, struct ubi_attach_info *ai, aeb->pnum = new_aeb->pnum; aeb->copy_flag = new_vh->copy_flag; aeb->scrub = new_aeb->scrub; + aeb->sqnum = new_aeb->sqnum; kmem_cache_free(ai->aeb_slab_cache, new_aeb); /* new_aeb is older */ @@ -851,27 +852,57 @@ fail: } /** + * find_fm_anchor - find the most recent Fastmap superblock (anchor) + * @ai: UBI attach info to be filled + */ +static int find_fm_anchor(struct ubi_attach_info *ai) +{ + int ret = -1; + struct ubi_ainf_peb *aeb; + unsigned long long max_sqnum = 0; + + list_for_each_entry(aeb, &ai->fastmap, u.list) { + if (aeb->vol_id == UBI_FM_SB_VOLUME_ID && aeb->sqnum > max_sqnum) { + max_sqnum = aeb->sqnum; + ret = aeb->pnum; + } + } + + return ret; +} + +/** * ubi_scan_fastmap - scan the fastmap. * @ubi: UBI device object * @ai: UBI attach info to be filled - * @fm_anchor: The fastmap starts at this PEB + * @scan_ai: UBI attach info from the first 64 PEBs, + * used to find the most recent Fastmap data structure * * Returns 0 on success, UBI_NO_FASTMAP if no fastmap was found, * UBI_BAD_FASTMAP if one was found but is not usable. * < 0 indicates an internal error. */ int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai, - int fm_anchor) + struct ubi_attach_info *scan_ai) { struct ubi_fm_sb *fmsb, *fmsb2; struct ubi_vid_hdr *vh; struct ubi_ec_hdr *ech; struct ubi_fastmap_layout *fm; - int i, used_blocks, pnum, ret = 0; + struct ubi_ainf_peb *tmp_aeb, *aeb; + int i, used_blocks, pnum, fm_anchor, ret = 0; size_t fm_size; __be32 crc, tmp_crc; unsigned long long sqnum = 0; + fm_anchor = find_fm_anchor(scan_ai); + if (fm_anchor < 0) + return UBI_NO_FASTMAP; + + /* Move all (possible) fastmap blocks into our new attach structure. */ + list_for_each_entry_safe(aeb, tmp_aeb, &scan_ai->fastmap, u.list) + list_move_tail(&aeb->u.list, &ai->fastmap); + down_write(&ubi->fm_protect); memset(ubi->fm_buf, 0, ubi->fm_size); @@ -1484,22 +1515,30 @@ int ubi_update_fastmap(struct ubi_device *ubi) struct ubi_wl_entry *tmp_e; down_write(&ubi->fm_protect); + down_write(&ubi->work_sem); + down_write(&ubi->fm_eba_sem); ubi_refill_pools(ubi); if (ubi->ro_mode || ubi->fm_disabled) { + up_write(&ubi->fm_eba_sem); + up_write(&ubi->work_sem); up_write(&ubi->fm_protect); return 0; } ret = ubi_ensure_anchor_pebs(ubi); if (ret) { + up_write(&ubi->fm_eba_sem); + up_write(&ubi->work_sem); up_write(&ubi->fm_protect); return ret; } new_fm = kzalloc(sizeof(*new_fm), GFP_KERNEL); if (!new_fm) { + up_write(&ubi->fm_eba_sem); + up_write(&ubi->work_sem); up_write(&ubi->fm_protect); return -ENOMEM; } @@ -1608,16 +1647,14 @@ int ubi_update_fastmap(struct ubi_device *ubi) new_fm->e[0] = tmp_e; } - down_write(&ubi->work_sem); - down_write(&ubi->fm_eba_sem); ret = ubi_write_fastmap(ubi, new_fm); - up_write(&ubi->fm_eba_sem); - up_write(&ubi->work_sem); if (ret) goto err; out_unlock: + up_write(&ubi->fm_eba_sem); + up_write(&ubi->work_sem); up_write(&ubi->fm_protect); kfree(old_fm); return ret; diff --git a/drivers/mtd/ubi/ubi.h b/drivers/mtd/ubi/ubi.h index bdb885d9d3fc..61f039d3289e 100644 --- a/drivers/mtd/ubi/ubi.h +++ b/drivers/mtd/ubi/ubi.h @@ -705,6 +705,8 @@ struct ubi_ainf_volume { * @erase: list of physical eraseblocks which have to be erased * @alien: list of physical eraseblocks which should not be used by UBI (e.g., * those belonging to "preserve"-compatible internal volumes) + * @fastmap: list of physical eraseblocks which relate to fastmap (e.g., + * eraseblocks of the current and not yet erased old fastmap blocks) * @corr_peb_count: count of PEBs in the @corr list * @empty_peb_count: count of PEBs which are presumably empty (contain only * 0xFF bytes) @@ -715,6 +717,8 @@ struct ubi_ainf_volume { * @vols_found: number of volumes found * @highest_vol_id: highest volume ID * @is_empty: flag indicating whether the MTD device is empty or not + * @force_full_scan: flag indicating whether we need to do a full scan and drop + all existing Fastmap data structures * @min_ec: lowest erase counter value * @max_ec: highest erase counter value * @max_sqnum: highest sequence number value @@ -733,6 +737,7 @@ struct ubi_attach_info { struct list_head free; struct list_head erase; struct list_head alien; + struct list_head fastmap; int corr_peb_count; int empty_peb_count; int alien_peb_count; @@ -741,6 +746,7 @@ struct ubi_attach_info { int vols_found; int highest_vol_id; int is_empty; + int force_full_scan; int min_ec; int max_ec; unsigned long long max_sqnum; @@ -919,7 +925,7 @@ int ubi_compare_lebs(struct ubi_device *ubi, const struct ubi_ainf_peb *aeb, size_t ubi_calc_fm_size(struct ubi_device *ubi); int ubi_update_fastmap(struct ubi_device *ubi); int ubi_scan_fastmap(struct ubi_device *ubi, struct ubi_attach_info *ai, - int fm_anchor); + struct ubi_attach_info *scan_ai); #else static inline int ubi_update_fastmap(struct ubi_device *ubi) { return 0; } #endif @@ -1113,4 +1119,42 @@ static inline int idx2vol_id(const struct ubi_device *ubi, int idx) return idx; } +/** + * ubi_is_fm_vol - check whether a volume ID is a Fastmap volume. + * @vol_id: volume ID + */ +static inline bool ubi_is_fm_vol(int vol_id) +{ + switch (vol_id) { + case UBI_FM_SB_VOLUME_ID: + case UBI_FM_DATA_VOLUME_ID: + return true; + } + + return false; +} + +/** + * ubi_find_fm_block - check whether a PEB is part of the current Fastmap. + * @ubi: UBI device description object + * @pnum: physical eraseblock to look for + * + * This function returns a wear leveling object if @pnum relates to the current + * fastmap, @NULL otherwise. + */ +static inline struct ubi_wl_entry *ubi_find_fm_block(const struct ubi_device *ubi, + int pnum) +{ + int i; + + if (ubi->fm) { + for (i = 0; i < ubi->fm->used_blocks; i++) { + if (ubi->fm->e[i]->pnum == pnum) + return ubi->fm->e[i]; + } + } + + return NULL; +} + #endif /* !__UBI_UBI_H__ */ diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 5e65ab837e99..ffcb64bc2bf8 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -599,7 +599,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk, * failure. */ static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e, - int vol_id, int lnum, int torture) + int vol_id, int lnum, int torture, bool nested) { struct ubi_work *wl_wrk; @@ -618,7 +618,10 @@ static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e, wl_wrk->lnum = lnum; wl_wrk->torture = torture; - schedule_ubi_work(ubi, wl_wrk); + if (nested) + __schedule_ubi_work(ubi, wl_wrk); + else + schedule_ubi_work(ubi, wl_wrk); return 0; } @@ -679,6 +682,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk, if (!vid_hdr) return -ENOMEM; + down_read(&ubi->fm_eba_sem); mutex_lock(&ubi->move_mutex); spin_lock(&ubi->wl_lock); ubi_assert(!ubi->move_from && !ubi->move_to); @@ -919,6 +923,7 @@ static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk, dbg_wl("done"); mutex_unlock(&ubi->move_mutex); + up_read(&ubi->fm_eba_sem); return 0; /* @@ -969,6 +974,7 @@ out_not_moved: } mutex_unlock(&ubi->move_mutex); + up_read(&ubi->fm_eba_sem); return 0; out_error: @@ -990,6 +996,7 @@ out_error: out_ro: ubi_ro_mode(ubi); mutex_unlock(&ubi->move_mutex); + up_read(&ubi->fm_eba_sem); ubi_assert(err != 0); return err < 0 ? err : -EIO; @@ -997,6 +1004,7 @@ out_cancel: ubi->wl_scheduled = 0; spin_unlock(&ubi->wl_lock); mutex_unlock(&ubi->move_mutex); + up_read(&ubi->fm_eba_sem); ubi_free_vid_hdr(ubi, vid_hdr); return 0; } @@ -1119,7 +1127,7 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk) int err1; /* Re-schedule the LEB for erasure */ - err1 = schedule_erase(ubi, e, vol_id, lnum, 0); + err1 = schedule_erase(ubi, e, vol_id, lnum, 0, false); if (err1) { wl_entry_destroy(ubi, e); err = err1; @@ -1315,7 +1323,7 @@ retry: } spin_unlock(&ubi->wl_lock); - err = schedule_erase(ubi, e, vol_id, lnum, torture); + err = schedule_erase(ubi, e, vol_id, lnum, torture, false); if (err) { spin_lock(&ubi->wl_lock); wl_tree_add(e, &ubi->used); @@ -1751,6 +1759,48 @@ static void shutdown_work(struct ubi_device *ubi) } /** + * erase_aeb - erase a PEB given in UBI attach info PEB + * @ubi: UBI device description object + * @aeb: UBI attach info PEB + * @sync: If true, erase synchronously. Otherwise schedule for erasure + */ +static int erase_aeb(struct ubi_device *ubi, struct ubi_ainf_peb *aeb, bool sync) +{ + struct ubi_wl_entry *e; + int err; + + e = kmem_cache_alloc(ubi_wl_entry_slab, GFP_KERNEL); + if (!e) + return -ENOMEM; + + e->pnum = aeb->pnum; + e->ec = aeb->ec; + e->tagged_scrub_all = 0; + e->sqnum = aeb->sqnum; + ubi->lookuptbl[e->pnum] = e; + + if (sync) { + err = sync_erase(ubi, e, false); + if (err) + goto out_free; + + wl_tree_add(e, &ubi->free); + ubi->free_count++; + } else { + err = schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0, false); + if (err) + goto out_free; + } + + return 0; + +out_free: + wl_entry_destroy(ubi, e); + + return err; +} + +/** * ubi_wl_init - initialize the WL sub-system using attaching information. * @ubi: UBI device description object * @ai: attaching information @@ -1787,19 +1837,9 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai) list_for_each_entry_safe(aeb, tmp, &ai->erase, u.list) { cond_resched(); - e = kmem_cache_alloc(ubi_wl_entry_slab, GFP_KERNEL); - if (!e) - goto out_free; - - e->pnum = aeb->pnum; - e->ec = aeb->ec; - e->tagged_scrub_all = 0; - e->sqnum = aeb->sqnum; - ubi->lookuptbl[e->pnum] = e; - if (schedule_erase(ubi, e, aeb->vol_id, aeb->lnum, 0)) { - wl_entry_destroy(ubi, e); + err = erase_aeb(ubi, aeb, false); + if (err) goto out_free; - } found_pebs++; } @@ -1854,19 +1894,49 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai) } } - dbg_wl("found %i PEBs", found_pebs); + list_for_each_entry(aeb, &ai->fastmap, u.list) { + cond_resched(); - if (ubi->fm) { - ubi_assert(ubi->good_peb_count == - found_pebs + ubi->fm->used_blocks); + e = ubi_find_fm_block(ubi, aeb->pnum); - for (i = 0; i < ubi->fm->used_blocks; i++) { - e = ubi->fm->e[i]; + if (e) { + ubi_assert(!ubi->lookuptbl[e->pnum]); ubi->lookuptbl[e->pnum] = e; + } else { + bool sync = false; + + /* + * Usually old Fastmap PEBs are scheduled for erasure + * and we don't have to care about them but if we face + * an power cut before scheduling them we need to + * take care of them here. + */ + if (ubi->lookuptbl[aeb->pnum]) + continue; + + /* + * The fastmap update code might not find a free PEB for + * writing the fastmap anchor to and then reuses the + * current fastmap anchor PEB. When this PEB gets erased + * and a power cut happens before it is written again we + * must make sure that the fastmap attach code doesn't + * find any outdated fastmap anchors, hence we erase the + * outdated fastmap anchor PEBs synchronously here. + */ + if (aeb->vol_id == UBI_FM_SB_VOLUME_ID) + sync = true; + + err = erase_aeb(ubi, aeb, sync); + if (err) + goto out_free; } + + found_pebs++; } - else - ubi_assert(ubi->good_peb_count == found_pebs); + + dbg_wl("found %i PEBs", found_pebs); + + ubi_assert(ubi->good_peb_count == found_pebs); reserved_pebs = WL_RESERVED_PEBS; ubi_fastmap_init(ubi, &reserved_pebs); diff --git a/drivers/net/can/usb/ems_usb.c b/drivers/net/can/usb/ems_usb.c index 357c9e89fdf9..047348033e27 100644 --- a/drivers/net/can/usb/ems_usb.c +++ b/drivers/net/can/usb/ems_usb.c @@ -1078,6 +1078,7 @@ static void ems_usb_disconnect(struct usb_interface *intf) usb_free_urb(dev->intr_urb); kfree(dev->intr_in_buffer); + kfree(dev->tx_msg_buffer); } } diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c index 51670b322409..700b98d9c250 100644 --- a/drivers/net/can/xilinx_can.c +++ b/drivers/net/can/xilinx_can.c @@ -2,6 +2,7 @@ * * Copyright (C) 2012 - 2014 Xilinx, Inc. * Copyright (C) 2009 PetaLogix. All rights reserved. + * Copyright (C) 2017 Sandvik Mining and Construction Oy * * Description: * This driver is developed for Axi CAN IP and for Zynq CANPS Controller. @@ -25,8 +26,10 @@ #include <linux/module.h> #include <linux/netdevice.h> #include <linux/of.h> +#include <linux/of_device.h> #include <linux/platform_device.h> #include <linux/skbuff.h> +#include <linux/spinlock.h> #include <linux/string.h> #include <linux/types.h> #include <linux/can/dev.h> @@ -100,7 +103,7 @@ enum xcan_reg { #define XCAN_INTR_ALL (XCAN_IXR_TXOK_MASK | XCAN_IXR_BSOFF_MASK |\ XCAN_IXR_WKUP_MASK | XCAN_IXR_SLP_MASK | \ XCAN_IXR_RXNEMP_MASK | XCAN_IXR_ERROR_MASK | \ - XCAN_IXR_ARBLST_MASK | XCAN_IXR_RXOK_MASK) + XCAN_IXR_RXOFLW_MASK | XCAN_IXR_ARBLST_MASK) /* CAN register bit shift - XCAN_<REG>_<BIT>_SHIFT */ #define XCAN_BTR_SJW_SHIFT 7 /* Synchronous jump width */ @@ -117,6 +120,7 @@ enum xcan_reg { /** * struct xcan_priv - This definition define CAN driver instance * @can: CAN private data structure. + * @tx_lock: Lock for synchronizing TX interrupt handling * @tx_head: Tx CAN packets ready to send on the queue * @tx_tail: Tx CAN packets successfully sended on the queue * @tx_max: Maximum number packets the driver can send @@ -131,6 +135,7 @@ enum xcan_reg { */ struct xcan_priv { struct can_priv can; + spinlock_t tx_lock; unsigned int tx_head; unsigned int tx_tail; unsigned int tx_max; @@ -158,6 +163,11 @@ static const struct can_bittiming_const xcan_bittiming_const = { .brp_inc = 1, }; +#define XCAN_CAP_WATERMARK 0x0001 +struct xcan_devtype_data { + unsigned int caps; +}; + /** * xcan_write_reg_le - Write a value to the device register little endian * @priv: Driver private data structure @@ -237,6 +247,10 @@ static int set_reset_mode(struct net_device *ndev) usleep_range(500, 10000); } + /* reset clears FIFOs */ + priv->tx_head = 0; + priv->tx_tail = 0; + return 0; } @@ -391,6 +405,7 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev) struct net_device_stats *stats = &ndev->stats; struct can_frame *cf = (struct can_frame *)skb->data; u32 id, dlc, data[2] = {0, 0}; + unsigned long flags; if (can_dropped_invalid_skb(ndev, skb)) return NETDEV_TX_OK; @@ -438,6 +453,9 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev) data[1] = be32_to_cpup((__be32 *)(cf->data + 4)); can_put_echo_skb(skb, ndev, priv->tx_head % priv->tx_max); + + spin_lock_irqsave(&priv->tx_lock, flags); + priv->tx_head++; /* Write the Frame to Xilinx CAN TX FIFO */ @@ -453,10 +471,16 @@ static int xcan_start_xmit(struct sk_buff *skb, struct net_device *ndev) stats->tx_bytes += cf->can_dlc; } + /* Clear TX-FIFO-empty interrupt for xcan_tx_interrupt() */ + if (priv->tx_max > 1) + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXFEMP_MASK); + /* Check if the TX buffer is full */ if ((priv->tx_head - priv->tx_tail) == priv->tx_max) netif_stop_queue(ndev); + spin_unlock_irqrestore(&priv->tx_lock, flags); + return NETDEV_TX_OK; } @@ -529,6 +553,123 @@ static int xcan_rx(struct net_device *ndev) } /** + * xcan_current_error_state - Get current error state from HW + * @ndev: Pointer to net_device structure + * + * Checks the current CAN error state from the HW. Note that this + * only checks for ERROR_PASSIVE and ERROR_WARNING. + * + * Return: + * ERROR_PASSIVE or ERROR_WARNING if either is active, ERROR_ACTIVE + * otherwise. + */ +static enum can_state xcan_current_error_state(struct net_device *ndev) +{ + struct xcan_priv *priv = netdev_priv(ndev); + u32 status = priv->read_reg(priv, XCAN_SR_OFFSET); + + if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK) + return CAN_STATE_ERROR_PASSIVE; + else if (status & XCAN_SR_ERRWRN_MASK) + return CAN_STATE_ERROR_WARNING; + else + return CAN_STATE_ERROR_ACTIVE; +} + +/** + * xcan_set_error_state - Set new CAN error state + * @ndev: Pointer to net_device structure + * @new_state: The new CAN state to be set + * @cf: Error frame to be populated or NULL + * + * Set new CAN error state for the device, updating statistics and + * populating the error frame if given. + */ +static void xcan_set_error_state(struct net_device *ndev, + enum can_state new_state, + struct can_frame *cf) +{ + struct xcan_priv *priv = netdev_priv(ndev); + u32 ecr = priv->read_reg(priv, XCAN_ECR_OFFSET); + u32 txerr = ecr & XCAN_ECR_TEC_MASK; + u32 rxerr = (ecr & XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT; + + priv->can.state = new_state; + + if (cf) { + cf->can_id |= CAN_ERR_CRTL; + cf->data[6] = txerr; + cf->data[7] = rxerr; + } + + switch (new_state) { + case CAN_STATE_ERROR_PASSIVE: + priv->can.can_stats.error_passive++; + if (cf) + cf->data[1] = (rxerr > 127) ? + CAN_ERR_CRTL_RX_PASSIVE : + CAN_ERR_CRTL_TX_PASSIVE; + break; + case CAN_STATE_ERROR_WARNING: + priv->can.can_stats.error_warning++; + if (cf) + cf->data[1] |= (txerr > rxerr) ? + CAN_ERR_CRTL_TX_WARNING : + CAN_ERR_CRTL_RX_WARNING; + break; + case CAN_STATE_ERROR_ACTIVE: + if (cf) + cf->data[1] |= CAN_ERR_CRTL_ACTIVE; + break; + default: + /* non-ERROR states are handled elsewhere */ + WARN_ON(1); + break; + } +} + +/** + * xcan_update_error_state_after_rxtx - Update CAN error state after RX/TX + * @ndev: Pointer to net_device structure + * + * If the device is in a ERROR-WARNING or ERROR-PASSIVE state, check if + * the performed RX/TX has caused it to drop to a lesser state and set + * the interface state accordingly. + */ +static void xcan_update_error_state_after_rxtx(struct net_device *ndev) +{ + struct xcan_priv *priv = netdev_priv(ndev); + enum can_state old_state = priv->can.state; + enum can_state new_state; + + /* changing error state due to successful frame RX/TX can only + * occur from these states + */ + if (old_state != CAN_STATE_ERROR_WARNING && + old_state != CAN_STATE_ERROR_PASSIVE) + return; + + new_state = xcan_current_error_state(ndev); + + if (new_state != old_state) { + struct sk_buff *skb; + struct can_frame *cf; + + skb = alloc_can_err_skb(ndev, &cf); + + xcan_set_error_state(ndev, new_state, skb ? cf : NULL); + + if (skb) { + struct net_device_stats *stats = &ndev->stats; + + stats->rx_packets++; + stats->rx_bytes += cf->can_dlc; + netif_rx(skb); + } + } +} + +/** * xcan_err_interrupt - error frame Isr * @ndev: net_device pointer * @isr: interrupt status register value @@ -543,16 +684,12 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr) struct net_device_stats *stats = &ndev->stats; struct can_frame *cf; struct sk_buff *skb; - u32 err_status, status, txerr = 0, rxerr = 0; + u32 err_status; skb = alloc_can_err_skb(ndev, &cf); err_status = priv->read_reg(priv, XCAN_ESR_OFFSET); priv->write_reg(priv, XCAN_ESR_OFFSET, err_status); - txerr = priv->read_reg(priv, XCAN_ECR_OFFSET) & XCAN_ECR_TEC_MASK; - rxerr = ((priv->read_reg(priv, XCAN_ECR_OFFSET) & - XCAN_ECR_REC_MASK) >> XCAN_ESR_REC_SHIFT); - status = priv->read_reg(priv, XCAN_SR_OFFSET); if (isr & XCAN_IXR_BSOFF_MASK) { priv->can.state = CAN_STATE_BUS_OFF; @@ -562,28 +699,10 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr) can_bus_off(ndev); if (skb) cf->can_id |= CAN_ERR_BUSOFF; - } else if ((status & XCAN_SR_ESTAT_MASK) == XCAN_SR_ESTAT_MASK) { - priv->can.state = CAN_STATE_ERROR_PASSIVE; - priv->can.can_stats.error_passive++; - if (skb) { - cf->can_id |= CAN_ERR_CRTL; - cf->data[1] = (rxerr > 127) ? - CAN_ERR_CRTL_RX_PASSIVE : - CAN_ERR_CRTL_TX_PASSIVE; - cf->data[6] = txerr; - cf->data[7] = rxerr; - } - } else if (status & XCAN_SR_ERRWRN_MASK) { - priv->can.state = CAN_STATE_ERROR_WARNING; - priv->can.can_stats.error_warning++; - if (skb) { - cf->can_id |= CAN_ERR_CRTL; - cf->data[1] |= (txerr > rxerr) ? - CAN_ERR_CRTL_TX_WARNING : - CAN_ERR_CRTL_RX_WARNING; - cf->data[6] = txerr; - cf->data[7] = rxerr; - } + } else { + enum can_state new_state = xcan_current_error_state(ndev); + + xcan_set_error_state(ndev, new_state, skb ? cf : NULL); } /* Check for Arbitration lost interrupt */ @@ -599,7 +718,6 @@ static void xcan_err_interrupt(struct net_device *ndev, u32 isr) if (isr & XCAN_IXR_RXOFLW_MASK) { stats->rx_over_errors++; stats->rx_errors++; - priv->write_reg(priv, XCAN_SRR_OFFSET, XCAN_SRR_RESET_MASK); if (skb) { cf->can_id |= CAN_ERR_CRTL; cf->data[1] |= CAN_ERR_CRTL_RX_OVERFLOW; @@ -708,26 +826,20 @@ static int xcan_rx_poll(struct napi_struct *napi, int quota) isr = priv->read_reg(priv, XCAN_ISR_OFFSET); while ((isr & XCAN_IXR_RXNEMP_MASK) && (work_done < quota)) { - if (isr & XCAN_IXR_RXOK_MASK) { - priv->write_reg(priv, XCAN_ICR_OFFSET, - XCAN_IXR_RXOK_MASK); - work_done += xcan_rx(ndev); - } else { - priv->write_reg(priv, XCAN_ICR_OFFSET, - XCAN_IXR_RXNEMP_MASK); - break; - } + work_done += xcan_rx(ndev); priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_RXNEMP_MASK); isr = priv->read_reg(priv, XCAN_ISR_OFFSET); } - if (work_done) + if (work_done) { can_led_event(ndev, CAN_LED_EVENT_RX); + xcan_update_error_state_after_rxtx(ndev); + } if (work_done < quota) { napi_complete(napi); ier = priv->read_reg(priv, XCAN_IER_OFFSET); - ier |= (XCAN_IXR_RXOK_MASK | XCAN_IXR_RXNEMP_MASK); + ier |= XCAN_IXR_RXNEMP_MASK; priv->write_reg(priv, XCAN_IER_OFFSET, ier); } return work_done; @@ -742,18 +854,71 @@ static void xcan_tx_interrupt(struct net_device *ndev, u32 isr) { struct xcan_priv *priv = netdev_priv(ndev); struct net_device_stats *stats = &ndev->stats; + unsigned int frames_in_fifo; + int frames_sent = 1; /* TXOK => at least 1 frame was sent */ + unsigned long flags; + int retries = 0; + + /* Synchronize with xmit as we need to know the exact number + * of frames in the FIFO to stay in sync due to the TXFEMP + * handling. + * This also prevents a race between netif_wake_queue() and + * netif_stop_queue(). + */ + spin_lock_irqsave(&priv->tx_lock, flags); - while ((priv->tx_head - priv->tx_tail > 0) && - (isr & XCAN_IXR_TXOK_MASK)) { + frames_in_fifo = priv->tx_head - priv->tx_tail; + + if (WARN_ON_ONCE(frames_in_fifo == 0)) { + /* clear TXOK anyway to avoid getting back here */ priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); + spin_unlock_irqrestore(&priv->tx_lock, flags); + return; + } + + /* Check if 2 frames were sent (TXOK only means that at least 1 + * frame was sent). + */ + if (frames_in_fifo > 1) { + WARN_ON(frames_in_fifo > priv->tx_max); + + /* Synchronize TXOK and isr so that after the loop: + * (1) isr variable is up-to-date at least up to TXOK clear + * time. This avoids us clearing a TXOK of a second frame + * but not noticing that the FIFO is now empty and thus + * marking only a single frame as sent. + * (2) No TXOK is left. Having one could mean leaving a + * stray TXOK as we might process the associated frame + * via TXFEMP handling as we read TXFEMP *after* TXOK + * clear to satisfy (1). + */ + while ((isr & XCAN_IXR_TXOK_MASK) && !WARN_ON(++retries == 100)) { + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); + isr = priv->read_reg(priv, XCAN_ISR_OFFSET); + } + + if (isr & XCAN_IXR_TXFEMP_MASK) { + /* nothing in FIFO anymore */ + frames_sent = frames_in_fifo; + } + } else { + /* single frame in fifo, just clear TXOK */ + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); + } + + while (frames_sent--) { can_get_echo_skb(ndev, priv->tx_tail % priv->tx_max); priv->tx_tail++; stats->tx_packets++; - isr = priv->read_reg(priv, XCAN_ISR_OFFSET); } - can_led_event(ndev, CAN_LED_EVENT_TX); + netif_wake_queue(ndev); + + spin_unlock_irqrestore(&priv->tx_lock, flags); + + can_led_event(ndev, CAN_LED_EVENT_TX); + xcan_update_error_state_after_rxtx(ndev); } /** @@ -772,6 +937,7 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id) struct net_device *ndev = (struct net_device *)dev_id; struct xcan_priv *priv = netdev_priv(ndev); u32 isr, ier; + u32 isr_errors; /* Get the interrupt status from Xilinx CAN */ isr = priv->read_reg(priv, XCAN_ISR_OFFSET); @@ -790,18 +956,17 @@ static irqreturn_t xcan_interrupt(int irq, void *dev_id) xcan_tx_interrupt(ndev, isr); /* Check for the type of error interrupt and Processing it */ - if (isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK | - XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK)) { - priv->write_reg(priv, XCAN_ICR_OFFSET, (XCAN_IXR_ERROR_MASK | - XCAN_IXR_RXOFLW_MASK | XCAN_IXR_BSOFF_MASK | - XCAN_IXR_ARBLST_MASK)); + isr_errors = isr & (XCAN_IXR_ERROR_MASK | XCAN_IXR_RXOFLW_MASK | + XCAN_IXR_BSOFF_MASK | XCAN_IXR_ARBLST_MASK); + if (isr_errors) { + priv->write_reg(priv, XCAN_ICR_OFFSET, isr_errors); xcan_err_interrupt(ndev, isr); } /* Check for the type of receive interrupt and Processing it */ - if (isr & (XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK)) { + if (isr & XCAN_IXR_RXNEMP_MASK) { ier = priv->read_reg(priv, XCAN_IER_OFFSET); - ier &= ~(XCAN_IXR_RXNEMP_MASK | XCAN_IXR_RXOK_MASK); + ier &= ~XCAN_IXR_RXNEMP_MASK; priv->write_reg(priv, XCAN_IER_OFFSET, ier); napi_schedule(&priv->napi); } @@ -1030,6 +1195,18 @@ static int __maybe_unused xcan_resume(struct device *dev) static SIMPLE_DEV_PM_OPS(xcan_dev_pm_ops, xcan_suspend, xcan_resume); +static const struct xcan_devtype_data xcan_zynq_data = { + .caps = XCAN_CAP_WATERMARK, +}; + +/* Match table for OF platform binding */ +static const struct of_device_id xcan_of_match[] = { + { .compatible = "xlnx,zynq-can-1.0", .data = &xcan_zynq_data }, + { .compatible = "xlnx,axi-can-1.00.a", }, + { /* end of list */ }, +}; +MODULE_DEVICE_TABLE(of, xcan_of_match); + /** * xcan_probe - Platform registration call * @pdev: Handle to the platform device structure @@ -1044,8 +1221,10 @@ static int xcan_probe(struct platform_device *pdev) struct resource *res; /* IO mem resources */ struct net_device *ndev; struct xcan_priv *priv; + const struct of_device_id *of_id; + int caps = 0; void __iomem *addr; - int ret, rx_max, tx_max; + int ret, rx_max, tx_max, tx_fifo_depth; /* Get the virtual base address for the device */ res = platform_get_resource(pdev, IORESOURCE_MEM, 0); @@ -1055,7 +1234,8 @@ static int xcan_probe(struct platform_device *pdev) goto err; } - ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", &tx_max); + ret = of_property_read_u32(pdev->dev.of_node, "tx-fifo-depth", + &tx_fifo_depth); if (ret < 0) goto err; @@ -1063,6 +1243,30 @@ static int xcan_probe(struct platform_device *pdev) if (ret < 0) goto err; + of_id = of_match_device(xcan_of_match, &pdev->dev); + if (of_id) { + const struct xcan_devtype_data *devtype_data = of_id->data; + + if (devtype_data) + caps = devtype_data->caps; + } + + /* There is no way to directly figure out how many frames have been + * sent when the TXOK interrupt is processed. If watermark programming + * is supported, we can have 2 frames in the FIFO and use TXFEMP + * to determine if 1 or 2 frames have been sent. + * Theoretically we should be able to use TXFWMEMP to determine up + * to 3 frames, but it seems that after putting a second frame in the + * FIFO, with watermark at 2 frames, it can happen that TXFWMEMP (less + * than 2 frames in FIFO) is set anyway with no TXOK (a frame was + * sent), which is not a sensible state - possibly TXFWMEMP is not + * completely synchronized with the rest of the bits? + */ + if (caps & XCAN_CAP_WATERMARK) + tx_max = min(tx_fifo_depth, 2); + else + tx_max = 1; + /* Create a CAN device instance */ ndev = alloc_candev(sizeof(struct xcan_priv), tx_max); if (!ndev) @@ -1077,6 +1281,7 @@ static int xcan_probe(struct platform_device *pdev) CAN_CTRLMODE_BERR_REPORTING; priv->reg_base = addr; priv->tx_max = tx_max; + spin_lock_init(&priv->tx_lock); /* Get IRQ for the device */ ndev->irq = platform_get_irq(pdev, 0); @@ -1144,9 +1349,9 @@ static int xcan_probe(struct platform_device *pdev) devm_can_led_init(ndev); clk_disable_unprepare(priv->bus_clk); clk_disable_unprepare(priv->can_clk); - netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth:%d\n", + netdev_dbg(ndev, "reg_base=0x%p irq=%d clock=%d, tx fifo depth: actual %d, using %d\n", priv->reg_base, ndev->irq, priv->can.clock.freq, - priv->tx_max); + tx_fifo_depth, priv->tx_max); return 0; @@ -1182,14 +1387,6 @@ static int xcan_remove(struct platform_device *pdev) return 0; } -/* Match table for OF platform binding */ -static const struct of_device_id xcan_of_match[] = { - { .compatible = "xlnx,zynq-can-1.0", }, - { .compatible = "xlnx,axi-can-1.00.a", }, - { /* end of list */ }, -}; -MODULE_DEVICE_TABLE(of, xcan_of_match); - static struct platform_driver xcan_driver = { .probe = xcan_probe, .remove = xcan_remove, diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c index 446058081866..7a0ab4c44ee4 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c @@ -872,14 +872,14 @@ static void xgbe_phy_adjust_link(struct xgbe_prv_data *pdata) if (pdata->tx_pause != pdata->phy.tx_pause) { new_state = 1; - pdata->hw_if.config_tx_flow_control(pdata); pdata->tx_pause = pdata->phy.tx_pause; + pdata->hw_if.config_tx_flow_control(pdata); } if (pdata->rx_pause != pdata->phy.rx_pause) { new_state = 1; - pdata->hw_if.config_rx_flow_control(pdata); pdata->rx_pause = pdata->phy.rx_pause; + pdata->hw_if.config_rx_flow_control(pdata); } /* Speed support */ diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c index 8b1929e9f698..ec5834087e4b 100644 --- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c +++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c @@ -1063,7 +1063,8 @@ static int bcm_enet_open(struct net_device *dev) val = enet_readl(priv, ENET_CTL_REG); val |= ENET_CTL_ENABLE_MASK; enet_writel(priv, val, ENET_CTL_REG); - enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG); + if (priv->dma_has_sram) + enet_dma_writel(priv, ENETDMA_CFG_EN_MASK, ENETDMA_CFG_REG); enet_dmac_writel(priv, priv->dma_chan_en_mask, ENETDMAC_CHANCFG, priv->rx_chan); @@ -1787,7 +1788,9 @@ static int bcm_enet_probe(struct platform_device *pdev) ret = PTR_ERR(priv->mac_clk); goto out; } - clk_prepare_enable(priv->mac_clk); + ret = clk_prepare_enable(priv->mac_clk); + if (ret) + goto out_put_clk_mac; /* initialize default and fetch platform data */ priv->rx_ring_size = BCMENET_DEF_RX_DESC; @@ -1819,9 +1822,11 @@ static int bcm_enet_probe(struct platform_device *pdev) if (IS_ERR(priv->phy_clk)) { ret = PTR_ERR(priv->phy_clk); priv->phy_clk = NULL; - goto out_put_clk_mac; + goto out_disable_clk_mac; } - clk_prepare_enable(priv->phy_clk); + ret = clk_prepare_enable(priv->phy_clk); + if (ret) + goto out_put_clk_phy; } /* do minimal hardware init to be able to probe mii bus */ @@ -1921,13 +1926,16 @@ out_free_mdio: out_uninit_hw: /* turn off mdc clock */ enet_writel(priv, 0, ENET_MIISC_REG); - if (priv->phy_clk) { + if (priv->phy_clk) clk_disable_unprepare(priv->phy_clk); + +out_put_clk_phy: + if (priv->phy_clk) clk_put(priv->phy_clk); - } -out_put_clk_mac: +out_disable_clk_mac: clk_disable_unprepare(priv->mac_clk); +out_put_clk_mac: clk_put(priv->mac_clk); out: free_netdev(dev); @@ -2772,7 +2780,9 @@ static int bcm_enetsw_probe(struct platform_device *pdev) ret = PTR_ERR(priv->mac_clk); goto out_unmap; } - clk_enable(priv->mac_clk); + ret = clk_prepare_enable(priv->mac_clk); + if (ret) + goto out_put_clk; priv->rx_chan = 0; priv->tx_chan = 1; @@ -2793,7 +2803,7 @@ static int bcm_enetsw_probe(struct platform_device *pdev) ret = register_netdev(dev); if (ret) - goto out_put_clk; + goto out_disable_clk; netif_carrier_off(dev); platform_set_drvdata(pdev, dev); @@ -2802,6 +2812,9 @@ static int bcm_enetsw_probe(struct platform_device *pdev) return 0; +out_disable_clk: + clk_disable_unprepare(priv->mac_clk); + out_put_clk: clk_put(priv->mac_clk); @@ -2833,6 +2846,9 @@ static int bcm_enetsw_remove(struct platform_device *pdev) res = platform_get_resource(pdev, IORESOURCE_MEM, 0); release_mem_region(res->start, resource_size(res)); + clk_disable_unprepare(priv->mac_clk); + clk_put(priv->mac_clk); + free_netdev(dev); return 0; } diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 1325825d5225..ce3a56bea6e6 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -9278,6 +9278,15 @@ static int tg3_chip_reset(struct tg3 *tp) tg3_restore_clk(tp); + /* Increase the core clock speed to fix tx timeout issue for 5762 + * with 100Mbps link speed. + */ + if (tg3_asic_rev(tp) == ASIC_REV_5762) { + val = tr32(TG3_CPMU_CLCK_ORIDE_ENABLE); + tw32(TG3_CPMU_CLCK_ORIDE_ENABLE, val | + TG3_CPMU_MAC_ORIDE_ENABLE); + } + /* Reprobe ASF enable state. */ tg3_flag_clear(tp, ENABLE_ASF); tp->phy_flags &= ~(TG3_PHYFLG_1G_ON_VAUX_OK | diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c index 8f7aa53a4c4b..7ae8374bff13 100644 --- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c +++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c @@ -50,6 +50,7 @@ #include <linux/stringify.h> #include <linux/sched.h> #include <linux/slab.h> +#include <linux/nospec.h> #include <asm/uaccess.h> #include "common.h" @@ -2256,6 +2257,7 @@ static int cxgb_extension_ioctl(struct net_device *dev, void __user *useraddr) if (t.qset_idx >= nqsets) return -EINVAL; + t.qset_idx = array_index_nospec(t.qset_idx, nqsets); q = &adapter->params.sge.qset[q1 + t.qset_idx]; t.rspq_size = q->rspq_size; diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c index e3080fbd9d00..7911dc3da98e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c +++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c @@ -2891,7 +2891,7 @@ int mlx4_RST2INIT_QP_wrapper(struct mlx4_dev *dev, int slave, u32 srqn = qp_get_srqn(qpc) & 0xffffff; int use_srq = (qp_get_srqn(qpc) >> 24) & 1; struct res_srq *srq; - int local_qpn = be32_to_cpu(qpc->local_qpn) & 0xffffff; + int local_qpn = vhcr->in_modifier & 0xffffff; err = adjust_qp_sched_queue(dev, slave, qpc, inbox); if (err) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 16bd585365a8..9ac14df0ca3b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -643,6 +643,7 @@ static void cmd_work_handler(struct work_struct *work) struct semaphore *sem; unsigned long flags; int alloc_ret; + int cmd_mode; sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem; down(sem); @@ -688,6 +689,7 @@ static void cmd_work_handler(struct work_struct *work) set_signature(ent, !cmd->checksum_disabled); dump_command(dev, ent, 1); ent->ts1 = ktime_get_ns(); + cmd_mode = cmd->mode; /* ring doorbell after the descriptor is valid */ mlx5_core_dbg(dev, "writing 0x%x to command doorbell\n", 1 << ent->idx); @@ -695,7 +697,7 @@ static void cmd_work_handler(struct work_struct *work) iowrite32be(1 << ent->idx, &dev->iseg->cmd_dbell); mmiowb(); /* if not in polling don't use ent after this point */ - if (cmd->mode == CMD_MODE_POLLING) { + if (cmd_mode == CMD_MODE_POLLING) { poll_timeout(ent); /* make sure we read the descriptor after ownership is SW */ rmb(); @@ -1126,7 +1128,7 @@ static ssize_t outlen_write(struct file *filp, const char __user *buf, { struct mlx5_core_dev *dev = filp->private_data; struct mlx5_cmd_debug *dbg = &dev->cmd.dbg; - char outlen_str[8]; + char outlen_str[8] = {0}; int outlen; void *ptr; int err; @@ -1141,8 +1143,6 @@ static ssize_t outlen_write(struct file *filp, const char __user *buf, if (copy_from_user(outlen_str, buf, count)) return -EFAULT; - outlen_str[7] = 0; - err = sscanf(outlen_str, "%d", &outlen); if (err < 0) return err; diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c index 174f7341c5c3..688b6da5a9bb 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_main.c +++ b/drivers/net/ethernet/qlogic/qed/qed_main.c @@ -22,6 +22,7 @@ #include <linux/etherdevice.h> #include <linux/vmalloc.h> #include <linux/qed/qed_if.h> +#include <linux/crash_dump.h> #include "qed.h" #include "qed_sp.h" @@ -634,6 +635,14 @@ static int qed_slowpath_setup_int(struct qed_dev *cdev, /* We want a minimum of one slowpath and one fastpath vector per hwfn */ cdev->int_params.in.min_msix_cnt = cdev->num_hwfns * 2; + if (is_kdump_kernel()) { + DP_INFO(cdev, + "Kdump kernel: Limit the max number of requested MSI-X vectors to %hd\n", + cdev->int_params.in.min_msix_cnt); + cdev->int_params.in.num_vectors = + cdev->int_params.in.min_msix_cnt; + } + rc = qed_set_int_mode(cdev, false); if (rc) { DP_ERR(cdev, "qed_slowpath_setup_int ERR\n"); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 5adaf537513b..7bba30f24135 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -54,7 +54,7 @@ #include <linux/reset.h> #include <linux/of_mdio.h> -#define STMMAC_ALIGN(x) L1_CACHE_ALIGN(x) +#define STMMAC_ALIGN(x) __ALIGN_KERNEL(x, SMP_CACHE_BYTES) /* Module parameters */ #define TX_TIMEO 5000 diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c index e23a642357e7..eb4d8df49399 100644 --- a/drivers/net/ethernet/sun/sungem.c +++ b/drivers/net/ethernet/sun/sungem.c @@ -60,8 +60,7 @@ #include <linux/sungem_phy.h> #include "sungem.h" -/* Stripping FCS is causing problems, disabled for now */ -#undef STRIP_FCS +#define STRIP_FCS #define DEFAULT_MSG (NETIF_MSG_DRV | \ NETIF_MSG_PROBE | \ @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp) writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW); writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN) writel(((5 & RXDMA_BLANK_IPKTS) | @@ -760,7 +759,6 @@ static int gem_rx(struct gem *gp, int work_to_do) struct net_device *dev = gp->dev; int entry, drops, work_done = 0; u32 done; - __sum16 csum; if (netif_msg_rx_status(gp)) printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n", @@ -855,9 +853,13 @@ static int gem_rx(struct gem *gp, int work_to_do) skb = copy_skb; } - csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); - skb->csum = csum_unfold(csum); - skb->ip_summed = CHECKSUM_COMPLETE; + if (likely(dev->features & NETIF_F_RXCSUM)) { + __sum16 csum; + + csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff); + skb->csum = csum_unfold(csum); + skb->ip_summed = CHECKSUM_COMPLETE; + } skb->protocol = eth_type_trans(skb, gp->dev); napi_gro_receive(&gp->napi, skb); @@ -1755,7 +1757,7 @@ static void gem_init_dma(struct gem *gp) writel(0, gp->regs + TXDMA_KICK); val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) | - ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128); + (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128); writel(val, gp->regs + RXDMA_CFG); writel(desc_dma >> 32, gp->regs + RXDMA_DBHI); @@ -2973,8 +2975,8 @@ static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) pci_set_drvdata(pdev, dev); /* We can do scatter/gather and HW checksum */ - dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM; - dev->features |= dev->hw_features | NETIF_F_RXCSUM; + dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM; + dev->features = dev->hw_features; if (pci_using_dac) dev->features |= NETIF_F_HIGHDMA; diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 8179727d3423..1f2f25a71d18 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -1265,11 +1265,8 @@ static int gen10g_resume(struct phy_device *phydev) static int __set_phy_supported(struct phy_device *phydev, u32 max_speed) { - /* The default values for phydev->supported are provided by the PHY - * driver "features" member, we want to reset to sane defaults first - * before supporting higher speeds. - */ - phydev->supported &= PHY_DEFAULT_FEATURES; + phydev->supported &= ~(PHY_1000BT_FEATURES | PHY_100BT_FEATURES | + PHY_10BT_FEATURES); switch (max_speed) { default: diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c index a6d429950cb0..acec4b565511 100644 --- a/drivers/net/usb/lan78xx.c +++ b/drivers/net/usb/lan78xx.c @@ -1361,6 +1361,8 @@ static void lan78xx_init_mac_address(struct lan78xx_net *dev) netif_dbg(dev, ifup, dev->net, "MAC address set to random addr"); } + + tasklet_schedule(&dev->bh); } ret = lan78xx_write_reg(dev, MAF_LO(0), addr_lo); diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index 2991d7155540..2bb336cb13ee 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3139,7 +3139,8 @@ static int rtl8152_close(struct net_device *netdev) #ifdef CONFIG_PM_SLEEP unregister_pm_notifier(&tp->pm_notifier); #endif - napi_disable(&tp->napi); + if (!test_bit(RTL8152_UNPLUG, &tp->flags)) + napi_disable(&tp->napi); clear_bit(WORK_ENABLE, &tp->flags); usb_kill_urb(tp->intr_urb); cancel_delayed_work_sync(&tp->schedule); diff --git a/drivers/net/wireless/ath/regd.h b/drivers/net/wireless/ath/regd.h index 37f53bd8fcb1..184b6810cde9 100644 --- a/drivers/net/wireless/ath/regd.h +++ b/drivers/net/wireless/ath/regd.h @@ -68,12 +68,14 @@ enum CountryCode { CTRY_AUSTRALIA = 36, CTRY_AUSTRIA = 40, CTRY_AZERBAIJAN = 31, + CTRY_BAHAMAS = 44, CTRY_BAHRAIN = 48, CTRY_BANGLADESH = 50, CTRY_BARBADOS = 52, CTRY_BELARUS = 112, CTRY_BELGIUM = 56, CTRY_BELIZE = 84, + CTRY_BERMUDA = 60, CTRY_BOLIVIA = 68, CTRY_BOSNIA_HERZ = 70, CTRY_BRAZIL = 76, @@ -159,6 +161,7 @@ enum CountryCode { CTRY_ROMANIA = 642, CTRY_RUSSIA = 643, CTRY_SAUDI_ARABIA = 682, + CTRY_SERBIA = 688, CTRY_SERBIA_MONTENEGRO = 891, CTRY_SINGAPORE = 702, CTRY_SLOVAKIA = 703, @@ -170,11 +173,13 @@ enum CountryCode { CTRY_SWITZERLAND = 756, CTRY_SYRIA = 760, CTRY_TAIWAN = 158, + CTRY_TANZANIA = 834, CTRY_THAILAND = 764, CTRY_TRINIDAD_Y_TOBAGO = 780, CTRY_TUNISIA = 788, CTRY_TURKEY = 792, CTRY_UAE = 784, + CTRY_UGANDA = 800, CTRY_UKRAINE = 804, CTRY_UNITED_KINGDOM = 826, CTRY_UNITED_STATES = 840, diff --git a/drivers/net/wireless/ath/regd_common.h b/drivers/net/wireless/ath/regd_common.h index bdd2b4d61f2f..15bbd1e0d912 100644 --- a/drivers/net/wireless/ath/regd_common.h +++ b/drivers/net/wireless/ath/regd_common.h @@ -35,6 +35,7 @@ enum EnumRd { FRANCE_RES = 0x31, FCC3_FCCA = 0x3A, FCC3_WORLD = 0x3B, + FCC3_ETSIC = 0x3F, ETSI1_WORLD = 0x37, ETSI3_ETSIA = 0x32, @@ -44,6 +45,7 @@ enum EnumRd { ETSI4_ETSIC = 0x38, ETSI5_WORLD = 0x39, ETSI6_WORLD = 0x34, + ETSI8_WORLD = 0x3D, ETSI_RESERVED = 0x33, MKK1_MKKA = 0x40, @@ -59,6 +61,7 @@ enum EnumRd { MKK1_MKKA1 = 0x4A, MKK1_MKKA2 = 0x4B, MKK1_MKKC = 0x4C, + APL2_FCCA = 0x4D, APL3_FCCA = 0x50, APL1_WORLD = 0x52, @@ -67,6 +70,7 @@ enum EnumRd { APL1_ETSIC = 0x55, APL2_ETSIC = 0x56, APL5_WORLD = 0x58, + APL13_WORLD = 0x5A, APL6_WORLD = 0x5B, APL7_FCCA = 0x5C, APL8_WORLD = 0x5D, @@ -168,6 +172,7 @@ static struct reg_dmn_pair_mapping regDomainPairs[] = { {FCC2_ETSIC, CTL_FCC, CTL_ETSI}, {FCC3_FCCA, CTL_FCC, CTL_FCC}, {FCC3_WORLD, CTL_FCC, CTL_ETSI}, + {FCC3_ETSIC, CTL_FCC, CTL_ETSI}, {FCC4_FCCA, CTL_FCC, CTL_FCC}, {FCC5_FCCA, CTL_FCC, CTL_FCC}, {FCC6_FCCA, CTL_FCC, CTL_FCC}, @@ -179,6 +184,7 @@ static struct reg_dmn_pair_mapping regDomainPairs[] = { {ETSI4_WORLD, CTL_ETSI, CTL_ETSI}, {ETSI5_WORLD, CTL_ETSI, CTL_ETSI}, {ETSI6_WORLD, CTL_ETSI, CTL_ETSI}, + {ETSI8_WORLD, CTL_ETSI, CTL_ETSI}, /* XXX: For ETSI3_ETSIA, Was NO_CTL meant for the 2 GHz band ? */ {ETSI3_ETSIA, CTL_ETSI, CTL_ETSI}, @@ -188,9 +194,11 @@ static struct reg_dmn_pair_mapping regDomainPairs[] = { {FCC1_FCCA, CTL_FCC, CTL_FCC}, {APL1_WORLD, CTL_FCC, CTL_ETSI}, {APL2_WORLD, CTL_FCC, CTL_ETSI}, + {APL2_FCCA, CTL_FCC, CTL_FCC}, {APL3_WORLD, CTL_FCC, CTL_ETSI}, {APL4_WORLD, CTL_FCC, CTL_ETSI}, {APL5_WORLD, CTL_FCC, CTL_ETSI}, + {APL13_WORLD, CTL_ETSI, CTL_ETSI}, {APL6_WORLD, CTL_ETSI, CTL_ETSI}, {APL8_WORLD, CTL_ETSI, CTL_ETSI}, {APL9_WORLD, CTL_ETSI, CTL_ETSI}, @@ -298,6 +306,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_AUSTRALIA2, FCC6_WORLD, "AU"}, {CTRY_AUSTRIA, ETSI1_WORLD, "AT"}, {CTRY_AZERBAIJAN, ETSI4_WORLD, "AZ"}, + {CTRY_BAHAMAS, FCC3_WORLD, "BS"}, {CTRY_BAHRAIN, APL6_WORLD, "BH"}, {CTRY_BANGLADESH, NULL1_WORLD, "BD"}, {CTRY_BARBADOS, FCC2_WORLD, "BB"}, @@ -305,6 +314,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_BELGIUM, ETSI1_WORLD, "BE"}, {CTRY_BELGIUM2, ETSI4_WORLD, "BL"}, {CTRY_BELIZE, APL1_ETSIC, "BZ"}, + {CTRY_BERMUDA, FCC3_FCCA, "BM"}, {CTRY_BOLIVIA, APL1_ETSIC, "BO"}, {CTRY_BOSNIA_HERZ, ETSI1_WORLD, "BA"}, {CTRY_BRAZIL, FCC3_WORLD, "BR"}, @@ -444,6 +454,7 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_ROMANIA, NULL1_WORLD, "RO"}, {CTRY_RUSSIA, NULL1_WORLD, "RU"}, {CTRY_SAUDI_ARABIA, NULL1_WORLD, "SA"}, + {CTRY_SERBIA, ETSI1_WORLD, "RS"}, {CTRY_SERBIA_MONTENEGRO, ETSI1_WORLD, "CS"}, {CTRY_SINGAPORE, APL6_WORLD, "SG"}, {CTRY_SLOVAKIA, ETSI1_WORLD, "SK"}, @@ -455,10 +466,12 @@ static struct country_code_to_enum_rd allCountries[] = { {CTRY_SWITZERLAND, ETSI1_WORLD, "CH"}, {CTRY_SYRIA, NULL1_WORLD, "SY"}, {CTRY_TAIWAN, APL3_FCCA, "TW"}, + {CTRY_TANZANIA, APL1_WORLD, "TZ"}, {CTRY_THAILAND, FCC3_WORLD, "TH"}, {CTRY_TRINIDAD_Y_TOBAGO, FCC3_WORLD, "TT"}, {CTRY_TUNISIA, ETSI3_WORLD, "TN"}, {CTRY_TURKEY, ETSI3_WORLD, "TR"}, + {CTRY_UGANDA, FCC3_WORLD, "UG"}, {CTRY_UKRAINE, NULL1_WORLD, "UA"}, {CTRY_UAE, NULL1_WORLD, "AE"}, {CTRY_UNITED_KINGDOM, ETSI1_WORLD, "GB"}, diff --git a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c index 59cef6c69fe8..91da67657f81 100644 --- a/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c +++ b/drivers/net/wireless/brcm80211/brcmfmac/bcmsdh.c @@ -1109,6 +1109,7 @@ static const struct sdio_device_id brcmf_sdmmc_ids[] = { BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43340), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43341), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43362), + BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43364), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_4335_4339), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_43430), BRCMF_SDIO_DEVICE(SDIO_DEVICE_ID_BROADCOM_4345), diff --git a/drivers/net/wireless/iwlwifi/pcie/rx.c b/drivers/net/wireless/iwlwifi/pcie/rx.c index e06591f625c4..d6f9858ff2de 100644 --- a/drivers/net/wireless/iwlwifi/pcie/rx.c +++ b/drivers/net/wireless/iwlwifi/pcie/rx.c @@ -713,6 +713,8 @@ int iwl_pcie_rx_init(struct iwl_trans *trans) WQ_HIGHPRI | WQ_UNBOUND, 1); INIT_WORK(&rba->rx_alloc, iwl_pcie_rx_allocator_work); + cancel_work_sync(&rba->rx_alloc); + spin_lock(&rba->lock); atomic_set(&rba->req_pending, 0); atomic_set(&rba->req_ready, 0); diff --git a/drivers/net/wireless/mwifiex/usb.c b/drivers/net/wireless/mwifiex/usb.c index e43aff932360..1a1b1de87583 100644 --- a/drivers/net/wireless/mwifiex/usb.c +++ b/drivers/net/wireless/mwifiex/usb.c @@ -624,6 +624,9 @@ static void mwifiex_usb_disconnect(struct usb_interface *intf) MWIFIEX_FUNC_SHUTDOWN); } + if (adapter->workqueue) + flush_workqueue(adapter->workqueue); + mwifiex_usb_free(card); mwifiex_dbg(adapter, FATAL, diff --git a/drivers/net/wireless/mwifiex/util.c b/drivers/net/wireless/mwifiex/util.c index 0cec8a64473e..eb5ffa5b1c6c 100644 --- a/drivers/net/wireless/mwifiex/util.c +++ b/drivers/net/wireless/mwifiex/util.c @@ -702,12 +702,14 @@ void mwifiex_hist_data_set(struct mwifiex_private *priv, u8 rx_rate, s8 snr, s8 nflr) { struct mwifiex_histogram_data *phist_data = priv->hist_data; + s8 nf = -nflr; + s8 rssi = snr - nflr; atomic_inc(&phist_data->num_samples); atomic_inc(&phist_data->rx_rate[rx_rate]); - atomic_inc(&phist_data->snr[snr]); - atomic_inc(&phist_data->noise_flr[128 + nflr]); - atomic_inc(&phist_data->sig_str[nflr - snr]); + atomic_inc(&phist_data->snr[snr + 128]); + atomic_inc(&phist_data->noise_flr[nf + 128]); + atomic_inc(&phist_data->sig_str[rssi + 128]); } /* function to reset histogram data during init/reset */ diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c b/drivers/net/wireless/realtek/rtlwifi/core.c index 8b537a5a4b01..8006f0972ad1 100644 --- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -135,7 +135,6 @@ found_alt: firmware->size); rtlpriv->rtlhal.wowlan_fwsize = firmware->size; } - rtlpriv->rtlhal.fwsize = firmware->size; release_firmware(firmware); } diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c b/drivers/net/wireless/rsi/rsi_91x_sdio.c index 8428858204a6..fc895b466ebb 100644 --- a/drivers/net/wireless/rsi/rsi_91x_sdio.c +++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c @@ -155,7 +155,6 @@ static void rsi_reset_card(struct sdio_func *pfunction) int err; struct mmc_card *card = pfunction->card; struct mmc_host *host = card->host; - s32 bit = (fls(host->ocr_avail) - 1); u8 cmd52_resp; u32 clock, resp, i; u16 rca; @@ -175,7 +174,6 @@ static void rsi_reset_card(struct sdio_func *pfunction) msleep(20); /* Initialize the SDIO card */ - host->ios.vdd = bit; host->ios.chip_select = MMC_CS_DONTCARE; host->ios.bus_mode = MMC_BUSMODE_OPENDRAIN; host->ios.power_mode = MMC_POWER_UP; diff --git a/drivers/net/wireless/ti/wlcore/sdio.c b/drivers/net/wireless/ti/wlcore/sdio.c index c172da56b550..e4a8280cea83 100644 --- a/drivers/net/wireless/ti/wlcore/sdio.c +++ b/drivers/net/wireless/ti/wlcore/sdio.c @@ -388,6 +388,11 @@ static int wl1271_suspend(struct device *dev) mmc_pm_flag_t sdio_flags; int ret = 0; + if (!wl) { + dev_err(dev, "no wilink module was probed\n"); + goto out; + } + dev_dbg(dev, "wl1271 suspend. wow_enabled: %d\n", wl->wow_enabled); diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c index a0de2453fa09..bec9f099573b 100644 --- a/drivers/net/xen-netfront.c +++ b/drivers/net/xen-netfront.c @@ -86,6 +86,7 @@ struct netfront_cb { /* IRQ name is queue name with "-tx" or "-rx" appended */ #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3) +static DECLARE_WAIT_QUEUE_HEAD(module_load_q); static DECLARE_WAIT_QUEUE_HEAD(module_unload_q); struct netfront_stats { @@ -238,7 +239,7 @@ static void rx_refill_timeout(unsigned long data) static int netfront_tx_slot_available(struct netfront_queue *queue) { return (queue->tx.req_prod_pvt - queue->tx.rsp_cons) < - (NET_TX_RING_SIZE - MAX_SKB_FRAGS - 2); + (NET_TX_RING_SIZE - XEN_NETIF_NR_SLOTS_MIN - 1); } static void xennet_maybe_wake_tx(struct netfront_queue *queue) @@ -775,7 +776,7 @@ static int xennet_get_responses(struct netfront_queue *queue, RING_IDX cons = queue->rx.rsp_cons; struct sk_buff *skb = xennet_get_rx_skb(queue, cons); grant_ref_t ref = xennet_get_rx_ref(queue, cons); - int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD); + int max = XEN_NETIF_NR_SLOTS_MIN + (rx->status <= RX_COPY_THRESHOLD); int slots = 1; int err = 0; unsigned long ret; @@ -1335,6 +1336,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev) netif_carrier_off(netdev); xenbus_switch_state(dev, XenbusStateInitialising); + wait_event(module_load_q, + xenbus_read_driver_state(dev->otherend) != + XenbusStateClosed && + xenbus_read_driver_state(dev->otherend) != + XenbusStateUnknown); return netdev; exit: diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index ec91cd17bf34..5fb4ed6ea322 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -180,13 +180,16 @@ static ssize_t enable_store(struct device *dev, struct device_attribute *attr, if (!capable(CAP_SYS_ADMIN)) return -EPERM; - if (!val) { - if (pci_is_enabled(pdev)) - pci_disable_device(pdev); - else - result = -EIO; - } else + device_lock(dev); + if (dev->driver) + result = -EBUSY; + else if (val) result = pci_enable_device(pdev); + else if (pci_is_enabled(pdev)) + pci_disable_device(pdev); + else + result = -EIO; + device_unlock(dev); return result < 0 ? result : count; } diff --git a/drivers/pinctrl/pinctrl-at91-pio4.c b/drivers/pinctrl/pinctrl-at91-pio4.c index 271cca63e9bd..9aa82a4e9e25 100644 --- a/drivers/pinctrl/pinctrl-at91-pio4.c +++ b/drivers/pinctrl/pinctrl-at91-pio4.c @@ -568,8 +568,10 @@ static int atmel_pctl_dt_node_to_map(struct pinctrl_dev *pctldev, for_each_child_of_node(np_config, np) { ret = atmel_pctl_dt_subnode_to_map(pctldev, np, map, &reserved_maps, num_maps); - if (ret < 0) + if (ret < 0) { + of_node_put(np); break; + } } } diff --git a/drivers/platform/msm/ipa/ipa_v2/ipa_debugfs.c b/drivers/platform/msm/ipa/ipa_v2/ipa_debugfs.c index e17526e46323..9c3c5b8be16e 100644 --- a/drivers/platform/msm/ipa/ipa_v2/ipa_debugfs.c +++ b/drivers/platform/msm/ipa/ipa_v2/ipa_debugfs.c @@ -89,6 +89,7 @@ const char *ipa_event_name[] = { __stringify(DEL_L2TP_VLAN_MAPPING), __stringify(IPA_PER_CLIENT_STATS_CONNECT_EVENT), __stringify(IPA_PER_CLIENT_STATS_DISCONNECT_EVENT), + __stringify(WLAN_FWR_SSR_BEFORE_SHUTDOWN), }; const char *ipa_hdr_l2_type_name[] = { diff --git a/drivers/platform/msm/ipa/ipa_v2/ipa_flt.c b/drivers/platform/msm/ipa/ipa_v2/ipa_flt.c index 6392c379b026..c29cbdf95057 100644 --- a/drivers/platform/msm/ipa/ipa_v2/ipa_flt.c +++ b/drivers/platform/msm/ipa/ipa_v2/ipa_flt.c @@ -1495,8 +1495,16 @@ int ipa2_reset_flt(enum ipa_ip_type ip, bool user_only) } } } - mutex_unlock(&ipa_ctx->lock); + /* commit the change to IPA-HW */ + if (ipa_ctx->ctrl->ipa_commit_flt(IPA_IP_v4) || + ipa_ctx->ctrl->ipa_commit_flt(IPA_IP_v6)) { + IPAERR_RL("fail to commit flt-rule\n"); + WARN_ON_RATELIMIT_IPA(1); + mutex_unlock(&ipa_ctx->lock); + return -EPERM; + } + mutex_unlock(&ipa_ctx->lock); return 0; } diff --git a/drivers/platform/msm/ipa/ipa_v2/ipa_hdr.c b/drivers/platform/msm/ipa/ipa_v2/ipa_hdr.c index e0868ad83bf4..83b05473e473 100644 --- a/drivers/platform/msm/ipa/ipa_v2/ipa_hdr.c +++ b/drivers/platform/msm/ipa/ipa_v2/ipa_hdr.c @@ -1249,8 +1249,9 @@ int ipa2_reset_hdr(bool user_only) struct ipa_hdr_offset_entry *off_next; struct ipa_hdr_proc_ctx_offset_entry *ctx_off_entry; struct ipa_hdr_proc_ctx_offset_entry *ctx_off_next; - int i, end = 0; - bool user_rule = false; + struct ipa_hdr_tbl *htbl = &ipa_ctx->hdr_tbl; + struct ipa_hdr_proc_ctx_tbl *htbl_proc = &ipa_ctx->hdr_proc_ctx_tbl; + int i; /* * issue a reset on the routing module since routing rules point to @@ -1288,9 +1289,6 @@ int ipa2_reset_hdr(bool user_only) return -EFAULT; } - if (entry->ipacm_installed) - user_rule = true; - if (!user_only || entry->ipacm_installed) { if (entry->is_hdr_proc_ctx) { dma_unmap_single(ipa_ctx->pdev, @@ -1298,9 +1296,15 @@ int ipa2_reset_hdr(bool user_only) entry->hdr_len, DMA_TO_DEVICE); entry->proc_ctx = NULL; + } else { + /* move the offset entry to free list */ + entry->offset_entry->ipacm_installed = 0; + list_move(&entry->offset_entry->link, + &htbl->head_free_offset_list[ + entry->offset_entry->bin]); } list_del(&entry->link); - ipa_ctx->hdr_tbl.hdr_cnt--; + htbl->hdr_cnt--; entry->ref_cnt = 0; entry->cookie = 0; @@ -1309,53 +1313,37 @@ int ipa2_reset_hdr(bool user_only) kmem_cache_free(ipa_ctx->hdr_cache, entry); } } - for (i = 0; i < IPA_HDR_BIN_MAX; i++) { - list_for_each_entry_safe(off_entry, off_next, - &ipa_ctx->hdr_tbl.head_offset_list[i], - link) { - /* - * do not remove the default exception header which is - * at offset 0 - */ - if (off_entry->offset == 0) - continue; - - if (!user_only || - off_entry->ipacm_installed) { + /* only clean up offset_list and free_offset_list on global reset */ + if (!user_only) { + for (i = 0; i < IPA_HDR_BIN_MAX; i++) { + list_for_each_entry_safe(off_entry, off_next, + &ipa_ctx->hdr_tbl.head_offset_list[i], + link) { + /** + * do not remove the default exception + * header which is at offset 0 + */ + if (off_entry->offset == 0) + continue; list_del(&off_entry->link); kmem_cache_free(ipa_ctx->hdr_offset_cache, off_entry); - } else { - if (off_entry->offset + - ipa_hdr_bin_sz[off_entry->bin] > end) { - end = off_entry->offset + - ipa_hdr_bin_sz[off_entry->bin]; - IPADBG("replace end = %d\n", end); - } } - } - list_for_each_entry_safe(off_entry, off_next, + list_for_each_entry_safe(off_entry, off_next, &ipa_ctx->hdr_tbl.head_free_offset_list[i], link) { - - if (!user_only || - off_entry->ipacm_installed) { list_del(&off_entry->link); kmem_cache_free(ipa_ctx->hdr_offset_cache, off_entry); } } + /* there is one header of size 8 */ + ipa_ctx->hdr_tbl.end = 8; + ipa_ctx->hdr_tbl.hdr_cnt = 1; } - IPADBG("hdr_tbl.end = %d\n", end); - if (user_rule) { - ipa_ctx->hdr_tbl.end = end; - IPADBG("hdr_tbl.end = %d\n", end); - } IPADBG("reset hdr proc ctx\n"); - user_rule = false; - end = 0; list_for_each_entry_safe( ctx_entry, ctx_next, @@ -1364,17 +1352,18 @@ int ipa2_reset_hdr(bool user_only) if (ipa_id_find(ctx_entry->id) == NULL) { mutex_unlock(&ipa_ctx->lock); - WARN_ON(1); + WARN_ON_RATELIMIT_IPA(1); return -EFAULT; } - if (entry->ipacm_installed) - user_rule = true; - if (!user_only || ctx_entry->ipacm_installed) { + /* move the offset entry to appropriate free list */ + list_move(&ctx_entry->offset_entry->link, + &htbl_proc->head_free_offset_list[ + ctx_entry->offset_entry->bin]); list_del(&ctx_entry->link); - ipa_ctx->hdr_proc_ctx_tbl.proc_ctx_cnt--; + htbl_proc->proc_ctx_cnt--; ctx_entry->ref_cnt = 0; ctx_entry->cookie = 0; @@ -1384,48 +1373,39 @@ int ipa2_reset_hdr(bool user_only) ctx_entry); } } - for (i = 0; i < IPA_HDR_PROC_CTX_BIN_MAX; i++) { - list_for_each_entry_safe(ctx_off_entry, ctx_off_next, + /* only clean up offset_list and free_offset_list on global reset */ + if (!user_only) { + for (i = 0; i < IPA_HDR_PROC_CTX_BIN_MAX; i++) { + list_for_each_entry_safe(ctx_off_entry, ctx_off_next, &ipa_ctx->hdr_proc_ctx_tbl.head_offset_list[i], link) { - - if (!user_only || - ctx_off_entry->ipacm_installed) { list_del(&ctx_off_entry->link); kmem_cache_free( ipa_ctx->hdr_proc_ctx_offset_cache, ctx_off_entry); - } else { - if (ctx_off_entry->offset + - ipa_hdr_bin_sz[ctx_off_entry->bin] - > end) { - end = ctx_off_entry->offset + - ipa_hdr_bin_sz[ctx_off_entry->bin]; - IPADBG("replace hdr_proc as %d\n", end); - } } - } - list_for_each_entry_safe(ctx_off_entry, ctx_off_next, - &ipa_ctx->hdr_proc_ctx_tbl.head_free_offset_list[i], - link) { - - if (!user_only || - ctx_off_entry->ipacm_installed) { + list_for_each_entry_safe(ctx_off_entry, ctx_off_next, + &ipa_ctx->hdr_proc_ctx_tbl. + head_free_offset_list[i], link) { list_del(&ctx_off_entry->link); kmem_cache_free( ipa_ctx->hdr_proc_ctx_offset_cache, ctx_off_entry); } } + ipa_ctx->hdr_proc_ctx_tbl.end = 0; + ipa_ctx->hdr_proc_ctx_tbl.proc_ctx_cnt = 0; } - IPADBG("hdr_proc_tbl.end = %d\n", end); - if (user_rule) { - ipa_ctx->hdr_proc_ctx_tbl.end = end; - IPADBG("hdr_proc_tbl.end = %d\n", end); + /* commit the change to IPA-HW */ + if (ipa_ctx->ctrl->ipa_commit_hdr()) { + IPAERR_RL("fail to commit hdr\n"); + WARN_ON_RATELIMIT_IPA(1); + mutex_unlock(&ipa_ctx->lock); + return -EFAULT; } - mutex_unlock(&ipa_ctx->lock); + mutex_unlock(&ipa_ctx->lock); return 0; } diff --git a/drivers/platform/msm/ipa/ipa_v2/ipa_rt.c b/drivers/platform/msm/ipa/ipa_v2/ipa_rt.c index e4a3a72ee670..5bab6d0c626a 100644 --- a/drivers/platform/msm/ipa/ipa_v2/ipa_rt.c +++ b/drivers/platform/msm/ipa/ipa_v2/ipa_rt.c @@ -1443,6 +1443,15 @@ int ipa2_reset_rt(enum ipa_ip_type ip, bool user_only) } } } + + /* commit the change to IPA-HW */ + if (ipa_ctx->ctrl->ipa_commit_rt(IPA_IP_v4) || + ipa_ctx->ctrl->ipa_commit_rt(IPA_IP_v6)) { + IPAERR("fail to commit rt-rule\n"); + WARN_ON_RATELIMIT_IPA(1); + mutex_unlock(&ipa_ctx->lock); + return -EPERM; + } mutex_unlock(&ipa_ctx->lock); return 0; diff --git a/drivers/platform/msm/ipa/ipa_v3/ipa_debugfs.c b/drivers/platform/msm/ipa/ipa_v3/ipa_debugfs.c index 0dd5b8165ac1..d9b698c98d7c 100644 --- a/drivers/platform/msm/ipa/ipa_v3/ipa_debugfs.c +++ b/drivers/platform/msm/ipa/ipa_v3/ipa_debugfs.c @@ -70,6 +70,7 @@ const char *ipa3_event_name[] = { __stringify(DEL_L2TP_VLAN_MAPPING), __stringify(IPA_PER_CLIENT_STATS_CONNECT_EVENT), __stringify(IPA_PER_CLIENT_STATS_DISCONNECT_EVENT), + __stringify(WLAN_FWR_SSR_BEFORE_SHUTDOWN), }; const char *ipa3_hdr_l2_type_name[] = { diff --git a/drivers/platform/msm/ipa/ipa_v3/ipa_flt.c b/drivers/platform/msm/ipa/ipa_v3/ipa_flt.c index f7d314ad7335..9afdfdb9292a 100644 --- a/drivers/platform/msm/ipa/ipa_v3/ipa_flt.c +++ b/drivers/platform/msm/ipa/ipa_v3/ipa_flt.c @@ -1420,8 +1420,16 @@ int ipa3_reset_flt(enum ipa_ip_type ip, bool user_only) } } } - mutex_unlock(&ipa3_ctx->lock); + /* commit the change to IPA-HW */ + if (ipa3_ctx->ctrl->ipa3_commit_flt(IPA_IP_v4) || + ipa3_ctx->ctrl->ipa3_commit_flt(IPA_IP_v6)) { + IPAERR("fail to commit flt-rule\n"); + WARN_ON_RATELIMIT_IPA(1); + mutex_unlock(&ipa3_ctx->lock); + return -EPERM; + } + mutex_unlock(&ipa3_ctx->lock); return 0; } diff --git a/drivers/platform/msm/ipa/ipa_v3/ipa_hdr.c b/drivers/platform/msm/ipa/ipa_v3/ipa_hdr.c index 63e3f8578219..7909282c1cf1 100644 --- a/drivers/platform/msm/ipa/ipa_v3/ipa_hdr.c +++ b/drivers/platform/msm/ipa/ipa_v3/ipa_hdr.c @@ -993,8 +993,9 @@ int ipa3_reset_hdr(bool user_only) struct ipa_hdr_offset_entry *off_next; struct ipa3_hdr_proc_ctx_offset_entry *ctx_off_entry; struct ipa3_hdr_proc_ctx_offset_entry *ctx_off_next; - int i, end = 0; - bool user_rule = false; + struct ipa3_hdr_tbl *htbl = &ipa3_ctx->hdr_tbl; + struct ipa3_hdr_proc_ctx_tbl *htbl_proc = &ipa3_ctx->hdr_proc_ctx_tbl; + int i; /* * issue a reset on the routing module since routing rules point to @@ -1032,9 +1033,6 @@ int ipa3_reset_hdr(bool user_only) return -EFAULT; } - if (entry->ipacm_installed) - user_rule = true; - if (!user_only || entry->ipacm_installed) { if (entry->is_hdr_proc_ctx) { dma_unmap_single(ipa3_ctx->pdev, @@ -1042,9 +1040,15 @@ int ipa3_reset_hdr(bool user_only) entry->hdr_len, DMA_TO_DEVICE); entry->proc_ctx = NULL; + } else { + /* move the offset entry to free list */ + entry->offset_entry->ipacm_installed = 0; + list_move(&entry->offset_entry->link, + &htbl->head_free_offset_list[ + entry->offset_entry->bin]); } list_del(&entry->link); - ipa3_ctx->hdr_tbl.hdr_cnt--; + htbl->hdr_cnt--; entry->ref_cnt = 0; entry->cookie = 0; @@ -1053,53 +1057,37 @@ int ipa3_reset_hdr(bool user_only) kmem_cache_free(ipa3_ctx->hdr_cache, entry); } } - for (i = 0; i < IPA_HDR_BIN_MAX; i++) { - list_for_each_entry_safe(off_entry, off_next, + + /* only clean up offset_list and free_offset_list on global reset */ + if (!user_only) { + for (i = 0; i < IPA_HDR_BIN_MAX; i++) { + list_for_each_entry_safe(off_entry, off_next, &ipa3_ctx->hdr_tbl.head_offset_list[i], link) { - - /* - * do not remove the default exception header which is - * at offset 0 - */ - if (off_entry->offset == 0) - continue; - - if (!user_only || - off_entry->ipacm_installed) { + /** + * do not remove the default exception + * header which is at offset 0 + */ + if (off_entry->offset == 0) + continue; list_del(&off_entry->link); kmem_cache_free(ipa3_ctx->hdr_offset_cache, off_entry); - } else { - if (off_entry->offset + - ipa_hdr_bin_sz[off_entry->bin] > end) { - end = off_entry->offset + - ipa_hdr_bin_sz[off_entry->bin]; - IPADBG("replace end = %d\n", end); - } } - } - list_for_each_entry_safe(off_entry, off_next, + list_for_each_entry_safe(off_entry, off_next, &ipa3_ctx->hdr_tbl.head_free_offset_list[i], link) { - - if (!user_only || - off_entry->ipacm_installed) { list_del(&off_entry->link); kmem_cache_free(ipa3_ctx->hdr_offset_cache, off_entry); } } + /* there is one header of size 8 */ + ipa3_ctx->hdr_tbl.end = 8; + ipa3_ctx->hdr_tbl.hdr_cnt = 1; } - IPADBG("hdr_tbl.end = %d\n", end); - if (user_rule) { - ipa3_ctx->hdr_tbl.end = end; - IPADBG("hdr_tbl.end = %d\n", end); - } IPADBG("reset hdr proc ctx\n"); - user_rule = false; - end = 0; list_for_each_entry_safe( ctx_entry, ctx_next, @@ -1112,13 +1100,14 @@ int ipa3_reset_hdr(bool user_only) return -EFAULT; } - if (entry->ipacm_installed) - user_rule = true; - if (!user_only || ctx_entry->ipacm_installed) { + /* move the offset entry to appropriate free list */ + list_move(&ctx_entry->offset_entry->link, + &htbl_proc->head_free_offset_list[ + ctx_entry->offset_entry->bin]); list_del(&ctx_entry->link); - ipa3_ctx->hdr_proc_ctx_tbl.proc_ctx_cnt--; + htbl_proc->proc_ctx_cnt--; ctx_entry->ref_cnt = 0; ctx_entry->cookie = 0; @@ -1128,48 +1117,39 @@ int ipa3_reset_hdr(bool user_only) ctx_entry); } } - for (i = 0; i < IPA_HDR_PROC_CTX_BIN_MAX; i++) { - list_for_each_entry_safe(ctx_off_entry, ctx_off_next, + /* only clean up offset_list and free_offset_list on global reset */ + if (!user_only) { + for (i = 0; i < IPA_HDR_PROC_CTX_BIN_MAX; i++) { + list_for_each_entry_safe(ctx_off_entry, ctx_off_next, &ipa3_ctx->hdr_proc_ctx_tbl.head_offset_list[i], link) { - - if (!user_only || - ctx_off_entry->ipacm_installed) { list_del(&ctx_off_entry->link); kmem_cache_free( ipa3_ctx->hdr_proc_ctx_offset_cache, ctx_off_entry); - } else { - if (ctx_off_entry->offset + - ipa_hdr_bin_sz[ctx_off_entry->bin] - > end) { - end = ctx_off_entry->offset + - ipa_hdr_bin_sz[ctx_off_entry->bin]; - IPADBG("replace hdr_proc as %d\n", end); - } } - } - list_for_each_entry_safe(ctx_off_entry, ctx_off_next, - &ipa3_ctx->hdr_proc_ctx_tbl.head_free_offset_list[i], - link) { - - if (!user_only || - ctx_off_entry->ipacm_installed) { + list_for_each_entry_safe(ctx_off_entry, ctx_off_next, + &ipa3_ctx->hdr_proc_ctx_tbl. + head_free_offset_list[i], link) { list_del(&ctx_off_entry->link); kmem_cache_free( ipa3_ctx->hdr_proc_ctx_offset_cache, ctx_off_entry); } } + ipa3_ctx->hdr_proc_ctx_tbl.end = 0; + ipa3_ctx->hdr_proc_ctx_tbl.proc_ctx_cnt = 0; } - IPADBG("hdr_proc_tbl.end = %d\n", end); - if (user_rule) { - ipa3_ctx->hdr_proc_ctx_tbl.end = end; - IPADBG("hdr_proc_tbl.end = %d\n", end); + /* commit the change to IPA-HW */ + if (ipa3_ctx->ctrl->ipa3_commit_hdr()) { + IPAERR("fail to commit hdr\n"); + WARN_ON_RATELIMIT_IPA(1); + mutex_unlock(&ipa3_ctx->lock); + return -EFAULT; } - mutex_unlock(&ipa3_ctx->lock); + mutex_unlock(&ipa3_ctx->lock); return 0; } diff --git a/drivers/platform/msm/ipa/ipa_v3/ipa_rt.c b/drivers/platform/msm/ipa/ipa_v3/ipa_rt.c index 203fd142b213..0e75d10bae54 100644 --- a/drivers/platform/msm/ipa/ipa_v3/ipa_rt.c +++ b/drivers/platform/msm/ipa/ipa_v3/ipa_rt.c @@ -1576,6 +1576,15 @@ int ipa3_reset_rt(enum ipa_ip_type ip, bool user_only) } } } + + /* commit the change to IPA-HW */ + if (ipa3_ctx->ctrl->ipa3_commit_rt(IPA_IP_v4) || + ipa3_ctx->ctrl->ipa3_commit_rt(IPA_IP_v6)) { + IPAERR("fail to commit rt-rule\n"); + WARN_ON_RATELIMIT_IPA(1); + mutex_unlock(&ipa3_ctx->lock); + return -EPERM; + } mutex_unlock(&ipa3_ctx->lock); return 0; diff --git a/drivers/power/supply/qcom/qpnp-smb2.c b/drivers/power/supply/qcom/qpnp-smb2.c index 8e57bf9d2c31..a7c206c42418 100644 --- a/drivers/power/supply/qcom/qpnp-smb2.c +++ b/drivers/power/supply/qcom/qpnp-smb2.c @@ -945,6 +945,8 @@ static enum power_supply_property smb2_batt_props[] = { POWER_SUPPLY_PROP_DP_DM, POWER_SUPPLY_PROP_CHARGE_COUNTER, POWER_SUPPLY_PROP_FCC_STEPPER_ENABLE, + POWER_SUPPLY_PROP_CHARGE_FULL, + POWER_SUPPLY_PROP_CYCLE_COUNT, }; static int smb2_batt_get_prop(struct power_supply *psy, @@ -999,9 +1001,6 @@ static int smb2_batt_get_prop(struct power_supply *psy, case POWER_SUPPLY_PROP_SW_JEITA_ENABLED: val->intval = chg->sw_jeita_enabled; break; - case POWER_SUPPLY_PROP_VOLTAGE_NOW: - rc = smblib_get_prop_batt_voltage_now(chg, val); - break; case POWER_SUPPLY_PROP_VOLTAGE_MAX: val->intval = get_client_vote(chg->fv_votable, BATT_PROFILE_VOTER); @@ -1013,9 +1012,6 @@ static int smb2_batt_get_prop(struct power_supply *psy, val->intval = get_client_vote_locked(chg->fv_votable, QNOVO_VOTER); break; - case POWER_SUPPLY_PROP_CURRENT_NOW: - rc = smblib_get_prop_batt_current_now(chg, val); - break; case POWER_SUPPLY_PROP_CURRENT_QNOVO: val->intval = get_client_vote_locked(chg->fcc_votable, QNOVO_VOTER); @@ -1024,9 +1020,6 @@ static int smb2_batt_get_prop(struct power_supply *psy, val->intval = get_client_vote(chg->fcc_votable, BATT_PROFILE_VOTER); break; - case POWER_SUPPLY_PROP_TEMP: - rc = smblib_get_prop_batt_temp(chg, val); - break; case POWER_SUPPLY_PROP_TECHNOLOGY: val->intval = POWER_SUPPLY_TECHNOLOGY_LION; break; @@ -1051,7 +1044,12 @@ static int smb2_batt_get_prop(struct power_supply *psy, val->intval = 0; break; case POWER_SUPPLY_PROP_CHARGE_COUNTER: - rc = smblib_get_prop_batt_charge_counter(chg, val); + case POWER_SUPPLY_PROP_CHARGE_FULL: + case POWER_SUPPLY_PROP_CYCLE_COUNT: + case POWER_SUPPLY_PROP_VOLTAGE_NOW: + case POWER_SUPPLY_PROP_CURRENT_NOW: + case POWER_SUPPLY_PROP_TEMP: + rc = smblib_get_prop_from_bms(chg, psp, val); break; case POWER_SUPPLY_PROP_FCC_STEPPER_ENABLE: val->intval = chg->fcc_stepper_mode; diff --git a/drivers/power/supply/qcom/smb-lib.c b/drivers/power/supply/qcom/smb-lib.c index e96523a4d43e..81623c65ea8e 100644 --- a/drivers/power/supply/qcom/smb-lib.c +++ b/drivers/power/supply/qcom/smb-lib.c @@ -1717,7 +1717,8 @@ int smblib_get_prop_batt_health(struct smb_charger *chg, stat); if (stat & CHARGER_ERROR_STATUS_BAT_OV_BIT) { - rc = smblib_get_prop_batt_voltage_now(chg, &pval); + rc = smblib_get_prop_from_bms(chg, + POWER_SUPPLY_PROP_VOLTAGE_NOW, &pval); if (!rc) { /* * If Vbatt is within 40mV above Vfloat, then don't @@ -1775,45 +1776,6 @@ int smblib_get_prop_input_current_limited(struct smb_charger *chg, return 0; } -int smblib_get_prop_batt_voltage_now(struct smb_charger *chg, - union power_supply_propval *val) -{ - int rc; - - if (!chg->bms_psy) - return -EINVAL; - - rc = power_supply_get_property(chg->bms_psy, - POWER_SUPPLY_PROP_VOLTAGE_NOW, val); - return rc; -} - -int smblib_get_prop_batt_current_now(struct smb_charger *chg, - union power_supply_propval *val) -{ - int rc; - - if (!chg->bms_psy) - return -EINVAL; - - rc = power_supply_get_property(chg->bms_psy, - POWER_SUPPLY_PROP_CURRENT_NOW, val); - return rc; -} - -int smblib_get_prop_batt_temp(struct smb_charger *chg, - union power_supply_propval *val) -{ - int rc; - - if (!chg->bms_psy) - return -EINVAL; - - rc = power_supply_get_property(chg->bms_psy, - POWER_SUPPLY_PROP_TEMP, val); - return rc; -} - int smblib_get_prop_batt_charge_done(struct smb_charger *chg, union power_supply_propval *val) { @@ -1849,16 +1811,17 @@ int smblib_get_prop_charge_qnovo_enable(struct smb_charger *chg, return 0; } -int smblib_get_prop_batt_charge_counter(struct smb_charger *chg, - union power_supply_propval *val) +int smblib_get_prop_from_bms(struct smb_charger *chg, + enum power_supply_property psp, + union power_supply_propval *val) { int rc; if (!chg->bms_psy) return -EINVAL; - rc = power_supply_get_property(chg->bms_psy, - POWER_SUPPLY_PROP_CHARGE_COUNTER, val); + rc = power_supply_get_property(chg->bms_psy, psp, val); + return rc; } diff --git a/drivers/power/supply/qcom/smb-lib.h b/drivers/power/supply/qcom/smb-lib.h index 4475ccc21a2a..5ca5e923e651 100644 --- a/drivers/power/supply/qcom/smb-lib.h +++ b/drivers/power/supply/qcom/smb-lib.h @@ -434,14 +434,6 @@ int smblib_get_prop_system_temp_level(struct smb_charger *chg, union power_supply_propval *val); int smblib_get_prop_input_current_limited(struct smb_charger *chg, union power_supply_propval *val); -int smblib_get_prop_batt_voltage_now(struct smb_charger *chg, - union power_supply_propval *val); -int smblib_get_prop_batt_current_now(struct smb_charger *chg, - union power_supply_propval *val); -int smblib_get_prop_batt_temp(struct smb_charger *chg, - union power_supply_propval *val); -int smblib_get_prop_batt_charge_counter(struct smb_charger *chg, - union power_supply_propval *val); int smblib_set_prop_input_suspend(struct smb_charger *chg, const union power_supply_propval *val); int smblib_set_prop_batt_capacity(struct smb_charger *chg, @@ -529,6 +521,9 @@ int smblib_get_icl_current(struct smb_charger *chg, int *icl_ua); int smblib_get_charge_current(struct smb_charger *chg, int *total_current_ua); int smblib_get_prop_pr_swap_in_progress(struct smb_charger *chg, union power_supply_propval *val); +int smblib_get_prop_from_bms(struct smb_charger *chg, + enum power_supply_property psp, + union power_supply_propval *val); int smblib_set_prop_pr_swap_in_progress(struct smb_charger *chg, const union power_supply_propval *val); void smblib_usb_typec_change(struct smb_charger *chg); diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c index da7bae991552..d877ff124365 100644 --- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -88,6 +88,7 @@ int ptp_set_pinfunc(struct ptp_clock *ptp, unsigned int pin, case PTP_PF_PHYSYNC: if (chan != 0) return -EINVAL; + break; default: return -EINVAL; } diff --git a/drivers/regulator/pfuze100-regulator.c b/drivers/regulator/pfuze100-regulator.c index 2a44e5dd9c2a..c68556bf6f39 100644 --- a/drivers/regulator/pfuze100-regulator.c +++ b/drivers/regulator/pfuze100-regulator.c @@ -152,6 +152,7 @@ static struct regulator_ops pfuze100_sw_regulator_ops = { static struct regulator_ops pfuze100_swb_regulator_ops = { .enable = regulator_enable_regmap, .disable = regulator_disable_regmap, + .is_enabled = regulator_is_enabled_regmap, .list_voltage = regulator_list_voltage_table, .map_voltage = regulator_map_voltage_ascend, .set_voltage_sel = regulator_set_voltage_sel_regmap, diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index 9473715725df..9cad172618eb 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -349,6 +349,11 @@ int rtc_set_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm) { int err; + if (!rtc->ops) + return -ENODEV; + else if (!rtc->ops->set_alarm) + return -EINVAL; + err = rtc_valid_tm(&alarm->time); if (err != 0) return err; diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c index a56a7b243e91..5466246c69b4 100644 --- a/drivers/scsi/3w-9xxx.c +++ b/drivers/scsi/3w-9xxx.c @@ -889,6 +889,11 @@ static int twa_chrdev_open(struct inode *inode, struct file *file) unsigned int minor_number; int retval = TW_IOCTL_ERROR_OS_ENODEV; + if (!capable(CAP_SYS_ADMIN)) { + retval = -EACCES; + goto out; + } + minor_number = iminor(inode); if (minor_number >= twa_device_extension_count) goto out; diff --git a/drivers/scsi/3w-xxxx.c b/drivers/scsi/3w-xxxx.c index 2940bd769936..14af38036287 100644 --- a/drivers/scsi/3w-xxxx.c +++ b/drivers/scsi/3w-xxxx.c @@ -1034,6 +1034,9 @@ static int tw_chrdev_open(struct inode *inode, struct file *file) dprintk(KERN_WARNING "3w-xxxx: tw_ioctl_open()\n"); + if (!capable(CAP_SYS_ADMIN)) + return -EACCES; + minor_number = iminor(inode); if (minor_number >= tw_device_extension_count) return -ENODEV; diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 9d05302a3bcd..19bffe0b2cc0 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -4197,6 +4197,9 @@ megaraid_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) int irq, i, j; int error = -ENODEV; + if (hba_count >= MAX_CONTROLLERS) + goto out; + if (pci_enable_device(pdev)) goto out; pci_set_master(pdev); diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c index 96007633ad39..213944ed64d9 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c @@ -1886,6 +1886,9 @@ megasas_build_syspd_fusion(struct megasas_instance *instance, pRAID_Context->timeoutValue = cpu_to_le16(os_timeout_value); pRAID_Context->VirtualDiskTgtId = cpu_to_le16(device_id); } else { + if (os_timeout_value) + os_timeout_value++; + /* system pd Fast Path */ io_request->Function = MPI2_FUNCTION_SCSI_IO_REQUEST; timeout_limit = (scmd->device->type == TYPE_DISK) ? diff --git a/drivers/scsi/scsi_dh.c b/drivers/scsi/scsi_dh.c index 5711d58f9e81..a8ebaeace154 100644 --- a/drivers/scsi/scsi_dh.c +++ b/drivers/scsi/scsi_dh.c @@ -58,7 +58,10 @@ static const struct scsi_dh_blist scsi_dh_blist[] = { {"IBM", "3526", "rdac", }, {"IBM", "3542", "rdac", }, {"IBM", "3552", "rdac", }, - {"SGI", "TP9", "rdac", }, + {"SGI", "TP9300", "rdac", }, + {"SGI", "TP9400", "rdac", }, + {"SGI", "TP9500", "rdac", }, + {"SGI", "TP9700", "rdac", }, {"SGI", "IS", "rdac", }, {"STK", "OPENstorage", "rdac", }, {"STK", "FLEXLINE 380", "rdac", }, diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c index 74761424c6a9..c0a57c8ead22 100644 --- a/drivers/scsi/sg.c +++ b/drivers/scsi/sg.c @@ -2195,6 +2195,7 @@ sg_add_sfp(Sg_device * sdp) write_lock_irqsave(&sdp->sfd_lock, iflags); if (atomic_read(&sdp->detaching)) { write_unlock_irqrestore(&sdp->sfd_lock, iflags); + kfree(sfp); return ERR_PTR(-ENODEV); } list_add_tail(&sfp->sfd_siblings, &sdp->sfds); diff --git a/drivers/soc/qcom/pil-msa.c b/drivers/soc/qcom/pil-msa.c index dfb934ae37b9..f7d183a7bbcc 100644 --- a/drivers/soc/qcom/pil-msa.c +++ b/drivers/soc/qcom/pil-msa.c @@ -331,6 +331,9 @@ int pil_mss_shutdown(struct pil_desc *pil) drv->is_booted = false; } + if (drv->mx_spike_wa && drv->ahb_clk_vote) + clk_disable_unprepare(drv->ahb_clk); + return ret; } diff --git a/drivers/soc/qcom/pil-q6v5-mss.c b/drivers/soc/qcom/pil-q6v5-mss.c index 1177cac25ffa..c4c64409cfda 100644 --- a/drivers/soc/qcom/pil-q6v5-mss.c +++ b/drivers/soc/qcom/pil-q6v5-mss.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2012-2017, The Linux Foundation. All rights reserved. + * Copyright (c) 2012-2018, The Linux Foundation. All rights reserved. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 and @@ -41,7 +41,6 @@ #define PROXY_TIMEOUT_MS 10000 #define MAX_SSR_REASON_LEN 130U #define STOP_ACK_TIMEOUT_MS 1000 -#define QDSP6SS_NMI_STATUS 0x44 #define subsys_to_drv(d) container_of(d, struct modem_data, subsys_desc) @@ -78,17 +77,11 @@ static void restart_modem(struct modem_data *drv) static irqreturn_t modem_err_fatal_intr_handler(int irq, void *dev_id) { struct modem_data *drv = subsys_to_drv(dev_id); - u32 nmi_status = readl_relaxed(drv->q6->reg_base + QDSP6SS_NMI_STATUS); /* Ignore if we're the one that set the force stop GPIO */ if (drv->crash_shutdown) return IRQ_HANDLED; - if (nmi_status & 0x04) - pr_err("%s: Fatal error on the modem due to TZ NMI\n", - __func__); - else - pr_err("%s: Fatal error on the modem\n", __func__); subsys_set_crash_status(drv->subsys, CRASH_STATUS_ERR_FATAL); restart_modem(drv); return IRQ_HANDLED; diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c index fa61eff88496..16d45a25284f 100644 --- a/drivers/thermal/samsung/exynos_tmu.c +++ b/drivers/thermal/samsung/exynos_tmu.c @@ -585,6 +585,7 @@ static int exynos5433_tmu_initialize(struct platform_device *pdev) threshold_code = temp_to_code(data, temp); rising_threshold = readl(data->base + rising_reg_offset); + rising_threshold &= ~(0xff << j * 8); rising_threshold |= (threshold_code << j * 8); writel(rising_threshold, data->base + rising_reg_offset); diff --git a/drivers/tty/hvc/hvc_opal.c b/drivers/tty/hvc/hvc_opal.c index 47b54c6aefd2..9f660e55d1ba 100644 --- a/drivers/tty/hvc/hvc_opal.c +++ b/drivers/tty/hvc/hvc_opal.c @@ -323,7 +323,6 @@ static void udbg_init_opal_common(void) udbg_putc = udbg_opal_putc; udbg_getc = udbg_opal_getc; udbg_getc_poll = udbg_opal_getc_poll; - tb_ticks_per_usec = 0x200; /* Make udelay not suck */ } void __init hvc_opal_init_early(void) diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c index 96aa0ad32497..c8a2e5b0eff7 100644 --- a/drivers/tty/pty.c +++ b/drivers/tty/pty.c @@ -106,16 +106,19 @@ static void pty_unthrottle(struct tty_struct *tty) static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c) { struct tty_struct *to = tty->link; + unsigned long flags; if (tty->stopped) return 0; if (c > 0) { + spin_lock_irqsave(&to->port->lock, flags); /* Stuff the data into the input queue of the other end */ c = tty_insert_flip_string(to->port, buf, c); /* And shovel */ if (c) tty_flip_buffer_push(to->port); + spin_unlock_irqrestore(&to->port->lock, flags); } return c; } diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 7ed30d0b5273..a501f3ba6a3f 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -1771,6 +1771,9 @@ static const struct usb_device_id acm_ids[] = { { USB_DEVICE(0x09d8, 0x0320), /* Elatec GmbH TWN3 */ .driver_info = NO_UNION_NORMAL, /* has misplaced union descriptor */ }, + { USB_DEVICE(0x0ca6, 0xa050), /* Castles VEGA3000 */ + .driver_info = NO_UNION_NORMAL, /* reports zero length descriptor */ + }, { USB_DEVICE(0x2912, 0x0001), /* ATOL FPrint */ .driver_info = CLEAR_HALT_CONDITIONS, diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 4ecbb36e8252..1e25aa251c05 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1134,10 +1134,14 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) if (!udev || udev->state == USB_STATE_NOTATTACHED) { /* Tell hub_wq to disconnect the device or - * check for a new connection + * check for a new connection or over current condition. + * Based on USB2.0 Spec Section 11.12.5, + * C_PORT_OVER_CURRENT could be set while + * PORT_OVER_CURRENT is not. So check for any of them. */ if (udev || (portstatus & USB_PORT_STAT_CONNECTION) || - (portstatus & USB_PORT_STAT_OVERCURRENT)) + (portstatus & USB_PORT_STAT_OVERCURRENT) || + (portchange & USB_PORT_STAT_C_OVERCURRENT)) set_bit(port1, hub->change_bits); } else if (portstatus & USB_PORT_STAT_ENABLE) { @@ -3315,6 +3319,10 @@ static int wait_for_ss_port_enable(struct usb_device *udev, while (delay_ms < 2000) { if (status || *portstatus & USB_PORT_STAT_CONNECTION) break; + if (!port_is_power_on(hub, *portstatus)) { + status = -ENODEV; + break; + } msleep(20); delay_ms += 20; status = hub_port_status(hub, *port1, portstatus, portchange); diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index 5ccc09888345..f97193011bed 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -3565,7 +3565,7 @@ static int ffs_func_setup(struct usb_function *f, ffs_log("exit"); - return USB_GADGET_DELAYED_STATUS; + return creq->wLength == 0 ? USB_GADGET_DELAYED_STATUS : 0; } static void ffs_func_suspend(struct usb_function *f) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 50a1b0a34617..dfc02a6c6d55 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -909,6 +909,41 @@ static void xhci_disable_port_wake_on_bits(struct xhci_hcd *xhci) spin_unlock_irqrestore(&xhci->lock, flags); } +static bool xhci_pending_portevent(struct xhci_hcd *xhci) +{ + __le32 __iomem **port_array; + int port_index; + u32 status; + u32 portsc; + + status = readl(&xhci->op_regs->status); + if (status & STS_EINT) + return true; + /* + * Checking STS_EINT is not enough as there is a lag between a change + * bit being set and the Port Status Change Event that it generated + * being written to the Event Ring. See note in xhci 1.1 section 4.19.2. + */ + + port_index = xhci->num_usb2_ports; + port_array = xhci->usb2_ports; + while (port_index--) { + portsc = readl(port_array[port_index]); + if (portsc & PORT_CHANGE_MASK || + (portsc & PORT_PLS_MASK) == XDEV_RESUME) + return true; + } + port_index = xhci->num_usb3_ports; + port_array = xhci->usb3_ports; + while (port_index--) { + portsc = readl(port_array[port_index]); + if (portsc & PORT_CHANGE_MASK || + (portsc & PORT_PLS_MASK) == XDEV_RESUME) + return true; + } + return false; +} + /* * Stop HC (not bus-specific) * @@ -1006,7 +1041,7 @@ EXPORT_SYMBOL_GPL(xhci_suspend); */ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) { - u32 command, temp = 0, status; + u32 command, temp = 0; struct usb_hcd *hcd = xhci_to_hcd(xhci); struct usb_hcd *secondary_hcd; int retval = 0; @@ -1128,8 +1163,7 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated) done: if (retval == 0) { /* Resume root hubs only when have pending events. */ - status = readl(&xhci->op_regs->status); - if (status & STS_EINT) { + if (xhci_pending_portevent(xhci)) { usb_hcd_resume_root_hub(xhci->shared_hcd); usb_hcd_resume_root_hub(hcd); } diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index c665806983be..72beaa47c15b 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -382,6 +382,10 @@ struct xhci_op_regs { #define PORT_PLC (1 << 22) /* port configure error change - port failed to configure its link partner */ #define PORT_CEC (1 << 23) +#define PORT_CHANGE_MASK (PORT_CSC | PORT_PEC | PORT_WRC | PORT_OCC | \ + PORT_RC | PORT_PLC | PORT_CEC) + + /* Cold Attach Status - xHC can set this bit to report device attached during * Sx state. Warm port reset should be perfomed to clear this bit and move port * to connected state. diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 44a5a8777053..645b2197930e 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -955,7 +955,8 @@ err_used: if (ubufs) vhost_net_ubuf_put_wait_and_free(ubufs); err_ubufs: - sockfd_put(sock); + if (sock) + sockfd_put(sock); err_vq: mutex_unlock(&vq->mutex); err: diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index f978f258eab8..7d4c7f35e5cf 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -484,7 +484,9 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info, tell_host(vb, vb->inflate_vq); /* balloon's page migration 2nd step -- deflate "page" */ + spin_lock_irqsave(&vb_dev_info->pages_lock, flags); balloon_page_delete(page); + spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags); vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE; set_page_pfns(vb, vb->pfns, page); tell_host(vb, vb->deflate_vq); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 88bee6703cc0..7de8d545f4d6 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3932,8 +3932,8 @@ retry: if (wbc->sync_mode == WB_SYNC_ALL) tag_pages_for_writeback(mapping, index, end); while (!done && !nr_to_write_done && (index <= end) && - (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) { + (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag))) { unsigned i; scanned = 1; @@ -3943,11 +3943,6 @@ retry: if (!PagePrivate(page)) continue; - if (!wbc->range_cyclic && page->index > end) { - done = 1; - break; - } - spin_lock(&mapping->private_lock); if (!PagePrivate(page)) { spin_unlock(&mapping->private_lock); @@ -4076,8 +4071,8 @@ retry: if (wbc->sync_mode == WB_SYNC_ALL) tag_pages_for_writeback(mapping, index, end); while (!done && !nr_to_write_done && (index <= end) && - (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) { + (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, + &index, end, tag))) { unsigned i; scanned = 1; @@ -4101,12 +4096,6 @@ retry: continue; } - if (!wbc->range_cyclic && page->index > end) { - done = 1; - unlock_page(page); - continue; - } - if (wbc->sync_mode != WB_SYNC_NONE) { if (PageWriteback(page)) flush_fn(data); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 88d9b66e2207..a751937dded5 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -2186,6 +2186,21 @@ void assert_qgroups_uptodate(struct btrfs_trans_handle *trans) } /* + * Check if the leaf is the last leaf. Which means all node pointers + * are at their last position. + */ +static bool is_last_leaf(struct btrfs_path *path) +{ + int i; + + for (i = 1; i < BTRFS_MAX_LEVEL && path->nodes[i]; i++) { + if (path->slots[i] != btrfs_header_nritems(path->nodes[i]) - 1) + return false; + } + return true; +} + +/* * returns < 0 on error, 0 when more leafs are to be scanned. * returns 1 when done. */ @@ -2198,6 +2213,7 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, struct ulist *roots = NULL; struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem); u64 num_bytes; + bool done; int slot; int ret; @@ -2225,6 +2241,7 @@ qgroup_rescan_leaf(struct btrfs_fs_info *fs_info, struct btrfs_path *path, mutex_unlock(&fs_info->qgroup_rescan_lock); return ret; } + done = is_last_leaf(path); btrfs_item_key_to_cpu(path->nodes[0], &found, btrfs_header_nritems(path->nodes[0]) - 1); @@ -2271,6 +2288,8 @@ out: } btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem); + if (done && !ret) + ret = 1; return ret; } diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 738f5d6beb95..2c7f9a5f8717 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2961,8 +2961,11 @@ out_wake_log_root: mutex_unlock(&log_root_tree->log_mutex); /* - * The barrier before waitqueue_active is implied by mutex_unlock + * The barrier before waitqueue_active is needed so all the updates + * above are seen by the woken threads. It might not be necessary, but + * proving that seems to be hard. */ + smp_mb(); if (waitqueue_active(&log_root_tree->log_commit_wait[index2])) wake_up(&log_root_tree->log_commit_wait[index2]); out: @@ -2973,8 +2976,11 @@ out: mutex_unlock(&root->log_mutex); /* - * The barrier before waitqueue_active is implied by mutex_unlock + * The barrier before waitqueue_active is needed so all the updates + * above are seen by the woken threads. It might not be necessary, but + * proving that seems to be hard. */ + smp_mb(); if (waitqueue_active(&root->log_commit_wait[index1])) wake_up(&root->log_commit_wait[index1]); return ret; diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 22bae2b434e2..26607401edfe 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -786,8 +786,7 @@ retry: struct page **pages = NULL; mempool_t *pool = NULL; /* Becomes non-null if mempool used */ struct page *page; - int want; - u64 offset, len; + u64 offset = 0, len = 0; long writeback_stat; next = 0; @@ -796,14 +795,9 @@ retry: get_more_pages: first = -1; - want = min(end - index, - min((pgoff_t)PAGEVEC_SIZE, - max_pages - (pgoff_t)locked_pages) - 1) - + 1; - pvec_pages = pagevec_lookup_tag(&pvec, mapping, &index, - PAGECACHE_TAG_DIRTY, - want); - dout("pagevec_lookup_tag got %d\n", pvec_pages); + pvec_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, + end, PAGECACHE_TAG_DIRTY); + dout("pagevec_lookup_range_tag got %d\n", pvec_pages); if (!pvec_pages && !locked_pages) break; for (i = 0; i < pvec_pages && locked_pages < max_pages; i++) { diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c index 0758d32ad01b..0f46cf550907 100644 --- a/fs/crypto/crypto.c +++ b/fs/crypto/crypto.c @@ -162,12 +162,8 @@ int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw, } req = skcipher_request_alloc(tfm, gfp_flags); - if (!req) { - printk_ratelimited(KERN_ERR - "%s: crypto_request_alloc() failed\n", - __func__); + if (!req) return -ENOMEM; - } skcipher_request_set_callback( req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, @@ -184,9 +180,10 @@ int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw, res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait); skcipher_request_free(req); if (res) { - printk_ratelimited(KERN_ERR - "%s: crypto_skcipher_encrypt() returned %d\n", - __func__, res); + fscrypt_err(inode->i_sb, + "%scryption failed for inode %lu, block %llu: %d", + (rw == FS_DECRYPT ? "de" : "en"), + inode->i_ino, lblk_num, res); return res; } return 0; @@ -332,7 +329,6 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) return 0; } - /* this should eventually be an flag in d_flags */ spin_lock(&dentry->d_lock); cached_with_key = dentry->d_flags & DCACHE_ENCRYPTED_WITH_KEY; spin_unlock(&dentry->d_lock); @@ -359,7 +355,6 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags) const struct dentry_operations fscrypt_d_ops = { .d_revalidate = fscrypt_d_revalidate, }; -EXPORT_SYMBOL(fscrypt_d_ops); void fscrypt_restore_control_page(struct page *page) { @@ -428,13 +423,43 @@ fail: return res; } +void fscrypt_msg(struct super_block *sb, const char *level, + const char *fmt, ...) +{ + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + struct va_format vaf; + va_list args; + + if (!__ratelimit(&rs)) + return; + + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; + if (sb) + printk("%sfscrypt (%s): %pV\n", level, sb->s_id, &vaf); + else + printk("%sfscrypt: %pV\n", level, &vaf); + va_end(args); +} + /** * fscrypt_init() - Set up for fs encryption. */ static int __init fscrypt_init(void) { + /* + * Use an unbound workqueue to allow bios to be decrypted in parallel + * even when they happen to complete on the same CPU. This sacrifices + * locality, but it's worthwhile since decryption is CPU-intensive. + * + * Also use a high-priority workqueue to prioritize decryption work, + * which blocks reads from completing, over regular application tasks. + */ fscrypt_read_workqueue = alloc_workqueue("fscrypt_read_queue", - WQ_HIGHPRI, 0); + WQ_UNBOUND | WQ_HIGHPRI, + num_online_cpus()); if (!fscrypt_read_workqueue) goto fail; diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c index b18fa323d1d9..1bdb9f226eec 100644 --- a/fs/crypto/fname.c +++ b/fs/crypto/fname.c @@ -58,11 +58,8 @@ int fname_encrypt(struct inode *inode, const struct qstr *iname, /* Set up the encryption request */ req = skcipher_request_alloc(tfm, GFP_NOFS); - if (!req) { - printk_ratelimited(KERN_ERR - "%s: skcipher_request_alloc() failed\n", __func__); + if (!req) return -ENOMEM; - } skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, crypto_req_done, &wait); @@ -73,8 +70,9 @@ int fname_encrypt(struct inode *inode, const struct qstr *iname, res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait); skcipher_request_free(req); if (res < 0) { - printk_ratelimited(KERN_ERR - "%s: Error (error code %d)\n", __func__, res); + fscrypt_err(inode->i_sb, + "Filename encryption failed for inode %lu: %d", + inode->i_ino, res); return res; } @@ -95,23 +93,14 @@ static int fname_decrypt(struct inode *inode, struct skcipher_request *req = NULL; DECLARE_CRYPTO_WAIT(wait); struct scatterlist src_sg, dst_sg; - struct fscrypt_info *ci = inode->i_crypt_info; - struct crypto_skcipher *tfm = ci->ci_ctfm; + struct crypto_skcipher *tfm = inode->i_crypt_info->ci_ctfm; int res = 0; char iv[FS_CRYPTO_BLOCK_SIZE]; - unsigned lim; - - lim = inode->i_sb->s_cop->max_namelen(inode); - if (iname->len <= 0 || iname->len > lim) - return -EIO; /* Allocate request */ req = skcipher_request_alloc(tfm, GFP_NOFS); - if (!req) { - printk_ratelimited(KERN_ERR - "%s: crypto_request_alloc() failed\n", __func__); + if (!req) return -ENOMEM; - } skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, crypto_req_done, &wait); @@ -126,8 +115,9 @@ static int fname_decrypt(struct inode *inode, res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait); skcipher_request_free(req); if (res < 0) { - printk_ratelimited(KERN_ERR - "%s: Error (error code %d)\n", __func__, res); + fscrypt_err(inode->i_sb, + "Filename decryption failed for inode %lu: %d", + inode->i_ino, res); return res; } @@ -340,12 +330,12 @@ int fscrypt_setup_filename(struct inode *dir, const struct qstr *iname, return 0; } ret = fscrypt_get_encryption_info(dir); - if (ret && ret != -EOPNOTSUPP) + if (ret) return ret; if (dir->i_crypt_info) { if (!fscrypt_fname_encrypted_size(dir, iname->len, - dir->i_sb->s_cop->max_namelen(dir), + dir->i_sb->s_cop->max_namelen, &fname->crypto_buf.len)) return -ENAMETOOLONG; fname->crypto_buf.name = kmalloc(fname->crypto_buf.len, diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h index fe6f6524c1aa..ea372cd53ab6 100644 --- a/fs/crypto/fscrypt_private.h +++ b/fs/crypto/fscrypt_private.h @@ -17,15 +17,7 @@ /* Encryption parameters */ #define FS_IV_SIZE 16 -#define FS_AES_128_ECB_KEY_SIZE 16 -#define FS_AES_128_CBC_KEY_SIZE 16 -#define FS_AES_128_CTS_KEY_SIZE 16 -#define FS_AES_256_GCM_KEY_SIZE 32 -#define FS_AES_256_CBC_KEY_SIZE 32 -#define FS_AES_256_CTS_KEY_SIZE 32 -#define FS_AES_256_XTS_KEY_SIZE 64 - -#define FS_KEY_DERIVATION_NONCE_SIZE 16 +#define FS_KEY_DERIVATION_NONCE_SIZE 16 /** * Encryption context for inode @@ -119,6 +111,15 @@ extern int fscrypt_do_page_crypto(const struct inode *inode, gfp_t gfp_flags); extern struct page *fscrypt_alloc_bounce_page(struct fscrypt_ctx *ctx, gfp_t gfp_flags); +extern const struct dentry_operations fscrypt_d_ops; + +extern void __printf(3, 4) __cold +fscrypt_msg(struct super_block *sb, const char *level, const char *fmt, ...); + +#define fscrypt_warn(sb, fmt, ...) \ + fscrypt_msg(sb, KERN_WARNING, fmt, ##__VA_ARGS__) +#define fscrypt_err(sb, fmt, ...) \ + fscrypt_msg(sb, KERN_ERR, fmt, ##__VA_ARGS__) /* fname.c */ extern int fname_encrypt(struct inode *inode, const struct qstr *iname, diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c index bc010e4609ef..b5328a0c6364 100644 --- a/fs/crypto/hooks.c +++ b/fs/crypto/hooks.c @@ -39,8 +39,9 @@ int fscrypt_file_open(struct inode *inode, struct file *filp) dir = dget_parent(file_dentry(filp)); if (IS_ENCRYPTED(d_inode(dir)) && !fscrypt_has_permitted_context(d_inode(dir), inode)) { - pr_warn_ratelimited("fscrypt: inconsistent encryption contexts: %lu/%lu", - d_inode(dir)->i_ino, inode->i_ino); + fscrypt_warn(inode->i_sb, + "inconsistent encryption contexts: %lu/%lu", + d_inode(dir)->i_ino, inode->i_ino); err = -EPERM; } dput(dir); diff --git a/fs/crypto/keyinfo.c b/fs/crypto/keyinfo.c index 472f69188a96..382e828f2f9a 100644 --- a/fs/crypto/keyinfo.c +++ b/fs/crypto/keyinfo.c @@ -18,17 +18,16 @@ static struct crypto_shash *essiv_hash_tfm; -/** - * derive_key_aes() - Derive a key using AES-128-ECB - * @deriving_key: Encryption key used for derivation. - * @source_key: Source key to which to apply derivation. - * @derived_raw_key: Derived raw key. +/* + * Key derivation function. This generates the derived key by encrypting the + * master key with AES-128-ECB using the inode's nonce as the AES key. * - * Return: Zero on success; non-zero otherwise. + * The master key must be at least as long as the derived key. If the master + * key is longer, then only the first 'derived_keysize' bytes are used. */ -static int derive_key_aes(u8 deriving_key[FS_AES_128_ECB_KEY_SIZE], - const struct fscrypt_key *source_key, - u8 derived_raw_key[FS_MAX_KEY_SIZE]) +static int derive_key_aes(const u8 *master_key, + const struct fscrypt_context *ctx, + u8 *derived_key, unsigned int derived_keysize) { int res = 0; struct skcipher_request *req = NULL; @@ -50,14 +49,13 @@ static int derive_key_aes(u8 deriving_key[FS_AES_128_ECB_KEY_SIZE], skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP, crypto_req_done, &wait); - res = crypto_skcipher_setkey(tfm, deriving_key, - FS_AES_128_ECB_KEY_SIZE); + res = crypto_skcipher_setkey(tfm, ctx->nonce, sizeof(ctx->nonce)); if (res < 0) goto out; - sg_init_one(&src_sg, source_key->raw, source_key->size); - sg_init_one(&dst_sg, derived_raw_key, source_key->size); - skcipher_request_set_crypt(req, &src_sg, &dst_sg, source_key->size, + sg_init_one(&src_sg, master_key, derived_keysize); + sg_init_one(&dst_sg, derived_key, derived_keysize); + skcipher_request_set_crypt(req, &src_sg, &dst_sg, derived_keysize, NULL); res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait); out: @@ -66,103 +64,147 @@ out: return res; } -static int validate_user_key(struct fscrypt_info *crypt_info, - struct fscrypt_context *ctx, u8 *raw_key, - const char *prefix, int min_keysize) +/* + * Search the current task's subscribed keyrings for a "logon" key with + * description prefix:descriptor, and if found acquire a read lock on it and + * return a pointer to its validated payload in *payload_ret. + */ +static struct key * +find_and_lock_process_key(const char *prefix, + const u8 descriptor[FS_KEY_DESCRIPTOR_SIZE], + unsigned int min_keysize, + const struct fscrypt_key **payload_ret) { char *description; - struct key *keyring_key; - struct fscrypt_key *master_key; + struct key *key; const struct user_key_payload *ukp; - int res; + const struct fscrypt_key *payload; description = kasprintf(GFP_NOFS, "%s%*phN", prefix, - FS_KEY_DESCRIPTOR_SIZE, - ctx->master_key_descriptor); + FS_KEY_DESCRIPTOR_SIZE, descriptor); if (!description) - return -ENOMEM; + return ERR_PTR(-ENOMEM); - keyring_key = request_key(&key_type_logon, description, NULL); + key = request_key(&key_type_logon, description, NULL); kfree(description); - if (IS_ERR(keyring_key)) - return PTR_ERR(keyring_key); - down_read(&keyring_key->sem); - - if (keyring_key->type != &key_type_logon) { - printk_once(KERN_WARNING - "%s: key type must be logon\n", __func__); - res = -ENOKEY; - goto out; - } - ukp = user_key_payload(keyring_key); - if (!ukp) { - /* key was revoked before we acquired its semaphore */ - res = -EKEYREVOKED; - goto out; + if (IS_ERR(key)) + return key; + + down_read(&key->sem); + ukp = user_key_payload(key); + + if (!ukp) /* was the key revoked before we acquired its semaphore? */ + goto invalid; + + payload = (const struct fscrypt_key *)ukp->data; + + if (ukp->datalen != sizeof(struct fscrypt_key) || + payload->size < 1 || payload->size > FS_MAX_KEY_SIZE) { + fscrypt_warn(NULL, + "key with description '%s' has invalid payload", + key->description); + goto invalid; } - if (ukp->datalen != sizeof(struct fscrypt_key)) { - res = -EINVAL; - goto out; + + if (payload->size < min_keysize) { + fscrypt_warn(NULL, + "key with description '%s' is too short (got %u bytes, need %u+ bytes)", + key->description, payload->size, min_keysize); + goto invalid; } - master_key = (struct fscrypt_key *)ukp->data; - BUILD_BUG_ON(FS_AES_128_ECB_KEY_SIZE != FS_KEY_DERIVATION_NONCE_SIZE); - - if (master_key->size < min_keysize || master_key->size > FS_MAX_KEY_SIZE - || master_key->size % AES_BLOCK_SIZE != 0) { - printk_once(KERN_WARNING - "%s: key size incorrect: %d\n", - __func__, master_key->size); - res = -ENOKEY; - goto out; + + *payload_ret = payload; + return key; + +invalid: + up_read(&key->sem); + key_put(key); + return ERR_PTR(-ENOKEY); +} + +/* Find the master key, then derive the inode's actual encryption key */ +static int find_and_derive_key(const struct inode *inode, + const struct fscrypt_context *ctx, + u8 *derived_key, unsigned int derived_keysize) +{ + struct key *key; + const struct fscrypt_key *payload; + int err; + + key = find_and_lock_process_key(FS_KEY_DESC_PREFIX, + ctx->master_key_descriptor, + derived_keysize, &payload); + if (key == ERR_PTR(-ENOKEY) && inode->i_sb->s_cop->key_prefix) { + key = find_and_lock_process_key(inode->i_sb->s_cop->key_prefix, + ctx->master_key_descriptor, + derived_keysize, &payload); } - res = derive_key_aes(ctx->nonce, master_key, raw_key); -out: - up_read(&keyring_key->sem); - key_put(keyring_key); - return res; + if (IS_ERR(key)) + return PTR_ERR(key); + err = derive_key_aes(payload->raw, ctx, derived_key, derived_keysize); + up_read(&key->sem); + key_put(key); + return err; } -static const struct { +static struct fscrypt_mode { + const char *friendly_name; const char *cipher_str; int keysize; + bool logged_impl_name; } available_modes[] = { - [FS_ENCRYPTION_MODE_AES_256_XTS] = { "xts(aes)", - FS_AES_256_XTS_KEY_SIZE }, - [FS_ENCRYPTION_MODE_AES_256_CTS] = { "cts(cbc(aes))", - FS_AES_256_CTS_KEY_SIZE }, - [FS_ENCRYPTION_MODE_AES_128_CBC] = { "cbc(aes)", - FS_AES_128_CBC_KEY_SIZE }, - [FS_ENCRYPTION_MODE_AES_128_CTS] = { "cts(cbc(aes))", - FS_AES_128_CTS_KEY_SIZE }, - [FS_ENCRYPTION_MODE_SPECK128_256_XTS] = { "xts(speck128)", 64 }, - [FS_ENCRYPTION_MODE_SPECK128_256_CTS] = { "cts(cbc(speck128))", 32 }, + [FS_ENCRYPTION_MODE_AES_256_XTS] = { + .friendly_name = "AES-256-XTS", + .cipher_str = "xts(aes)", + .keysize = 64, + }, + [FS_ENCRYPTION_MODE_AES_256_CTS] = { + .friendly_name = "AES-256-CTS-CBC", + .cipher_str = "cts(cbc(aes))", + .keysize = 32, + }, + [FS_ENCRYPTION_MODE_AES_128_CBC] = { + .friendly_name = "AES-128-CBC", + .cipher_str = "cbc(aes)", + .keysize = 16, + }, + [FS_ENCRYPTION_MODE_AES_128_CTS] = { + .friendly_name = "AES-128-CTS-CBC", + .cipher_str = "cts(cbc(aes))", + .keysize = 16, + }, + [FS_ENCRYPTION_MODE_SPECK128_256_XTS] = { + .friendly_name = "Speck128/256-XTS", + .cipher_str = "xts(speck128)", + .keysize = 64, + }, + [FS_ENCRYPTION_MODE_SPECK128_256_CTS] = { + .friendly_name = "Speck128/256-CTS-CBC", + .cipher_str = "cts(cbc(speck128))", + .keysize = 32, + }, }; -static int determine_cipher_type(struct fscrypt_info *ci, struct inode *inode, - const char **cipher_str_ret, int *keysize_ret) +static struct fscrypt_mode * +select_encryption_mode(const struct fscrypt_info *ci, const struct inode *inode) { - u32 mode; - if (!fscrypt_valid_enc_modes(ci->ci_data_mode, ci->ci_filename_mode)) { - pr_warn_ratelimited("fscrypt: inode %lu uses unsupported encryption modes (contents mode %d, filenames mode %d)\n", - inode->i_ino, - ci->ci_data_mode, ci->ci_filename_mode); - return -EINVAL; + fscrypt_warn(inode->i_sb, + "inode %lu uses unsupported encryption modes (contents mode %d, filenames mode %d)", + inode->i_ino, ci->ci_data_mode, + ci->ci_filename_mode); + return ERR_PTR(-EINVAL); } - if (S_ISREG(inode->i_mode)) { - mode = ci->ci_data_mode; - } else if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) { - mode = ci->ci_filename_mode; - } else { - WARN_ONCE(1, "fscrypt: filesystem tried to load encryption info for inode %lu, which is not encryptable (file type %d)\n", - inode->i_ino, (inode->i_mode & S_IFMT)); - return -EINVAL; - } + if (S_ISREG(inode->i_mode)) + return &available_modes[ci->ci_data_mode]; + + if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) + return &available_modes[ci->ci_filename_mode]; - *cipher_str_ret = available_modes[mode].cipher_str; - *keysize_ret = available_modes[mode].keysize; - return 0; + WARN_ONCE(1, "fscrypt: filesystem tried to load encryption info for inode %lu, which is not encryptable (file type %d)\n", + inode->i_ino, (inode->i_mode & S_IFMT)); + return ERR_PTR(-EINVAL); } static void put_crypt_info(struct fscrypt_info *ci) @@ -185,8 +227,9 @@ static int derive_essiv_salt(const u8 *key, int keysize, u8 *salt) tfm = crypto_alloc_shash("sha256", 0, 0); if (IS_ERR(tfm)) { - pr_warn_ratelimited("fscrypt: error allocating SHA-256 transform: %ld\n", - PTR_ERR(tfm)); + fscrypt_warn(NULL, + "error allocating SHA-256 transform: %ld", + PTR_ERR(tfm)); return PTR_ERR(tfm); } prev_tfm = cmpxchg(&essiv_hash_tfm, NULL, tfm); @@ -246,8 +289,7 @@ int fscrypt_get_encryption_info(struct inode *inode) struct fscrypt_info *crypt_info; struct fscrypt_context ctx; struct crypto_skcipher *ctfm; - const char *cipher_str; - int keysize; + struct fscrypt_mode *mode; u8 *raw_key = NULL; int res; @@ -291,57 +333,59 @@ int fscrypt_get_encryption_info(struct inode *inode) memcpy(crypt_info->ci_master_key, ctx.master_key_descriptor, sizeof(crypt_info->ci_master_key)); - res = determine_cipher_type(crypt_info, inode, &cipher_str, &keysize); - if (res) + mode = select_encryption_mode(crypt_info, inode); + if (IS_ERR(mode)) { + res = PTR_ERR(mode); goto out; + } /* * This cannot be a stack buffer because it is passed to the scatterlist * crypto API as part of key derivation. */ res = -ENOMEM; - raw_key = kmalloc(FS_MAX_KEY_SIZE, GFP_NOFS); + raw_key = kmalloc(mode->keysize, GFP_NOFS); if (!raw_key) goto out; - res = validate_user_key(crypt_info, &ctx, raw_key, FS_KEY_DESC_PREFIX, - keysize); - if (res && inode->i_sb->s_cop->key_prefix) { - int res2 = validate_user_key(crypt_info, &ctx, raw_key, - inode->i_sb->s_cop->key_prefix, - keysize); - if (res2) { - if (res2 == -ENOKEY) - res = -ENOKEY; - goto out; - } - } else if (res) { + res = find_and_derive_key(inode, &ctx, raw_key, mode->keysize); + if (res) goto out; - } - ctfm = crypto_alloc_skcipher(cipher_str, 0, 0); - if (!ctfm || IS_ERR(ctfm)) { - res = ctfm ? PTR_ERR(ctfm) : -ENOMEM; - pr_debug("%s: error %d (inode %lu) allocating crypto tfm\n", - __func__, res, inode->i_ino); + + ctfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0); + if (IS_ERR(ctfm)) { + res = PTR_ERR(ctfm); + fscrypt_warn(inode->i_sb, + "error allocating '%s' transform for inode %lu: %d", + mode->cipher_str, inode->i_ino, res); goto out; } + if (unlikely(!mode->logged_impl_name)) { + /* + * fscrypt performance can vary greatly depending on which + * crypto algorithm implementation is used. Help people debug + * performance problems by logging the ->cra_driver_name the + * first time a mode is used. Note that multiple threads can + * race here, but it doesn't really matter. + */ + mode->logged_impl_name = true; + pr_info("fscrypt: %s using implementation \"%s\"\n", + mode->friendly_name, + crypto_skcipher_alg(ctfm)->base.cra_driver_name); + } crypt_info->ci_ctfm = ctfm; - crypto_skcipher_clear_flags(ctfm, ~0); crypto_skcipher_set_flags(ctfm, CRYPTO_TFM_REQ_WEAK_KEY); - /* - * if the provided key is longer than keysize, we use the first - * keysize bytes of the derived key only - */ - res = crypto_skcipher_setkey(ctfm, raw_key, keysize); + res = crypto_skcipher_setkey(ctfm, raw_key, mode->keysize); if (res) goto out; if (S_ISREG(inode->i_mode) && crypt_info->ci_data_mode == FS_ENCRYPTION_MODE_AES_128_CBC) { - res = init_essiv_generator(crypt_info, raw_key, keysize); + res = init_essiv_generator(crypt_info, raw_key, mode->keysize); if (res) { - pr_debug("%s: error %d (inode %lu) allocating essiv tfm\n", - __func__, res, inode->i_ino); + fscrypt_warn(inode->i_sb, + "error initializing ESSIV generator for inode %lu: %d", + inode->i_ino, res); goto out; } } diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index 092da164bdc0..e0fb7cdcee89 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -378,6 +378,8 @@ static int ext4_validate_block_bitmap(struct super_block *sb, return -EFSCORRUPTED; ext4_lock_group(sb, block_group); + if (buffer_verified(bh)) + goto verified; if (unlikely(!ext4_block_bitmap_csum_verify(sb, block_group, desc, bh))) { ext4_unlock_group(sb, block_group); @@ -400,6 +402,7 @@ static int ext4_validate_block_bitmap(struct super_block *sb, return -EFSCORRUPTED; } set_buffer_verified(bh); +verified: ext4_unlock_group(sb, block_group); return 0; } diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index 48d818eba9c3..0963213e9cd3 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -88,6 +88,8 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, return -EFSCORRUPTED; ext4_lock_group(sb, block_group); + if (buffer_verified(bh)) + goto verified; blk = ext4_inode_bitmap(sb, desc); if (!ext4_inode_bitmap_csum_verify(sb, block_group, desc, bh, EXT4_INODES_PER_GROUP(sb) / 8)) { @@ -105,6 +107,7 @@ static int ext4_validate_inode_bitmap(struct super_block *sb, return -EFSBADCRC; } set_buffer_verified(bh); +verified: ext4_unlock_group(sb, block_group); return 0; } @@ -1305,7 +1308,10 @@ int ext4_init_inode_table(struct super_block *sb, ext4_group_t group, ext4_itable_unused_count(sb, gdp)), sbi->s_inodes_per_block); - if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group)) { + if ((used_blks < 0) || (used_blks > sbi->s_itb_per_group) || + ((group == 0) && ((EXT4_INODES_PER_GROUP(sb) - + ext4_itable_unused_count(sb, gdp)) < + EXT4_FIRST_INO(sb)))) { ext4_error(sb, "Something is wrong with group %u: " "used itable blocks: %d; " "itable unused count: %u", diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c index dc209c9a11f0..4aca27940b44 100644 --- a/fs/ext4/inline.c +++ b/fs/ext4/inline.c @@ -692,6 +692,10 @@ int ext4_try_to_write_inline_data(struct address_space *mapping, goto convert; } + ret = ext4_journal_get_write_access(handle, iloc.bh); + if (ret) + goto out; + flags |= AOP_FLAG_NOFS; page = grab_cache_page_write_begin(mapping, 0, flags); @@ -720,7 +724,7 @@ int ext4_try_to_write_inline_data(struct address_space *mapping, out_up_read: up_read(&EXT4_I(inode)->xattr_sem); out: - if (handle) + if (handle && (ret != 1)) ext4_journal_stop(handle); brelse(iloc.bh); return ret; @@ -762,6 +766,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, ext4_write_unlock_xattr(inode, &no_expand); brelse(iloc.bh); + mark_inode_dirty(inode); out: return copied; } @@ -908,7 +913,6 @@ retry_journal: goto out; } - page = grab_cache_page_write_begin(mapping, 0, flags); if (!page) { ret = -ENOMEM; @@ -926,6 +930,9 @@ retry_journal: if (ret < 0) goto out_release_page; } + ret = ext4_journal_get_write_access(handle, iloc.bh); + if (ret) + goto out_release_page; up_read(&EXT4_I(inode)->xattr_sem); *pagep = page; @@ -946,7 +953,6 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied, struct page *page) { - int i_size_changed = 0; int ret; ret = ext4_write_inline_data_end(inode, pos, len, copied, page); @@ -964,10 +970,8 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * But it's important to update i_size while still holding page lock: * page writeout could otherwise come in and zero beyond i_size. */ - if (pos+copied > inode->i_size) { + if (pos+copied > inode->i_size) i_size_write(inode, pos+copied); - i_size_changed = 1; - } unlock_page(page); page_cache_release(page); @@ -977,8 +981,7 @@ int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos, * ordering of page lock and transaction start for journaling * filesystems. */ - if (i_size_changed) - mark_inode_dirty(inode); + mark_inode_dirty(inode); return copied; } diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 19a0fa07c0fd..12ca9da02fdc 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1192,10 +1192,11 @@ static int ext4_write_end(struct file *file, loff_t old_size = inode->i_size; int ret = 0, ret2; int i_size_changed = 0; + int inline_data = ext4_has_inline_data(inode); trace_android_fs_datawrite_end(inode, pos, len); trace_ext4_write_end(inode, pos, len, copied); - if (ext4_has_inline_data(inode)) { + if (inline_data) { ret = ext4_write_inline_data_end(inode, pos, len, copied, page); if (ret < 0) { @@ -1223,7 +1224,7 @@ static int ext4_write_end(struct file *file, * ordering of page lock and transaction start for journaling * filesystems. */ - if (i_size_changed) + if (i_size_changed || inline_data) ext4_mark_inode_dirty(handle, inode); if (pos + len > inode->i_size && ext4_can_truncate(inode)) @@ -1297,6 +1298,7 @@ static int ext4_journalled_write_end(struct file *file, int partial = 0; unsigned from, to; int size_changed = 0; + int inline_data = ext4_has_inline_data(inode); trace_android_fs_datawrite_end(inode, pos, len); trace_ext4_journalled_write_end(inode, pos, len, copied); @@ -1305,7 +1307,7 @@ static int ext4_journalled_write_end(struct file *file, BUG_ON(!ext4_handle_valid(handle)); - if (ext4_has_inline_data(inode)) { + if (inline_data) { ret = ext4_write_inline_data_end(inode, pos, len, copied, page); if (ret < 0) { @@ -1336,7 +1338,7 @@ static int ext4_journalled_write_end(struct file *file, if (old_size < pos) pagecache_isize_extended(inode, old_size, pos); - if (size_changed) { + if (size_changed || inline_data) { ret2 = ext4_mark_inode_dirty(handle, inode); if (!ret) ret = ret2; @@ -1834,11 +1836,7 @@ static int __ext4_journalled_writepage(struct page *page, } if (inline_data) { - BUFFER_TRACE(inode_bh, "get write access"); - ret = ext4_journal_get_write_access(handle, inode_bh); - - err = ext4_handle_dirty_metadata(handle, inode, inode_bh); - + ret = ext4_mark_inode_dirty(handle, inode); } else { ret = ext4_walk_page_buffers(handle, page_bufs, 0, len, NULL, do_journal_get_write_access); @@ -2430,8 +2428,8 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) mpd->map.m_len = 0; mpd->next_page = index; while (index <= end) { - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag); if (nr_pages == 0) goto out; @@ -2439,16 +2437,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) struct page *page = pvec.pages[i]; /* - * At this point, the page may be truncated or - * invalidated (changing page->mapping to NULL), or - * even swizzled back from swapper_space to tmpfs file - * mapping. However, page->index will not change - * because we have a reference on the page. - */ - if (page->index > end) - goto out; - - /* * Accumulated enough dirty pages? This doesn't apply * to WB_SYNC_ALL mode. For integrity sync we have to * keep going because someone may be concurrently diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c5ba22bdeaa9..53d28a0ca6eb 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2874,14 +2874,8 @@ static ext4_group_t ext4_has_uninit_itable(struct super_block *sb) if (!gdp) continue; - if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)) - continue; - if (group != 0) + if (!(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED))) break; - ext4_error(sb, "Inode table for bg 0 marked as " - "needing zeroing"); - if (sb->s_flags & MS_RDONLY) - return ngroups; } return group; diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 760d1ad22722..178623c15765 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -24,7 +24,7 @@ #include <trace/events/f2fs.h> static struct kmem_cache *ino_entry_slab; -struct kmem_cache *inode_entry_slab; +struct kmem_cache *f2fs_inode_entry_slab; void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io) { @@ -36,7 +36,7 @@ void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io) /* * We guarantee no failure on the returned page. */ -struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) +struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) { struct address_space *mapping = META_MAPPING(sbi); struct page *page = NULL; @@ -100,24 +100,27 @@ repeat: * readonly and make sure do not write checkpoint with non-uptodate * meta page. */ - if (unlikely(!PageUptodate(page))) + if (unlikely(!PageUptodate(page))) { + memset(page_address(page), 0, PAGE_SIZE); f2fs_stop_checkpoint(sbi, false); + } out: return page; } -struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) +struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index) { return __get_meta_page(sbi, index, true); } /* for POR only */ -struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index) +struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index) { return __get_meta_page(sbi, index, false); } -bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) +bool f2fs_is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type) { switch (type) { case META_NAT: @@ -151,7 +154,7 @@ bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) /* * Readahead CP/NAT/SIT/SSA pages */ -int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, +int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type, bool sync) { struct page *page; @@ -174,7 +177,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, blk_start_plug(&plug); for (; nrpages-- > 0; blkno++) { - if (!is_valid_blkaddr(sbi, blkno, type)) + if (!f2fs_is_valid_meta_blkaddr(sbi, blkno, type)) goto out; switch (type) { @@ -218,7 +221,7 @@ out: return blkno - start; } -void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index) +void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index) { struct page *page; bool readahead = false; @@ -229,7 +232,7 @@ void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index) f2fs_put_page(page, 0); if (readahead) - ra_meta_pages(sbi, index, BIO_MAX_PAGES, META_POR, true); + f2fs_ra_meta_pages(sbi, index, BIO_MAX_PAGES, META_POR, true); } static int __f2fs_write_meta_page(struct page *page, @@ -250,7 +253,7 @@ static int __f2fs_write_meta_page(struct page *page, if (wbc->for_reclaim && page->index < GET_SUM_BLOCK(sbi, 0)) goto redirty_out; - write_meta_page(sbi, page, io_type); + f2fs_do_write_meta_page(sbi, page, io_type); dec_page_count(sbi, F2FS_DIRTY_META); if (wbc->for_reclaim) @@ -295,7 +298,7 @@ static int f2fs_write_meta_pages(struct address_space *mapping, trace_f2fs_writepages(mapping->host, wbc, META); diff = nr_pages_to_write(sbi, META, wbc); - written = sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO); + written = f2fs_sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO); mutex_unlock(&sbi->cp_mutex); wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff); return 0; @@ -306,13 +309,14 @@ skip_write: return 0; } -long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, +long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, long nr_to_write, enum iostat_type io_type) { struct address_space *mapping = META_MAPPING(sbi); - pgoff_t index = 0, end = ULONG_MAX, prev = ULONG_MAX; + pgoff_t index = 0, prev = ULONG_MAX; struct pagevec pvec; long nwritten = 0; + int nr_pages; struct writeback_control wbc = { .for_reclaim = 0, }; @@ -322,13 +326,9 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, blk_start_plug(&plug); - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (unlikely(nr_pages == 0)) - break; + while ((nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, + PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; @@ -459,20 +459,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) spin_unlock(&im->ino_lock); } -void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) +void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) { /* add new dirty ino entry into list */ __add_ino_entry(sbi, ino, 0, type); } -void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) +void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type) { /* remove dirty ino entry from list */ __remove_ino_entry(sbi, ino, type); } /* mode should be APPEND_INO or UPDATE_INO */ -bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) +bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) { struct inode_management *im = &sbi->im[mode]; struct ino_entry *e; @@ -483,7 +483,7 @@ bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode) return e ? true : false; } -void release_ino_entry(struct f2fs_sb_info *sbi, bool all) +void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all) { struct ino_entry *e, *tmp; int i; @@ -502,13 +502,13 @@ void release_ino_entry(struct f2fs_sb_info *sbi, bool all) } } -void set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, +void f2fs_set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, unsigned int devidx, int type) { __add_ino_entry(sbi, ino, devidx, type); } -bool is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, +bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, unsigned int devidx, int type) { struct inode_management *im = &sbi->im[type]; @@ -523,7 +523,7 @@ bool is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, return is_dirty; } -int acquire_orphan_inode(struct f2fs_sb_info *sbi) +int f2fs_acquire_orphan_inode(struct f2fs_sb_info *sbi) { struct inode_management *im = &sbi->im[ORPHAN_INO]; int err = 0; @@ -546,7 +546,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *sbi) return err; } -void release_orphan_inode(struct f2fs_sb_info *sbi) +void f2fs_release_orphan_inode(struct f2fs_sb_info *sbi) { struct inode_management *im = &sbi->im[ORPHAN_INO]; @@ -556,14 +556,14 @@ void release_orphan_inode(struct f2fs_sb_info *sbi) spin_unlock(&im->ino_lock); } -void add_orphan_inode(struct inode *inode) +void f2fs_add_orphan_inode(struct inode *inode) { /* add new orphan ino entry into list */ __add_ino_entry(F2FS_I_SB(inode), inode->i_ino, 0, ORPHAN_INO); - update_inode_page(inode); + f2fs_update_inode_page(inode); } -void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) +void f2fs_remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) { /* remove orphan entry from orphan list */ __remove_ino_entry(sbi, ino, ORPHAN_INO); @@ -573,7 +573,7 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) { struct inode *inode; struct node_info ni; - int err = acquire_orphan_inode(sbi); + int err = f2fs_acquire_orphan_inode(sbi); if (err) goto err_out; @@ -591,16 +591,17 @@ static int recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino) } err = dquot_initialize(inode); - if (err) + if (err) { + iput(inode); goto err_out; + } - dquot_initialize(inode); clear_nlink(inode); /* truncate all the data during iput */ iput(inode); - get_node_info(sbi, ino, &ni); + f2fs_get_node_info(sbi, ino, &ni); /* ENOMEM was fully retried in f2fs_evict_inode. */ if (ni.blk_addr != NULL_ADDR) { @@ -618,7 +619,7 @@ err_out: return err; } -int recover_orphan_inodes(struct f2fs_sb_info *sbi) +int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi) { block_t start_blk, orphan_blocks, i, j; unsigned int s_flags = sbi->sb->s_flags; @@ -646,10 +647,10 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi) start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi); orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi); - ra_meta_pages(sbi, start_blk, orphan_blocks, META_CP, true); + f2fs_ra_meta_pages(sbi, start_blk, orphan_blocks, META_CP, true); for (i = 0; i < orphan_blocks; i++) { - struct page *page = get_meta_page(sbi, start_blk + i); + struct page *page = f2fs_get_meta_page(sbi, start_blk + i); struct f2fs_orphan_block *orphan_blk; orphan_blk = (struct f2fs_orphan_block *)page_address(page); @@ -699,7 +700,7 @@ static void write_orphan_inodes(struct f2fs_sb_info *sbi, block_t start_blk) /* loop for each orphan inode entry and write them in Jornal block */ list_for_each_entry(orphan, head, list) { if (!page) { - page = grab_meta_page(sbi, start_blk++); + page = f2fs_grab_meta_page(sbi, start_blk++); orphan_blk = (struct f2fs_orphan_block *)page_address(page); memset(orphan_blk, 0, sizeof(*orphan_blk)); @@ -741,7 +742,7 @@ static int get_checkpoint_version(struct f2fs_sb_info *sbi, block_t cp_addr, size_t crc_offset = 0; __u32 crc = 0; - *cp_page = get_meta_page(sbi, cp_addr); + *cp_page = f2fs_get_meta_page(sbi, cp_addr); *cp_block = (struct f2fs_checkpoint *)page_address(*cp_page); crc_offset = le32_to_cpu((*cp_block)->checksum_offset); @@ -794,7 +795,7 @@ invalid_cp1: return NULL; } -int get_valid_checkpoint(struct f2fs_sb_info *sbi) +int f2fs_get_valid_checkpoint(struct f2fs_sb_info *sbi) { struct f2fs_checkpoint *cp_block; struct f2fs_super_block *fsb = sbi->raw_super; @@ -806,7 +807,8 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) block_t cp_blk_no; int i; - sbi->ckpt = f2fs_kzalloc(sbi, cp_blks * blk_size, GFP_KERNEL); + sbi->ckpt = f2fs_kzalloc(sbi, array_size(blk_size, cp_blks), + GFP_KERNEL); if (!sbi->ckpt) return -ENOMEM; /* @@ -838,7 +840,7 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) memcpy(sbi->ckpt, cp_block, blk_size); /* Sanity checking of checkpoint */ - if (sanity_check_ckpt(sbi)) + if (f2fs_sanity_check_ckpt(sbi)) goto free_fail_no_cp; if (cur_page == cp1) @@ -857,7 +859,7 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) void *sit_bitmap_ptr; unsigned char *ckpt = (unsigned char *)sbi->ckpt; - cur_page = get_meta_page(sbi, cp_blk_no + i); + cur_page = f2fs_get_meta_page(sbi, cp_blk_no + i); sit_bitmap_ptr = page_address(cur_page); memcpy(ckpt + i * blk_size, sit_bitmap_ptr, blk_size); f2fs_put_page(cur_page, 1); @@ -902,7 +904,7 @@ static void __remove_dirty_inode(struct inode *inode, enum inode_type type) stat_dec_dirty_inode(F2FS_I_SB(inode), type); } -void update_dirty_page(struct inode *inode, struct page *page) +void f2fs_update_dirty_page(struct inode *inode, struct page *page) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE; @@ -921,7 +923,7 @@ void update_dirty_page(struct inode *inode, struct page *page) f2fs_trace_pid(page); } -void remove_dirty_inode(struct inode *inode) +void f2fs_remove_dirty_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE; @@ -938,7 +940,7 @@ void remove_dirty_inode(struct inode *inode) spin_unlock(&sbi->inode_lock[type]); } -int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type) +int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type) { struct list_head *head; struct inode *inode; @@ -1021,7 +1023,7 @@ int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi) /* it's on eviction */ if (is_inode_flag_set(inode, FI_DIRTY_INODE)) - update_inode_page(inode); + f2fs_update_inode_page(inode); iput(inode); } } @@ -1061,7 +1063,7 @@ retry_flush_dents: /* write all the dirty dentry pages */ if (get_pages(sbi, F2FS_DIRTY_DENTS)) { f2fs_unlock_all(sbi); - err = sync_dirty_inodes(sbi, DIR_INODE); + err = f2fs_sync_dirty_inodes(sbi, DIR_INODE); if (err) goto out; cond_resched(); @@ -1089,7 +1091,9 @@ retry_flush_nodes: if (get_pages(sbi, F2FS_DIRTY_NODES)) { up_write(&sbi->node_write); - err = sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO); + atomic_inc(&sbi->wb_sync_req[NODE]); + err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO); + atomic_dec(&sbi->wb_sync_req[NODE]); if (err) { up_write(&sbi->node_change); f2fs_unlock_all(sbi); @@ -1183,10 +1187,10 @@ static void commit_checkpoint(struct f2fs_sb_info *sbi, /* * pagevec_lookup_tag and lock_page again will take - * some extra time. Therefore, update_meta_pages and - * sync_meta_pages are combined in this function. + * some extra time. Therefore, f2fs_update_meta_pages and + * f2fs_sync_meta_pages are combined in this function. */ - struct page *page = grab_meta_page(sbi, blk_addr); + struct page *page = f2fs_grab_meta_page(sbi, blk_addr); int err; memcpy(page_address(page), src, PAGE_SIZE); @@ -1224,7 +1228,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) /* Flush all the NAT/SIT pages */ while (get_pages(sbi, F2FS_DIRTY_META)) { - sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); + f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); if (unlikely(f2fs_cp_error(sbi))) return -EIO; } @@ -1233,7 +1237,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) * modify checkpoint * version number is already updated */ - ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi)); + ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi, true)); ckpt->free_segment_count = cpu_to_le32(free_segments(sbi)); for (i = 0; i < NR_CURSEG_NODE_TYPE; i++) { ckpt->cur_node_segno[i] = @@ -1253,7 +1257,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) } /* 2 cp + n data seg summary + orphan inode blocks */ - data_sum_blocks = npages_for_summary_flush(sbi, false); + data_sum_blocks = f2fs_npages_for_summary_flush(sbi, false); spin_lock_irqsave(&sbi->cp_lock, flags); if (data_sum_blocks < NR_CURSEG_DATA_TYPE) __set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG); @@ -1298,22 +1302,23 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) blk = start_blk + sbi->blocks_per_seg - nm_i->nat_bits_blocks; for (i = 0; i < nm_i->nat_bits_blocks; i++) - update_meta_page(sbi, nm_i->nat_bits + + f2fs_update_meta_page(sbi, nm_i->nat_bits + (i << F2FS_BLKSIZE_BITS), blk + i); /* Flush all the NAT BITS pages */ while (get_pages(sbi, F2FS_DIRTY_META)) { - sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); + f2fs_sync_meta_pages(sbi, META, LONG_MAX, + FS_CP_META_IO); if (unlikely(f2fs_cp_error(sbi))) return -EIO; } } /* write out checkpoint buffer at block 0 */ - update_meta_page(sbi, ckpt, start_blk++); + f2fs_update_meta_page(sbi, ckpt, start_blk++); for (i = 1; i < 1 + cp_payload_blks; i++) - update_meta_page(sbi, (char *)ckpt + i * F2FS_BLKSIZE, + f2fs_update_meta_page(sbi, (char *)ckpt + i * F2FS_BLKSIZE, start_blk++); if (orphan_num) { @@ -1321,7 +1326,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) start_blk += orphan_blocks; } - write_data_summaries(sbi, start_blk); + f2fs_write_data_summaries(sbi, start_blk); start_blk += data_sum_blocks; /* Record write statistics in the hot node summary */ @@ -1332,7 +1337,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) seg_i->journal->info.kbytes_written = cpu_to_le64(kbytes_written); if (__remain_node_summaries(cpc->reason)) { - write_node_summaries(sbi, start_blk); + f2fs_write_node_summaries(sbi, start_blk); start_blk += NR_CURSEG_NODE_TYPE; } @@ -1341,7 +1346,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) percpu_counter_set(&sbi->alloc_valid_block_count, 0); /* Here, we have one bio having CP pack except cp pack 2 page */ - sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); + f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO); /* wait for previous submitted meta pages writeback */ wait_on_all_pages_writeback(sbi); @@ -1358,7 +1363,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) commit_checkpoint(sbi, ckpt, start_blk); wait_on_all_pages_writeback(sbi); - release_ino_entry(sbi, false); + f2fs_release_ino_entry(sbi, false); if (unlikely(f2fs_cp_error(sbi))) return -EIO; @@ -1383,7 +1388,7 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) /* * We guarantee that this checkpoint procedure will not fail. */ -int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) +int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) { struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); unsigned long long ckpt_ver; @@ -1416,7 +1421,7 @@ int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) /* this is the case of multiple fstrims without any changes */ if (cpc->reason & CP_DISCARD) { - if (!exist_trim_candidates(sbi, cpc)) { + if (!f2fs_exist_trim_candidates(sbi, cpc)) { unblock_operations(sbi); goto out; } @@ -1424,8 +1429,8 @@ int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) if (NM_I(sbi)->dirty_nat_cnt == 0 && SIT_I(sbi)->dirty_sentries == 0 && prefree_segments(sbi) == 0) { - flush_sit_entries(sbi, cpc); - clear_prefree_segments(sbi, cpc); + f2fs_flush_sit_entries(sbi, cpc); + f2fs_clear_prefree_segments(sbi, cpc); unblock_operations(sbi); goto out; } @@ -1440,15 +1445,15 @@ int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc) ckpt->checkpoint_ver = cpu_to_le64(++ckpt_ver); /* write cached NAT/SIT entries to NAT/SIT area */ - flush_nat_entries(sbi, cpc); - flush_sit_entries(sbi, cpc); + f2fs_flush_nat_entries(sbi, cpc); + f2fs_flush_sit_entries(sbi, cpc); /* unlock all the fs_lock[] in do_checkpoint() */ err = do_checkpoint(sbi, cpc); if (err) - release_discard_addrs(sbi); + f2fs_release_discard_addrs(sbi); else - clear_prefree_segments(sbi, cpc); + f2fs_clear_prefree_segments(sbi, cpc); unblock_operations(sbi); stat_inc_cp_count(sbi->stat_info); @@ -1465,7 +1470,7 @@ out: return err; } -void init_ino_entry_info(struct f2fs_sb_info *sbi) +void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi) { int i; @@ -1483,23 +1488,23 @@ void init_ino_entry_info(struct f2fs_sb_info *sbi) F2FS_ORPHANS_PER_BLOCK; } -int __init create_checkpoint_caches(void) +int __init f2fs_create_checkpoint_caches(void) { ino_entry_slab = f2fs_kmem_cache_create("f2fs_ino_entry", sizeof(struct ino_entry)); if (!ino_entry_slab) return -ENOMEM; - inode_entry_slab = f2fs_kmem_cache_create("f2fs_inode_entry", + f2fs_inode_entry_slab = f2fs_kmem_cache_create("f2fs_inode_entry", sizeof(struct inode_entry)); - if (!inode_entry_slab) { + if (!f2fs_inode_entry_slab) { kmem_cache_destroy(ino_entry_slab); return -ENOMEM; } return 0; } -void destroy_checkpoint_caches(void) +void f2fs_destroy_checkpoint_caches(void) { kmem_cache_destroy(ino_entry_slab); - kmem_cache_destroy(inode_entry_slab); + kmem_cache_destroy(f2fs_inode_entry_slab); } diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 3d846b027fa1..3cc679a1d53b 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -48,6 +48,8 @@ static bool __is_cp_guaranteed(struct page *page) if (inode->i_ino == F2FS_META_INO(sbi) || inode->i_ino == F2FS_NODE_INO(sbi) || S_ISDIR(inode->i_mode) || + (S_ISREG(inode->i_mode) && + is_inode_flag_set(inode, FI_ATOMIC_FILE)) || is_cold_data(page)) return true; return false; @@ -244,7 +246,7 @@ static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr, } else { bio->bi_end_io = f2fs_write_end_io; bio->bi_private = sbi; - bio->bi_write_hint = io_type_to_rw_hint(sbi, type, temp); + bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, type, temp); } if (wbc) wbc_init_bio(wbc, bio); @@ -459,13 +461,12 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio) return 0; } -int f2fs_submit_page_write(struct f2fs_io_info *fio) +void f2fs_submit_page_write(struct f2fs_io_info *fio) { struct f2fs_sb_info *sbi = fio->sbi; enum page_type btype = PAGE_TYPE_OF_BIO(fio->type); struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp; struct page *bio_page; - int err = 0; f2fs_bug_on(sbi, is_read_io(fio->op)); @@ -475,7 +476,7 @@ next: spin_lock(&io->io_lock); if (list_empty(&io->io_list)) { spin_unlock(&io->io_lock); - goto out_fail; + goto out; } fio = list_first_entry(&io->io_list, struct f2fs_io_info, list); @@ -483,7 +484,7 @@ next: spin_unlock(&io->io_lock); } - if (fio->old_blkaddr != NEW_ADDR) + if (is_valid_blkaddr(fio->old_blkaddr)) verify_block_addr(fio, fio->old_blkaddr); verify_block_addr(fio, fio->new_blkaddr); @@ -502,9 +503,9 @@ alloc_new: if (io->bio == NULL) { if ((fio->type == DATA || fio->type == NODE) && fio->new_blkaddr & F2FS_IO_SIZE_MASK(sbi)) { - err = -EAGAIN; dec_page_count(sbi, WB_DATA_TYPE(bio_page)); - goto out_fail; + fio->retry = true; + goto skip; } io->bio = __bio_alloc(sbi, fio->new_blkaddr, fio->io_wbc, BIO_MAX_PAGES, false, @@ -524,12 +525,11 @@ alloc_new: f2fs_trace_ios(fio, 0); trace_f2fs_submit_page_write(fio->page, fio); - +skip: if (fio->in_list) goto next; -out_fail: +out: up_write(&io->io_rwsem); - return err; } static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr, @@ -603,7 +603,7 @@ static void __set_data_blkaddr(struct dnode_of_data *dn) * ->node_page * update block addresses in the node page */ -void set_data_blkaddr(struct dnode_of_data *dn) +void f2fs_set_data_blkaddr(struct dnode_of_data *dn) { f2fs_wait_on_page_writeback(dn->node_page, NODE, true); __set_data_blkaddr(dn); @@ -614,12 +614,12 @@ void set_data_blkaddr(struct dnode_of_data *dn) void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr) { dn->data_blkaddr = blkaddr; - set_data_blkaddr(dn); + f2fs_set_data_blkaddr(dn); f2fs_update_extent_cache(dn); } /* dn->ofs_in_node will be returned with up-to-date last block pointer */ -int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count) +int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); int err; @@ -653,12 +653,12 @@ int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count) } /* Should keep dn->ofs_in_node unchanged */ -int reserve_new_block(struct dnode_of_data *dn) +int f2fs_reserve_new_block(struct dnode_of_data *dn) { unsigned int ofs_in_node = dn->ofs_in_node; int ret; - ret = reserve_new_blocks(dn, 1); + ret = f2fs_reserve_new_blocks(dn, 1); dn->ofs_in_node = ofs_in_node; return ret; } @@ -668,12 +668,12 @@ int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index) bool need_put = dn->inode_page ? false : true; int err; - err = get_dnode_of_data(dn, index, ALLOC_NODE); + err = f2fs_get_dnode_of_data(dn, index, ALLOC_NODE); if (err) return err; if (dn->data_blkaddr == NULL_ADDR) - err = reserve_new_block(dn); + err = f2fs_reserve_new_block(dn); if (err || need_put) f2fs_put_dnode(dn); return err; @@ -692,7 +692,7 @@ int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index) return f2fs_reserve_block(dn, index); } -struct page *get_read_data_page(struct inode *inode, pgoff_t index, +struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index, int op_flags, bool for_write) { struct address_space *mapping = inode->i_mapping; @@ -711,7 +711,7 @@ struct page *get_read_data_page(struct inode *inode, pgoff_t index, } set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, index, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE); if (err) goto put_err; f2fs_put_dnode(&dn); @@ -730,7 +730,8 @@ got_it: * A new dentry page is allocated but not able to be written, since its * new inode page couldn't be allocated due to -ENOSPC. * In such the case, its blkaddr can be remained as NEW_ADDR. - * see, f2fs_add_link -> get_new_data_page -> init_inode_metadata. + * see, f2fs_add_link -> f2fs_get_new_data_page -> + * f2fs_init_inode_metadata. */ if (dn.data_blkaddr == NEW_ADDR) { zero_user_segment(page, 0, PAGE_SIZE); @@ -750,7 +751,7 @@ put_err: return ERR_PTR(err); } -struct page *find_data_page(struct inode *inode, pgoff_t index) +struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index) { struct address_space *mapping = inode->i_mapping; struct page *page; @@ -760,7 +761,7 @@ struct page *find_data_page(struct inode *inode, pgoff_t index) return page; f2fs_put_page(page, 0); - page = get_read_data_page(inode, index, REQ_SYNC, false); + page = f2fs_get_read_data_page(inode, index, REQ_SYNC, false); if (IS_ERR(page)) return page; @@ -780,13 +781,13 @@ struct page *find_data_page(struct inode *inode, pgoff_t index) * Because, the callers, functions in dir.c and GC, should be able to know * whether this page exists or not. */ -struct page *get_lock_data_page(struct inode *inode, pgoff_t index, +struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index, bool for_write) { struct address_space *mapping = inode->i_mapping; struct page *page; repeat: - page = get_read_data_page(inode, index, REQ_SYNC, for_write); + page = f2fs_get_read_data_page(inode, index, REQ_SYNC, for_write); if (IS_ERR(page)) return page; @@ -812,7 +813,7 @@ repeat: * Note that, ipage is set only by make_empty_dir, and if any error occur, * ipage should be released by this function. */ -struct page *get_new_data_page(struct inode *inode, +struct page *f2fs_get_new_data_page(struct inode *inode, struct page *ipage, pgoff_t index, bool new_i_size) { struct address_space *mapping = inode->i_mapping; @@ -851,7 +852,7 @@ struct page *get_new_data_page(struct inode *inode, /* if ipage exists, blkaddr should be NEW_ADDR */ f2fs_bug_on(F2FS_I_SB(inode), ipage); - page = get_lock_data_page(inode, index, true); + page = f2fs_get_lock_data_page(inode, index, true); if (IS_ERR(page)) return page; } @@ -883,15 +884,15 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type) return err; alloc: - get_node_info(sbi, dn->nid, &ni); + f2fs_get_node_info(sbi, dn->nid, &ni); set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version); - allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr, + f2fs_allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr, &sum, seg_type, NULL, false); - set_data_blkaddr(dn); + f2fs_set_data_blkaddr(dn); /* update i_size */ - fofs = start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + + fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + dn->ofs_in_node; if (i_size_read(dn->inode) < ((loff_t)(fofs + 1) << PAGE_SHIFT)) f2fs_i_size_write(dn->inode, @@ -929,7 +930,7 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) map.m_seg_type = NO_CHECK_TYPE; if (direct_io) { - map.m_seg_type = rw_hint_to_seg_type(iocb->ki_hint); + map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint); flag = f2fs_force_buffered_io(inode, WRITE) ? F2FS_GET_BLOCK_PRE_AIO : F2FS_GET_BLOCK_PRE_DIO; @@ -1019,7 +1020,7 @@ next_dnode: /* When reading holes, we need its node page */ set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, pgofs, mode); + err = f2fs_get_dnode_of_data(&dn, pgofs, mode); if (err) { if (flag == F2FS_GET_BLOCK_BMAP) map->m_pblk = 0; @@ -1027,10 +1028,10 @@ next_dnode: err = 0; if (map->m_next_pgofs) *map->m_next_pgofs = - get_next_page_offset(&dn, pgofs); + f2fs_get_next_page_offset(&dn, pgofs); if (map->m_next_extent) *map->m_next_extent = - get_next_page_offset(&dn, pgofs); + f2fs_get_next_page_offset(&dn, pgofs); } goto unlock_out; } @@ -1043,7 +1044,7 @@ next_dnode: next_block: blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); - if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) { + if (!is_valid_blkaddr(blkaddr)) { if (create) { if (unlikely(f2fs_cp_error(sbi))) { err = -EIO; @@ -1116,7 +1117,7 @@ skip: (pgofs == end || dn.ofs_in_node == end_offset)) { dn.ofs_in_node = ofs_in_node; - err = reserve_new_blocks(&dn, prealloc); + err = f2fs_reserve_new_blocks(&dn, prealloc); if (err) goto sync_out; @@ -1235,7 +1236,7 @@ static int get_data_block_dio(struct inode *inode, sector_t iblock, { return __get_data_block(inode, iblock, bh_result, create, F2FS_GET_BLOCK_DEFAULT, NULL, - rw_hint_to_seg_type( + f2fs_rw_hint_to_seg_type( inode->i_write_hint)); } @@ -1280,7 +1281,7 @@ static int f2fs_xattr_fiemap(struct inode *inode, if (!page) return -ENOMEM; - get_node_info(sbi, inode->i_ino, &ni); + f2fs_get_node_info(sbi, inode->i_ino, &ni); phys = (__u64)blk_to_logical(inode, ni.blk_addr); offset = offsetof(struct f2fs_inode, i_addr) + @@ -1307,7 +1308,7 @@ static int f2fs_xattr_fiemap(struct inode *inode, if (!page) return -ENOMEM; - get_node_info(sbi, xnid, &ni); + f2fs_get_node_info(sbi, xnid, &ni); phys = (__u64)blk_to_logical(inode, ni.blk_addr); len = inode->i_sb->s_blocksize; @@ -1610,12 +1611,12 @@ static inline bool check_inplace_update_policy(struct inode *inode, if (policy & (0x1 << F2FS_IPU_FORCE)) return true; - if (policy & (0x1 << F2FS_IPU_SSR) && need_SSR(sbi)) + if (policy & (0x1 << F2FS_IPU_SSR) && f2fs_need_SSR(sbi)) return true; if (policy & (0x1 << F2FS_IPU_UTIL) && utilization(sbi) > SM_I(sbi)->min_ipu_util) return true; - if (policy & (0x1 << F2FS_IPU_SSR_UTIL) && need_SSR(sbi) && + if (policy & (0x1 << F2FS_IPU_SSR_UTIL) && f2fs_need_SSR(sbi) && utilization(sbi) > SM_I(sbi)->min_ipu_util) return true; @@ -1636,7 +1637,7 @@ static inline bool check_inplace_update_policy(struct inode *inode, return false; } -bool should_update_inplace(struct inode *inode, struct f2fs_io_info *fio) +bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio) { if (f2fs_is_pinned_file(inode)) return true; @@ -1648,7 +1649,7 @@ bool should_update_inplace(struct inode *inode, struct f2fs_io_info *fio) return check_inplace_update_policy(inode, fio); } -bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) +bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -1671,22 +1672,13 @@ static inline bool need_inplace_update(struct f2fs_io_info *fio) { struct inode *inode = fio->page->mapping->host; - if (should_update_outplace(inode, fio)) + if (f2fs_should_update_outplace(inode, fio)) return false; - return should_update_inplace(inode, fio); -} - -static inline bool valid_ipu_blkaddr(struct f2fs_io_info *fio) -{ - if (fio->old_blkaddr == NEW_ADDR) - return false; - if (fio->old_blkaddr == NULL_ADDR) - return false; - return true; + return f2fs_should_update_inplace(inode, fio); } -int do_write_data_page(struct f2fs_io_info *fio) +int f2fs_do_write_data_page(struct f2fs_io_info *fio) { struct page *page = fio->page; struct inode *inode = page->mapping->host; @@ -1700,7 +1692,7 @@ int do_write_data_page(struct f2fs_io_info *fio) f2fs_lookup_extent_cache(inode, page->index, &ei)) { fio->old_blkaddr = ei.blk + page->index - ei.fofs; - if (valid_ipu_blkaddr(fio)) { + if (is_valid_blkaddr(fio->old_blkaddr)) { ipu_force = true; fio->need_lock = LOCK_DONE; goto got_it; @@ -1711,7 +1703,7 @@ int do_write_data_page(struct f2fs_io_info *fio) if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi)) return -EAGAIN; - err = get_dnode_of_data(&dn, page->index, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, page->index, LOOKUP_NODE); if (err) goto out; @@ -1727,7 +1719,8 @@ got_it: * If current allocation needs SSR, * it had better in-place writes for updated data. */ - if (ipu_force || (valid_ipu_blkaddr(fio) && need_inplace_update(fio))) { + if (ipu_force || (is_valid_blkaddr(fio->old_blkaddr) && + need_inplace_update(fio))) { err = encrypt_one_page(fio); if (err) goto out_writepage; @@ -1737,7 +1730,7 @@ got_it: f2fs_put_dnode(&dn); if (fio->need_lock == LOCK_REQ) f2fs_unlock_op(fio->sbi); - err = rewrite_data_page(fio); + err = f2fs_inplace_write_data(fio); trace_f2fs_do_write_data_page(fio->page, IPU); set_inode_flag(inode, FI_UPDATE_WRITE); return err; @@ -1759,7 +1752,7 @@ got_it: ClearPageError(page); /* LFS mode write path */ - write_data_page(&dn, fio); + f2fs_outplace_write_data(&dn, fio); trace_f2fs_do_write_data_page(page, OPU); set_inode_flag(inode, FI_APPEND_WRITE); if (page->index == 0) @@ -1805,6 +1798,12 @@ static int __write_data_page(struct page *page, bool *submitted, /* we should bypass data pages to proceed the kworkder jobs */ if (unlikely(f2fs_cp_error(sbi))) { mapping_set_error(page->mapping, -EIO); + /* + * don't drop any dirty dentry pages for keeping lastest + * directory structure. + */ + if (S_ISDIR(inode->i_mode)) + goto redirty_out; goto out; } @@ -1829,13 +1828,13 @@ write: /* we should not write 0'th page having journal header */ if (f2fs_is_volatile_file(inode) && (!page->index || (!wbc->for_reclaim && - available_free_memory(sbi, BASE_CHECK)))) + f2fs_available_free_memory(sbi, BASE_CHECK)))) goto redirty_out; /* Dentry blocks are controlled by checkpoint */ if (S_ISDIR(inode->i_mode)) { fio.need_lock = LOCK_DONE; - err = do_write_data_page(&fio); + err = f2fs_do_write_data_page(&fio); goto done; } @@ -1854,10 +1853,10 @@ write: } if (err == -EAGAIN) { - err = do_write_data_page(&fio); + err = f2fs_do_write_data_page(&fio); if (err == -EAGAIN) { fio.need_lock = LOCK_REQ; - err = do_write_data_page(&fio); + err = f2fs_do_write_data_page(&fio); } } @@ -1882,7 +1881,7 @@ out: if (wbc->for_reclaim) { f2fs_submit_merged_write_cond(sbi, inode, 0, page->index, DATA); clear_inode_flag(inode, FI_HOT_DATA); - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); submitted = NULL; } @@ -1932,6 +1931,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping, int ret = 0; int done = 0; struct pagevec pvec; + struct f2fs_sb_info *sbi = F2FS_M_SB(mapping); int nr_pages; pgoff_t uninitialized_var(writeback_index); pgoff_t index; @@ -1976,8 +1976,8 @@ retry: while (!done && (index <= end)) { int i; - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1); + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag); if (nr_pages == 0) break; @@ -1985,7 +1985,9 @@ retry: struct page *page = pvec.pages[i]; bool submitted = false; - if (page->index > end) { + /* give a priority to WB_SYNC threads */ + if (atomic_read(&sbi->wb_sync_req[DATA]) && + wbc->sync_mode == WB_SYNC_NONE) { done = 1; break; } @@ -2044,9 +2046,7 @@ continue_unlock: last_idx = page->index; } - /* give a priority to WB_SYNC threads */ - if ((atomic_read(&F2FS_M_SB(mapping)->wb_sync_req) || - --wbc->nr_to_write <= 0) && + if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) { done = 1; break; @@ -2072,7 +2072,7 @@ continue_unlock: return ret; } -int __f2fs_write_data_pages(struct address_space *mapping, +static int __f2fs_write_data_pages(struct address_space *mapping, struct writeback_control *wbc, enum iostat_type io_type) { @@ -2095,7 +2095,7 @@ int __f2fs_write_data_pages(struct address_space *mapping, if (S_ISDIR(inode->i_mode) && wbc->sync_mode == WB_SYNC_NONE && get_dirty_pages(inode) < nr_pages_to_skip(sbi, DATA) && - available_free_memory(sbi, DIRTY_DENTS)) + f2fs_available_free_memory(sbi, DIRTY_DENTS)) goto skip_write; /* skip writing during file defragment */ @@ -2106,8 +2106,8 @@ int __f2fs_write_data_pages(struct address_space *mapping, /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */ if (wbc->sync_mode == WB_SYNC_ALL) - atomic_inc(&sbi->wb_sync_req); - else if (atomic_read(&sbi->wb_sync_req)) + atomic_inc(&sbi->wb_sync_req[DATA]); + else if (atomic_read(&sbi->wb_sync_req[DATA])) goto skip_write; blk_start_plug(&plug); @@ -2115,13 +2115,13 @@ int __f2fs_write_data_pages(struct address_space *mapping, blk_finish_plug(&plug); if (wbc->sync_mode == WB_SYNC_ALL) - atomic_dec(&sbi->wb_sync_req); + atomic_dec(&sbi->wb_sync_req[DATA]); /* * if some pages were truncated, we cannot guarantee its mapping->host * to detect pending bios. */ - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); return ret; skip_write: @@ -2148,7 +2148,7 @@ static void f2fs_write_failed(struct address_space *mapping, loff_t to) if (to > i_size) { down_write(&F2FS_I(inode)->i_mmap_sem); truncate_pagecache(inode, i_size); - truncate_blocks(inode, i_size, true); + f2fs_truncate_blocks(inode, i_size, true); up_write(&F2FS_I(inode)->i_mmap_sem); } } @@ -2180,7 +2180,7 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi, } restart: /* check inline_data */ - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(ipage)) { err = PTR_ERR(ipage); goto unlock_out; @@ -2190,7 +2190,7 @@ restart: if (f2fs_has_inline_data(inode)) { if (pos + len <= MAX_INLINE_DATA(inode)) { - read_inline_data(page, ipage); + f2fs_do_read_inline_data(page, ipage); set_inode_flag(inode, FI_DATA_EXIST); if (inode->i_nlink) set_inline_node(ipage); @@ -2208,7 +2208,7 @@ restart: dn.data_blkaddr = ei.blk + index - ei.fofs; } else { /* hole case */ - err = get_dnode_of_data(&dn, index, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE); if (err || dn.data_blkaddr == NULL_ADDR) { f2fs_put_dnode(&dn); __do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO, @@ -2255,7 +2255,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping, trace_f2fs_write_begin(inode, pos, len, flags); if (f2fs_is_atomic_file(inode) && - !available_free_memory(sbi, INMEM_PAGES)) { + !f2fs_available_free_memory(sbi, INMEM_PAGES)) { err = -ENOMEM; drop_atomic = true; goto fail; @@ -2339,7 +2339,7 @@ fail: f2fs_put_page(page, 1); f2fs_write_failed(mapping, pos + len); if (drop_atomic) - drop_inmem_pages_all(sbi); + f2fs_drop_inmem_pages_all(sbi, false); return err; } @@ -2437,17 +2437,17 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter, if (rw == WRITE && whint_mode == WHINT_MODE_OFF) iocb->ki_hint = WRITE_LIFE_NOT_SET; - if (!down_read_trylock(&F2FS_I(inode)->dio_rwsem[rw])) { + if (!down_read_trylock(&F2FS_I(inode)->i_gc_rwsem[rw])) { if (iocb->ki_flags & IOCB_NOWAIT) { iocb->ki_hint = hint; err = -EAGAIN; goto out; } - down_read(&F2FS_I(inode)->dio_rwsem[rw]); + down_read(&F2FS_I(inode)->i_gc_rwsem[rw]); } err = blockdev_direct_IO(iocb, inode, iter, offset, get_data_block_dio); - up_read(&F2FS_I(inode)->dio_rwsem[rw]); + up_read(&F2FS_I(inode)->i_gc_rwsem[rw]); if (rw == WRITE) { if (whint_mode == WHINT_MODE_OFF) @@ -2490,13 +2490,13 @@ void f2fs_invalidate_page(struct page *page, unsigned int offset, dec_page_count(sbi, F2FS_DIRTY_NODES); } else { inode_dec_dirty_pages(inode); - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); } } /* This is atomic written page, keep Private */ if (IS_ATOMIC_WRITTEN_PAGE(page)) - return drop_inmem_page(inode, page); + return f2fs_drop_inmem_page(inode, page); set_page_private(page, 0); ClearPagePrivate(page); @@ -2529,7 +2529,7 @@ static int f2fs_set_data_page_dirty(struct page *page) if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) { if (!IS_ATOMIC_WRITTEN_PAGE(page)) { - register_inmem_page(inode, page); + f2fs_register_inmem_page(inode, page); return 1; } /* @@ -2541,7 +2541,7 @@ static int f2fs_set_data_page_dirty(struct page *page) if (!PageDirty(page)) { __set_page_dirty_nobuffers(page); - update_dirty_page(inode, page); + f2fs_update_dirty_page(inode, page); return 1; } return 0; @@ -2634,6 +2634,17 @@ const struct address_space_operations f2fs_dblock_aops = { #endif }; +void f2fs_clear_radix_tree_dirty_tag(struct page *page) +{ + struct address_space *mapping = page_mapping(page); + unsigned long flags; + + spin_lock_irqsave(&mapping->tree_lock, flags); + radix_tree_tag_clear(&mapping->page_tree, page_index(page), + PAGECACHE_TAG_DIRTY); + spin_unlock_irqrestore(&mapping->tree_lock, flags); +} + int __init f2fs_init_post_read_processing(void) { bio_post_read_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0); diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c index a66107b5cfff..2d65e77ae5cf 100644 --- a/fs/f2fs/debug.c +++ b/fs/f2fs/debug.c @@ -104,6 +104,8 @@ static void update_general_status(struct f2fs_sb_info *sbi) si->avail_nids = NM_I(sbi)->available_nids; si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID]; si->bg_gc = sbi->bg_gc; + si->skipped_atomic_files[BG_GC] = sbi->skipped_atomic_files[BG_GC]; + si->skipped_atomic_files[FG_GC] = sbi->skipped_atomic_files[FG_GC]; si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg) * 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg) / 2; @@ -342,6 +344,10 @@ static int stat_show(struct seq_file *s, void *v) si->bg_data_blks); seq_printf(s, " - node blocks : %d (%d)\n", si->node_blks, si->bg_node_blks); + seq_printf(s, "Skipped : atomic write %llu (%llu)\n", + si->skipped_atomic_files[BG_GC] + + si->skipped_atomic_files[FG_GC], + si->skipped_atomic_files[BG_GC]); seq_puts(s, "\nExtent Cache:\n"); seq_printf(s, " - Hit Count: L1-1:%llu L1-2:%llu L2:%llu\n", si->hit_largest, si->hit_cached, diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index 41d32171bd52..a7feed756592 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -60,12 +60,12 @@ static unsigned char f2fs_type_by_mode[S_IFMT >> S_SHIFT] = { [S_IFLNK >> S_SHIFT] = F2FS_FT_SYMLINK, }; -void set_de_type(struct f2fs_dir_entry *de, umode_t mode) +static void set_de_type(struct f2fs_dir_entry *de, umode_t mode) { de->file_type = f2fs_type_by_mode[(mode & S_IFMT) >> S_SHIFT]; } -unsigned char get_de_type(struct f2fs_dir_entry *de) +unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de) { if (de->file_type < F2FS_FT_MAX) return f2fs_filetype_table[de->file_type]; @@ -97,14 +97,14 @@ static struct f2fs_dir_entry *find_in_block(struct page *dentry_page, dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); - de = find_target_dentry(fname, namehash, max_slots, &d); + de = f2fs_find_target_dentry(fname, namehash, max_slots, &d); if (de) *res_page = dentry_page; return de; } -struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *fname, +struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname, f2fs_hash_t namehash, int *max_slots, struct f2fs_dentry_ptr *d) { @@ -171,7 +171,7 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir, for (; bidx < end_block; bidx++) { /* no need to allocate new dentry pages to all the indices */ - dentry_page = find_data_page(dir, bidx); + dentry_page = f2fs_find_data_page(dir, bidx); if (IS_ERR(dentry_page)) { if (PTR_ERR(dentry_page) == -ENOENT) { room = true; @@ -210,7 +210,7 @@ struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir, if (f2fs_has_inline_dentry(dir)) { *res_page = NULL; - de = find_in_inline_dir(dir, fname, res_page); + de = f2fs_find_in_inline_dir(dir, fname, res_page); goto out; } @@ -319,7 +319,7 @@ static void init_dent_inode(const struct qstr *name, struct page *ipage) set_page_dirty(ipage); } -void do_make_empty_dir(struct inode *inode, struct inode *parent, +void f2fs_do_make_empty_dir(struct inode *inode, struct inode *parent, struct f2fs_dentry_ptr *d) { struct qstr dot = QSTR_INIT(".", 1); @@ -340,23 +340,23 @@ static int make_empty_dir(struct inode *inode, struct f2fs_dentry_ptr d; if (f2fs_has_inline_dentry(inode)) - return make_empty_inline_dir(inode, parent, page); + return f2fs_make_empty_inline_dir(inode, parent, page); - dentry_page = get_new_data_page(inode, page, 0, true); + dentry_page = f2fs_get_new_data_page(inode, page, 0, true); if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); dentry_blk = page_address(dentry_page); make_dentry_ptr_block(NULL, &d, dentry_blk); - do_make_empty_dir(inode, parent, &d); + f2fs_do_make_empty_dir(inode, parent, &d); set_page_dirty(dentry_page); f2fs_put_page(dentry_page, 1); return 0; } -struct page *init_inode_metadata(struct inode *inode, struct inode *dir, +struct page *f2fs_init_inode_metadata(struct inode *inode, struct inode *dir, const struct qstr *new_name, const struct qstr *orig_name, struct page *dpage) { @@ -365,7 +365,7 @@ struct page *init_inode_metadata(struct inode *inode, struct inode *dir, int err; if (is_inode_flag_set(inode, FI_NEW_INODE)) { - page = new_inode_page(inode); + page = f2fs_new_inode_page(inode); if (IS_ERR(page)) return page; @@ -395,7 +395,7 @@ struct page *init_inode_metadata(struct inode *inode, struct inode *dir, goto put_error; } } else { - page = get_node_page(F2FS_I_SB(dir), inode->i_ino); + page = f2fs_get_node_page(F2FS_I_SB(dir), inode->i_ino); if (IS_ERR(page)) return page; } @@ -418,19 +418,19 @@ struct page *init_inode_metadata(struct inode *inode, struct inode *dir, * we should remove this inode from orphan list. */ if (inode->i_nlink == 0) - remove_orphan_inode(F2FS_I_SB(dir), inode->i_ino); + f2fs_remove_orphan_inode(F2FS_I_SB(dir), inode->i_ino); f2fs_i_links_write(inode, true); } return page; put_error: clear_nlink(inode); - update_inode(inode, page); + f2fs_update_inode(inode, page); f2fs_put_page(page, 1); return ERR_PTR(err); } -void update_parent_metadata(struct inode *dir, struct inode *inode, +void f2fs_update_parent_metadata(struct inode *dir, struct inode *inode, unsigned int current_depth) { if (inode && is_inode_flag_set(inode, FI_NEW_INODE)) { @@ -448,7 +448,7 @@ void update_parent_metadata(struct inode *dir, struct inode *inode, clear_inode_flag(inode, FI_INC_LINK); } -int room_for_filename(const void *bitmap, int slots, int max_slots) +int f2fs_room_for_filename(const void *bitmap, int slots, int max_slots) { int bit_start = 0; int zero_start, zero_end; @@ -537,12 +537,12 @@ start: (le32_to_cpu(dentry_hash) % nbucket)); for (block = bidx; block <= (bidx + nblock - 1); block++) { - dentry_page = get_new_data_page(dir, NULL, block, true); + dentry_page = f2fs_get_new_data_page(dir, NULL, block, true); if (IS_ERR(dentry_page)) return PTR_ERR(dentry_page); dentry_blk = page_address(dentry_page); - bit_pos = room_for_filename(&dentry_blk->dentry_bitmap, + bit_pos = f2fs_room_for_filename(&dentry_blk->dentry_bitmap, slots, NR_DENTRY_IN_BLOCK); if (bit_pos < NR_DENTRY_IN_BLOCK) goto add_dentry; @@ -558,7 +558,7 @@ add_dentry: if (inode) { down_write(&F2FS_I(inode)->i_sem); - page = init_inode_metadata(inode, dir, new_name, + page = f2fs_init_inode_metadata(inode, dir, new_name, orig_name, NULL); if (IS_ERR(page)) { err = PTR_ERR(page); @@ -576,7 +576,7 @@ add_dentry: f2fs_put_page(page, 1); } - update_parent_metadata(dir, inode, current_depth); + f2fs_update_parent_metadata(dir, inode, current_depth); fail: if (inode) up_write(&F2FS_I(inode)->i_sem); @@ -586,7 +586,7 @@ fail: return err; } -int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname, +int f2fs_add_dentry(struct inode *dir, struct fscrypt_name *fname, struct inode *inode, nid_t ino, umode_t mode) { struct qstr new_name; @@ -610,7 +610,7 @@ int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname, * Caller should grab and release a rwsem by calling f2fs_lock_op() and * f2fs_unlock_op(). */ -int __f2fs_add_link(struct inode *dir, const struct qstr *name, +int f2fs_do_add_link(struct inode *dir, const struct qstr *name, struct inode *inode, nid_t ino, umode_t mode) { struct fscrypt_name fname; @@ -639,7 +639,7 @@ int __f2fs_add_link(struct inode *dir, const struct qstr *name, } else if (IS_ERR(page)) { err = PTR_ERR(page); } else { - err = __f2fs_do_add_link(dir, &fname, inode, ino, mode); + err = f2fs_add_dentry(dir, &fname, inode, ino, mode); } fscrypt_free_filename(&fname); return err; @@ -651,7 +651,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir) int err = 0; down_write(&F2FS_I(inode)->i_sem); - page = init_inode_metadata(inode, dir, NULL, NULL, NULL); + page = f2fs_init_inode_metadata(inode, dir, NULL, NULL, NULL); if (IS_ERR(page)) { err = PTR_ERR(page); goto fail; @@ -683,9 +683,9 @@ void f2fs_drop_nlink(struct inode *dir, struct inode *inode) up_write(&F2FS_I(inode)->i_sem); if (inode->i_nlink == 0) - add_orphan_inode(inode); + f2fs_add_orphan_inode(inode); else - release_orphan_inode(sbi); + f2fs_release_orphan_inode(sbi); } /* @@ -698,14 +698,12 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, struct f2fs_dentry_block *dentry_blk; unsigned int bit_pos; int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len)); - struct address_space *mapping = page_mapping(page); - unsigned long flags; int i; f2fs_update_time(F2FS_I_SB(dir), REQ_TIME); if (F2FS_OPTION(F2FS_I_SB(dir)).fsync_mode == FSYNC_MODE_STRICT) - add_ino_entry(F2FS_I_SB(dir), dir->i_ino, TRANS_DIR_INO); + f2fs_add_ino_entry(F2FS_I_SB(dir), dir->i_ino, TRANS_DIR_INO); if (f2fs_has_inline_dentry(dir)) return f2fs_delete_inline_entry(dentry, page, dir, inode); @@ -731,17 +729,13 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, f2fs_drop_nlink(dir, inode); if (bit_pos == NR_DENTRY_IN_BLOCK && - !truncate_hole(dir, page->index, page->index + 1)) { - spin_lock_irqsave(&mapping->tree_lock, flags); - radix_tree_tag_clear(&mapping->page_tree, page_index(page), - PAGECACHE_TAG_DIRTY); - spin_unlock_irqrestore(&mapping->tree_lock, flags); - + !f2fs_truncate_hole(dir, page->index, page->index + 1)) { + f2fs_clear_radix_tree_dirty_tag(page); clear_page_dirty_for_io(page); ClearPagePrivate(page); ClearPageUptodate(page); inode_dec_dirty_pages(dir); - remove_dirty_inode(dir); + f2fs_remove_dirty_inode(dir); } f2fs_put_page(page, 1); } @@ -758,7 +752,7 @@ bool f2fs_empty_dir(struct inode *dir) return f2fs_empty_inline_dir(dir); for (bidx = 0; bidx < nblock; bidx++) { - dentry_page = get_lock_data_page(dir, bidx, false); + dentry_page = f2fs_get_lock_data_page(dir, bidx, false); if (IS_ERR(dentry_page)) { if (PTR_ERR(dentry_page) == -ENOENT) continue; @@ -806,7 +800,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, continue; } - d_type = get_de_type(de); + d_type = f2fs_get_de_type(de); de_name.name = d->filename[bit_pos]; de_name.len = le16_to_cpu(de->name_len); @@ -830,7 +824,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, return 1; if (sbi->readdir_ra == 1) - ra_node_page(sbi, le32_to_cpu(de->ino)); + f2fs_ra_node_page(sbi, le32_to_cpu(de->ino)); bit_pos += GET_DENTRY_SLOTS(le16_to_cpu(de->name_len)); ctx->pos = start_pos + bit_pos; @@ -880,7 +874,7 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx) page_cache_sync_readahead(inode->i_mapping, ra, file, n, min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES)); - dentry_page = get_lock_data_page(inode, n, false); + dentry_page = f2fs_get_lock_data_page(inode, n, false); if (IS_ERR(dentry_page)) { err = PTR_ERR(dentry_page); if (err == -ENOENT) { diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c index d5a861bf2b42..231b77ef5a53 100644 --- a/fs/f2fs/extent_cache.c +++ b/fs/f2fs/extent_cache.c @@ -49,7 +49,7 @@ static struct rb_entry *__lookup_rb_tree_slow(struct rb_root *root, return NULL; } -struct rb_entry *__lookup_rb_tree(struct rb_root *root, +struct rb_entry *f2fs_lookup_rb_tree(struct rb_root *root, struct rb_entry *cached_re, unsigned int ofs) { struct rb_entry *re; @@ -61,7 +61,7 @@ struct rb_entry *__lookup_rb_tree(struct rb_root *root, return re; } -struct rb_node **__lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, +struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, struct rb_root *root, struct rb_node **parent, unsigned int ofs) { @@ -92,7 +92,7 @@ struct rb_node **__lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, * in order to simpfy the insertion after. * tree must stay unchanged between lookup and insertion. */ -struct rb_entry *__lookup_rb_tree_ret(struct rb_root *root, +struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root *root, struct rb_entry *cached_re, unsigned int ofs, struct rb_entry **prev_entry, @@ -159,7 +159,7 @@ lookup_neighbors: return re; } -bool __check_rb_tree_consistence(struct f2fs_sb_info *sbi, +bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi, struct rb_root *root) { #ifdef CONFIG_F2FS_CHECK_FS @@ -390,7 +390,7 @@ static bool f2fs_lookup_extent_tree(struct inode *inode, pgoff_t pgofs, goto out; } - en = (struct extent_node *)__lookup_rb_tree(&et->root, + en = (struct extent_node *)f2fs_lookup_rb_tree(&et->root, (struct rb_entry *)et->cached_en, pgofs); if (!en) goto out; @@ -470,7 +470,7 @@ static struct extent_node *__insert_extent_tree(struct inode *inode, goto do_insert; } - p = __lookup_rb_tree_for_insert(sbi, &et->root, &parent, ei->fofs); + p = f2fs_lookup_rb_tree_for_insert(sbi, &et->root, &parent, ei->fofs); do_insert: en = __attach_extent_node(sbi, et, ei, parent, p); if (!en) @@ -520,7 +520,7 @@ static void f2fs_update_extent_tree_range(struct inode *inode, __drop_largest_extent(inode, fofs, len); /* 1. lookup first extent node in range [fofs, fofs + len - 1] */ - en = (struct extent_node *)__lookup_rb_tree_ret(&et->root, + en = (struct extent_node *)f2fs_lookup_rb_tree_ret(&et->root, (struct rb_entry *)et->cached_en, fofs, (struct rb_entry **)&prev_en, (struct rb_entry **)&next_en, @@ -773,7 +773,7 @@ void f2fs_update_extent_cache(struct dnode_of_data *dn) else blkaddr = dn->data_blkaddr; - fofs = start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + + fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + dn->ofs_in_node; f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, 1); } @@ -788,7 +788,7 @@ void f2fs_update_extent_cache_range(struct dnode_of_data *dn, f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, len); } -void init_extent_cache_info(struct f2fs_sb_info *sbi) +void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi) { INIT_RADIX_TREE(&sbi->extent_tree_root, GFP_NOIO); mutex_init(&sbi->extent_tree_lock); @@ -800,7 +800,7 @@ void init_extent_cache_info(struct f2fs_sb_info *sbi) atomic_set(&sbi->total_ext_node, 0); } -int __init create_extent_cache(void) +int __init f2fs_create_extent_cache(void) { extent_tree_slab = f2fs_kmem_cache_create("f2fs_extent_tree", sizeof(struct extent_tree)); @@ -815,7 +815,7 @@ int __init create_extent_cache(void) return 0; } -void destroy_extent_cache(void) +void f2fs_destroy_extent_cache(void) { kmem_cache_destroy(extent_node_slab); kmem_cache_destroy(extent_tree_slab); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 97c17b3d984c..8f8bb31e9814 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -26,6 +26,7 @@ #include <linux/quotaops.h> #include <crypto/hash.h> #include <linux/writeback.h> +#include <linux/overflow.h> #define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_F2FS_FS_ENCRYPTION) #include <linux/fscrypt.h> @@ -242,6 +243,7 @@ enum { #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */ #define DEF_MAX_DISCARD_LEN 512 /* Max. 2MB per discard */ #define DEF_MIN_DISCARD_ISSUE_TIME 50 /* 50 ms, if exists */ +#define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device busy */ #define DEF_MAX_DISCARD_ISSUE_TIME 60000 /* 60 s, if no candidates */ #define DEF_DISCARD_URGENT_UTIL 80 /* do more discard over 80% */ #define DEF_CP_INTERVAL 60 /* 60 secs */ @@ -344,6 +346,7 @@ enum { struct discard_policy { int type; /* type of discard */ unsigned int min_interval; /* used for candidates exist */ + unsigned int mid_interval; /* used for device busy */ unsigned int max_interval; /* used for candidates not exist */ unsigned int max_requests; /* # of discards issued per round */ unsigned int io_aware_gran; /* minimum granularity discard not be aware of I/O */ @@ -676,15 +679,20 @@ enum { #define DEF_DIR_LEVEL 0 +enum { + GC_FAILURE_PIN, + GC_FAILURE_ATOMIC, + MAX_GC_FAILURE +}; + struct f2fs_inode_info { struct inode vfs_inode; /* serve a vfs inode */ unsigned long i_flags; /* keep an inode flags for ioctl */ unsigned char i_advise; /* use to give file attribute hints */ unsigned char i_dir_level; /* use for dentry level for large dir */ - union { - unsigned int i_current_depth; /* only for directory depth */ - unsigned short i_gc_failures; /* only for regular file */ - }; + unsigned int i_current_depth; /* only for directory depth */ + /* for gc failure statistic */ + unsigned int i_gc_failures[MAX_GC_FAILURE]; unsigned int i_pino; /* parent inode number */ umode_t i_acl_mode; /* keep file acl mode temporarily */ @@ -712,7 +720,9 @@ struct f2fs_inode_info { struct task_struct *inmem_task; /* store inmemory task */ struct mutex inmem_lock; /* lock for inmemory pages */ struct extent_tree *extent_tree; /* cached extent_tree entry */ - struct rw_semaphore dio_rwsem[2];/* avoid racing between dio and gc */ + + /* avoid racing between foreground op and gc */ + struct rw_semaphore i_gc_rwsem[2]; struct rw_semaphore i_mmap_sem; struct rw_semaphore i_xattr_sem; /* avoid racing between reading and changing EAs */ @@ -1062,6 +1072,7 @@ struct f2fs_io_info { int need_lock; /* indicate we need to lock cp_rwsem */ bool in_list; /* indicate fio is in io_list */ bool is_meta; /* indicate borrow meta inode mapping or not */ + bool retry; /* need to reallocate block address */ enum iostat_type io_type; /* io type */ struct writeback_control *io_wbc; /* writeback control */ }; @@ -1124,6 +1135,13 @@ enum { }; enum { + GC_NORMAL, + GC_IDLE_CB, + GC_IDLE_GREEDY, + GC_URGENT, +}; + +enum { WHINT_MODE_OFF, /* not pass down write hints */ WHINT_MODE_USER, /* try to pass down hints given by users */ WHINT_MODE_FS, /* pass down hints with F2FS policy */ @@ -1171,6 +1189,8 @@ struct f2fs_sb_info { struct f2fs_bio_info *write_io[NR_PAGE_TYPE]; /* for write bios */ struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE]; /* bio ordering for NODE/DATA */ + /* keep migration IO order for LFS mode */ + struct rw_semaphore io_order_lock; mempool_t *write_io_dummy; /* Dummy pages */ /* for checkpoint */ @@ -1241,7 +1261,7 @@ struct f2fs_sb_info { struct percpu_counter alloc_valid_block_count; /* writeback control */ - atomic_t wb_sync_req; /* count # of WB_SYNC threads */ + atomic_t wb_sync_req[META]; /* count # of WB_SYNC threads */ /* valid inode count */ struct percpu_counter total_valid_inode_count; @@ -1252,9 +1272,9 @@ struct f2fs_sb_info { struct mutex gc_mutex; /* mutex for GC */ struct f2fs_gc_kthread *gc_thread; /* GC thread */ unsigned int cur_victim_sec; /* current victim section num */ - - /* threshold for converting bg victims for fg */ - u64 fggc_threshold; + unsigned int gc_mode; /* current GC state */ + /* for skip statistic */ + unsigned long long skipped_atomic_files[2]; /* FG_GC and BG_GC */ /* threshold for gc trials on pinned files */ u64 gc_pin_file_threshold; @@ -1645,18 +1665,6 @@ static inline bool __exist_node_summaries(struct f2fs_sb_info *sbi) } /* - * Check whether the given nid is within node id range. - */ -static inline int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid) -{ - if (unlikely(nid < F2FS_ROOT_INO(sbi))) - return -EINVAL; - if (unlikely(nid >= NM_I(sbi)->max_nid)) - return -EINVAL; - return 0; -} - -/* * Check whether the inode has blocks or not */ static inline int F2FS_HAS_BLOCKS(struct inode *inode) @@ -2214,9 +2222,60 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr) *addr ^= mask; } -#define F2FS_REG_FLMASK (~(FS_DIRSYNC_FL | FS_TOPDIR_FL)) -#define F2FS_OTHER_FLMASK (FS_NODUMP_FL | FS_NOATIME_FL) -#define F2FS_FL_INHERITED (FS_PROJINHERIT_FL) +/* + * Inode flags + */ +#define F2FS_SECRM_FL 0x00000001 /* Secure deletion */ +#define F2FS_UNRM_FL 0x00000002 /* Undelete */ +#define F2FS_COMPR_FL 0x00000004 /* Compress file */ +#define F2FS_SYNC_FL 0x00000008 /* Synchronous updates */ +#define F2FS_IMMUTABLE_FL 0x00000010 /* Immutable file */ +#define F2FS_APPEND_FL 0x00000020 /* writes to file may only append */ +#define F2FS_NODUMP_FL 0x00000040 /* do not dump file */ +#define F2FS_NOATIME_FL 0x00000080 /* do not update atime */ +/* Reserved for compression usage... */ +#define F2FS_DIRTY_FL 0x00000100 +#define F2FS_COMPRBLK_FL 0x00000200 /* One or more compressed clusters */ +#define F2FS_NOCOMPR_FL 0x00000400 /* Don't compress */ +#define F2FS_ENCRYPT_FL 0x00000800 /* encrypted file */ +/* End compression flags --- maybe not all used */ +#define F2FS_INDEX_FL 0x00001000 /* hash-indexed directory */ +#define F2FS_IMAGIC_FL 0x00002000 /* AFS directory */ +#define F2FS_JOURNAL_DATA_FL 0x00004000 /* file data should be journaled */ +#define F2FS_NOTAIL_FL 0x00008000 /* file tail should not be merged */ +#define F2FS_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */ +#define F2FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/ +#define F2FS_HUGE_FILE_FL 0x00040000 /* Set to each huge file */ +#define F2FS_EXTENTS_FL 0x00080000 /* Inode uses extents */ +#define F2FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */ +#define F2FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */ +#define F2FS_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */ +#define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */ +#define F2FS_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ + +#define F2FS_FL_USER_VISIBLE 0x304BDFFF /* User visible flags */ +#define F2FS_FL_USER_MODIFIABLE 0x204BC0FF /* User modifiable flags */ + +/* Flags we can manipulate with through F2FS_IOC_FSSETXATTR */ +#define F2FS_FL_XFLAG_VISIBLE (F2FS_SYNC_FL | \ + F2FS_IMMUTABLE_FL | \ + F2FS_APPEND_FL | \ + F2FS_NODUMP_FL | \ + F2FS_NOATIME_FL | \ + F2FS_PROJINHERIT_FL) + +/* Flags that should be inherited by new inodes from their parent. */ +#define F2FS_FL_INHERITED (F2FS_SECRM_FL | F2FS_UNRM_FL | F2FS_COMPR_FL |\ + F2FS_SYNC_FL | F2FS_NODUMP_FL | F2FS_NOATIME_FL |\ + F2FS_NOCOMPR_FL | F2FS_JOURNAL_DATA_FL |\ + F2FS_NOTAIL_FL | F2FS_DIRSYNC_FL |\ + F2FS_PROJINHERIT_FL) + +/* Flags that are appropriate for regular files (all but dir-specific ones). */ +#define F2FS_REG_FLMASK (~(F2FS_DIRSYNC_FL | F2FS_TOPDIR_FL)) + +/* Flags that are appropriate for non-directories/regular files. */ +#define F2FS_OTHER_FLMASK (F2FS_NODUMP_FL | F2FS_NOATIME_FL) static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags) { @@ -2259,6 +2318,7 @@ enum { FI_EXTRA_ATTR, /* indicate file has extra attribute */ FI_PROJ_INHERIT, /* indicate file inherits projectid */ FI_PIN_FILE, /* indicate file should not be gced */ + FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */ }; static inline void __mark_inode_dirty_flag(struct inode *inode, @@ -2357,7 +2417,7 @@ static inline void f2fs_i_depth_write(struct inode *inode, unsigned int depth) static inline void f2fs_i_gc_failures_write(struct inode *inode, unsigned int count) { - F2FS_I(inode)->i_gc_failures = count; + F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] = count; f2fs_mark_inode_dirty_sync(inode, true); } @@ -2646,7 +2706,7 @@ static inline int get_inline_xattr_addrs(struct inode *inode) return F2FS_I(inode)->i_inline_xattr_size; } -#define get_inode_mode(i) \ +#define f2fs_get_inode_mode(i) \ ((is_inode_flag_set(i, FI_ACL_MODE)) ? \ (F2FS_I(i)->i_acl_mode) : ((i)->i_mode)) @@ -2685,18 +2745,25 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi, spin_unlock(&sbi->iostat_lock); } +static inline bool is_valid_blkaddr(block_t blkaddr) +{ + if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) + return false; + return true; +} + /* * file.c */ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync); -void truncate_data_blocks(struct dnode_of_data *dn); -int truncate_blocks(struct inode *inode, u64 from, bool lock); +void f2fs_truncate_data_blocks(struct dnode_of_data *dn); +int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock); int f2fs_truncate(struct inode *inode); int f2fs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat); int f2fs_setattr(struct dentry *dentry, struct iattr *attr); -int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end); -void truncate_data_blocks_range(struct dnode_of_data *dn, int count); +int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end); +void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count); int f2fs_precache_extents(struct inode *inode); long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg); @@ -2710,38 +2777,37 @@ bool f2fs_inode_chksum_verify(struct f2fs_sb_info *sbi, struct page *page); void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page); struct inode *f2fs_iget(struct super_block *sb, unsigned long ino); struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino); -int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink); -void update_inode(struct inode *inode, struct page *node_page); -void update_inode_page(struct inode *inode); +int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink); +void f2fs_update_inode(struct inode *inode, struct page *node_page); +void f2fs_update_inode_page(struct inode *inode); int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc); void f2fs_evict_inode(struct inode *inode); -void handle_failed_inode(struct inode *inode); +void f2fs_handle_failed_inode(struct inode *inode); /* * namei.c */ -int update_extension_list(struct f2fs_sb_info *sbi, const char *name, +int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name, bool hot, bool set); struct dentry *f2fs_get_parent(struct dentry *child); /* * dir.c */ -void set_de_type(struct f2fs_dir_entry *de, umode_t mode); -unsigned char get_de_type(struct f2fs_dir_entry *de); -struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *fname, +unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de); +struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname, f2fs_hash_t namehash, int *max_slots, struct f2fs_dentry_ptr *d); int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d, unsigned int start_pos, struct fscrypt_str *fstr); -void do_make_empty_dir(struct inode *inode, struct inode *parent, +void f2fs_do_make_empty_dir(struct inode *inode, struct inode *parent, struct f2fs_dentry_ptr *d); -struct page *init_inode_metadata(struct inode *inode, struct inode *dir, +struct page *f2fs_init_inode_metadata(struct inode *inode, struct inode *dir, const struct qstr *new_name, const struct qstr *orig_name, struct page *dpage); -void update_parent_metadata(struct inode *dir, struct inode *inode, +void f2fs_update_parent_metadata(struct inode *dir, struct inode *inode, unsigned int current_depth); -int room_for_filename(const void *bitmap, int slots, int max_slots); +int f2fs_room_for_filename(const void *bitmap, int slots, int max_slots); void f2fs_drop_nlink(struct inode *dir, struct inode *inode); struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir, struct fscrypt_name *fname, struct page **res_page); @@ -2758,9 +2824,9 @@ void f2fs_update_dentry(nid_t ino, umode_t mode, struct f2fs_dentry_ptr *d, int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name, const struct qstr *orig_name, struct inode *inode, nid_t ino, umode_t mode); -int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname, +int f2fs_add_dentry(struct inode *dir, struct fscrypt_name *fname, struct inode *inode, nid_t ino, umode_t mode); -int __f2fs_add_link(struct inode *dir, const struct qstr *name, +int f2fs_do_add_link(struct inode *dir, const struct qstr *name, struct inode *inode, nid_t ino, umode_t mode); void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page, struct inode *dir, struct inode *inode); @@ -2769,7 +2835,7 @@ bool f2fs_empty_dir(struct inode *dir); static inline int f2fs_add_link(struct dentry *dentry, struct inode *inode) { - return __f2fs_add_link(d_inode(dentry->d_parent), &dentry->d_name, + return f2fs_do_add_link(d_inode(dentry->d_parent), &dentry->d_name, inode, inode->i_ino, inode->i_mode); } @@ -2784,7 +2850,7 @@ int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover); int f2fs_sync_fs(struct super_block *sb, int sync); extern __printf(3, 4) void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...); -int sanity_check_ckpt(struct f2fs_sb_info *sbi); +int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi); /* * hash.c @@ -2798,138 +2864,146 @@ f2fs_hash_t f2fs_dentry_hash(const struct qstr *name_info, struct dnode_of_data; struct node_info; -bool available_free_memory(struct f2fs_sb_info *sbi, int type); -int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid); -bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid); -bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino); -void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni); -pgoff_t get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs); -int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode); -int truncate_inode_blocks(struct inode *inode, pgoff_t from); -int truncate_xattr_node(struct inode *inode); -int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino); -int remove_inode_page(struct inode *inode); -struct page *new_inode_page(struct inode *inode); -struct page *new_node_page(struct dnode_of_data *dn, unsigned int ofs); -void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); -struct page *get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); -struct page *get_node_page_ra(struct page *parent, int start); -void move_node_page(struct page *node_page, int gc_type); -int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, +int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid); +bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type); +int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid); +bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid); +bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino); +void f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid, + struct node_info *ni); +pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs); +int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode); +int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from); +int f2fs_truncate_xattr_node(struct inode *inode); +int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino); +int f2fs_remove_inode_page(struct inode *inode); +struct page *f2fs_new_inode_page(struct inode *inode); +struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs); +void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid); +struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid); +struct page *f2fs_get_node_page_ra(struct page *parent, int start); +void f2fs_move_node_page(struct page *node_page, int gc_type); +int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, struct writeback_control *wbc, bool atomic); -int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc, +int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, + struct writeback_control *wbc, bool do_balance, enum iostat_type io_type); -void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount); -bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid); -void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid); -void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid); -int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink); -void recover_inline_xattr(struct inode *inode, struct page *page); -int recover_xattr_data(struct inode *inode, struct page *page); -int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page); -void restore_node_summary(struct f2fs_sb_info *sbi, +void f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount); +bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid); +void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid); +void f2fs_alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid); +int f2fs_try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink); +void f2fs_recover_inline_xattr(struct inode *inode, struct page *page); +int f2fs_recover_xattr_data(struct inode *inode, struct page *page); +int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page); +void f2fs_restore_node_summary(struct f2fs_sb_info *sbi, unsigned int segno, struct f2fs_summary_block *sum); -void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); -int build_node_manager(struct f2fs_sb_info *sbi); -void destroy_node_manager(struct f2fs_sb_info *sbi); -int __init create_node_manager_caches(void); -void destroy_node_manager_caches(void); +void f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); +int f2fs_build_node_manager(struct f2fs_sb_info *sbi); +void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi); +int __init f2fs_create_node_manager_caches(void); +void f2fs_destroy_node_manager_caches(void); /* * segment.c */ -bool need_SSR(struct f2fs_sb_info *sbi); -void register_inmem_page(struct inode *inode, struct page *page); -void drop_inmem_pages_all(struct f2fs_sb_info *sbi); -void drop_inmem_pages(struct inode *inode); -void drop_inmem_page(struct inode *inode, struct page *page); -int commit_inmem_pages(struct inode *inode); +bool f2fs_need_SSR(struct f2fs_sb_info *sbi); +void f2fs_register_inmem_page(struct inode *inode, struct page *page); +void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure); +void f2fs_drop_inmem_pages(struct inode *inode); +void f2fs_drop_inmem_page(struct inode *inode, struct page *page); +int f2fs_commit_inmem_pages(struct inode *inode); void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need); void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi); int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino); -int create_flush_cmd_control(struct f2fs_sb_info *sbi); +int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi); int f2fs_flush_device_cache(struct f2fs_sb_info *sbi); -void destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free); -void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr); -bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr); -void drop_discard_cmd(struct f2fs_sb_info *sbi); -void stop_discard_thread(struct f2fs_sb_info *sbi); +void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free); +void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr); +bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr); +void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi); +void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi); bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi); -void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc); -void release_discard_addrs(struct f2fs_sb_info *sbi); -int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); -void allocate_new_segments(struct f2fs_sb_info *sbi); +void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi, + struct cp_control *cpc); +void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); +int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi); int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range); -bool exist_trim_candidates(struct f2fs_sb_info *sbi, struct cp_control *cpc); -struct page *get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno); -void update_meta_page(struct f2fs_sb_info *sbi, void *src, block_t blk_addr); -void write_meta_page(struct f2fs_sb_info *sbi, struct page *page, +bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi, + struct cp_control *cpc); +struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno); +void f2fs_update_meta_page(struct f2fs_sb_info *sbi, void *src, + block_t blk_addr); +void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page, enum iostat_type io_type); -void write_node_page(unsigned int nid, struct f2fs_io_info *fio); -void write_data_page(struct dnode_of_data *dn, struct f2fs_io_info *fio); -int rewrite_data_page(struct f2fs_io_info *fio); -void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, +void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio); +void f2fs_outplace_write_data(struct dnode_of_data *dn, + struct f2fs_io_info *fio); +int f2fs_inplace_write_data(struct f2fs_io_info *fio); +void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, block_t old_blkaddr, block_t new_blkaddr, bool recover_curseg, bool recover_newaddr); void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn, block_t old_addr, block_t new_addr, unsigned char version, bool recover_curseg, bool recover_newaddr); -void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, +void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, block_t old_blkaddr, block_t *new_blkaddr, struct f2fs_summary *sum, int type, struct f2fs_io_info *fio, bool add_list); void f2fs_wait_on_page_writeback(struct page *page, enum page_type type, bool ordered); void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr); -void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk); -void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk); -int lookup_journal_in_cursum(struct f2fs_journal *journal, int type, +void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk); +void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk); +int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, unsigned int val, int alloc); -void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); -int build_segment_manager(struct f2fs_sb_info *sbi); -void destroy_segment_manager(struct f2fs_sb_info *sbi); -int __init create_segment_manager_caches(void); -void destroy_segment_manager_caches(void); -int rw_hint_to_seg_type(enum rw_hint hint); -enum rw_hint io_type_to_rw_hint(struct f2fs_sb_info *sbi, enum page_type type, - enum temp_type temp); +void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); +int f2fs_build_segment_manager(struct f2fs_sb_info *sbi); +void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi); +int __init f2fs_create_segment_manager_caches(void); +void f2fs_destroy_segment_manager_caches(void); +int f2fs_rw_hint_to_seg_type(enum rw_hint hint); +enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi, + enum page_type type, enum temp_type temp); /* * checkpoint.c */ void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io); -struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); -struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); -struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index); -bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type); -int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, +struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); +struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); +struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index); +bool f2fs_is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type); +int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type, bool sync); -void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index); -long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, +void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index); +long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, long nr_to_write, enum iostat_type io_type); -void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type); -void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type); -void release_ino_entry(struct f2fs_sb_info *sbi, bool all); -bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode); -void set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, +void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type); +void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type); +void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all); +bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode); +void f2fs_set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, unsigned int devidx, int type); -bool is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, +bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino, unsigned int devidx, int type); int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi); -int acquire_orphan_inode(struct f2fs_sb_info *sbi); -void release_orphan_inode(struct f2fs_sb_info *sbi); -void add_orphan_inode(struct inode *inode); -void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino); -int recover_orphan_inodes(struct f2fs_sb_info *sbi); -int get_valid_checkpoint(struct f2fs_sb_info *sbi); -void update_dirty_page(struct inode *inode, struct page *page); -void remove_dirty_inode(struct inode *inode); -int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type); -int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc); -void init_ino_entry_info(struct f2fs_sb_info *sbi); -int __init create_checkpoint_caches(void); -void destroy_checkpoint_caches(void); +int f2fs_acquire_orphan_inode(struct f2fs_sb_info *sbi); +void f2fs_release_orphan_inode(struct f2fs_sb_info *sbi); +void f2fs_add_orphan_inode(struct inode *inode); +void f2fs_remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino); +int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi); +int f2fs_get_valid_checkpoint(struct f2fs_sb_info *sbi); +void f2fs_update_dirty_page(struct inode *inode, struct page *page); +void f2fs_remove_dirty_inode(struct inode *inode); +int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type); +int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc); +void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi); +int __init f2fs_create_checkpoint_caches(void); +void f2fs_destroy_checkpoint_caches(void); /* * data.c @@ -2942,34 +3016,31 @@ void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi, enum page_type type); void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi); int f2fs_submit_page_bio(struct f2fs_io_info *fio); -int f2fs_submit_page_write(struct f2fs_io_info *fio); +void f2fs_submit_page_write(struct f2fs_io_info *fio); struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi, block_t blk_addr, struct bio *bio); int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr); -void set_data_blkaddr(struct dnode_of_data *dn); +void f2fs_set_data_blkaddr(struct dnode_of_data *dn); void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr); -int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count); -int reserve_new_block(struct dnode_of_data *dn); +int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count); +int f2fs_reserve_new_block(struct dnode_of_data *dn); int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index); int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from); int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index); -struct page *get_read_data_page(struct inode *inode, pgoff_t index, +struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index, int op_flags, bool for_write); -struct page *find_data_page(struct inode *inode, pgoff_t index); -struct page *get_lock_data_page(struct inode *inode, pgoff_t index, +struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index); +struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index, bool for_write); -struct page *get_new_data_page(struct inode *inode, +struct page *f2fs_get_new_data_page(struct inode *inode, struct page *ipage, pgoff_t index, bool new_i_size); -int do_write_data_page(struct f2fs_io_info *fio); +int f2fs_do_write_data_page(struct f2fs_io_info *fio); int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int create, int flag); int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len); -bool should_update_inplace(struct inode *inode, struct f2fs_io_info *fio); -bool should_update_outplace(struct inode *inode, struct f2fs_io_info *fio); -int __f2fs_write_data_pages(struct address_space *mapping, - struct writeback_control *wbc, - enum iostat_type io_type); +bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio); +bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio); void f2fs_invalidate_page(struct page *page, unsigned int offset, unsigned int length); int f2fs_release_page(struct page *page, gfp_t wait); @@ -2978,22 +3049,23 @@ int f2fs_migrate_page(struct address_space *mapping, struct page *newpage, struct page *page, enum migrate_mode mode); #endif bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len); +void f2fs_clear_radix_tree_dirty_tag(struct page *page); /* * gc.c */ -int start_gc_thread(struct f2fs_sb_info *sbi); -void stop_gc_thread(struct f2fs_sb_info *sbi); -block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode); +int f2fs_start_gc_thread(struct f2fs_sb_info *sbi); +void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi); +block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode); int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background, unsigned int segno); -void build_gc_manager(struct f2fs_sb_info *sbi); +void f2fs_build_gc_manager(struct f2fs_sb_info *sbi); /* * recovery.c */ -int recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only); -bool space_for_roll_forward(struct f2fs_sb_info *sbi); +int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only); +bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi); /* * debug.c @@ -3031,6 +3103,7 @@ struct f2fs_stat_info { int bg_node_segs, bg_data_segs; int tot_blks, data_blks, node_blks; int bg_data_blks, bg_node_blks; + unsigned long long skipped_atomic_files[2]; int curseg[NR_CURSEG_TYPE]; int cursec[NR_CURSEG_TYPE]; int curzone[NR_CURSEG_TYPE]; @@ -3197,29 +3270,31 @@ extern const struct inode_operations f2fs_dir_inode_operations; extern const struct inode_operations f2fs_symlink_inode_operations; extern const struct inode_operations f2fs_encrypted_symlink_inode_operations; extern const struct inode_operations f2fs_special_inode_operations; -extern struct kmem_cache *inode_entry_slab; +extern struct kmem_cache *f2fs_inode_entry_slab; /* * inline.c */ bool f2fs_may_inline_data(struct inode *inode); bool f2fs_may_inline_dentry(struct inode *inode); -void read_inline_data(struct page *page, struct page *ipage); -void truncate_inline_inode(struct inode *inode, struct page *ipage, u64 from); +void f2fs_do_read_inline_data(struct page *page, struct page *ipage); +void f2fs_truncate_inline_inode(struct inode *inode, + struct page *ipage, u64 from); int f2fs_read_inline_data(struct inode *inode, struct page *page); int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page); int f2fs_convert_inline_inode(struct inode *inode); int f2fs_write_inline_data(struct inode *inode, struct page *page); -bool recover_inline_data(struct inode *inode, struct page *npage); -struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, +bool f2fs_recover_inline_data(struct inode *inode, struct page *npage); +struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir, struct fscrypt_name *fname, struct page **res_page); -int make_empty_inline_dir(struct inode *inode, struct inode *parent, +int f2fs_make_empty_inline_dir(struct inode *inode, struct inode *parent, struct page *ipage); int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, const struct qstr *orig_name, struct inode *inode, nid_t ino, umode_t mode); -void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry, struct page *page, - struct inode *dir, struct inode *inode); +void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry, + struct page *page, struct inode *dir, + struct inode *inode); bool f2fs_empty_inline_dir(struct inode *dir); int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx, struct fscrypt_str *fstr); @@ -3240,17 +3315,17 @@ void f2fs_leave_shrinker(struct f2fs_sb_info *sbi); /* * extent_cache.c */ -struct rb_entry *__lookup_rb_tree(struct rb_root *root, +struct rb_entry *f2fs_lookup_rb_tree(struct rb_root *root, struct rb_entry *cached_re, unsigned int ofs); -struct rb_node **__lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, +struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi, struct rb_root *root, struct rb_node **parent, unsigned int ofs); -struct rb_entry *__lookup_rb_tree_ret(struct rb_root *root, +struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root *root, struct rb_entry *cached_re, unsigned int ofs, struct rb_entry **prev_entry, struct rb_entry **next_entry, struct rb_node ***insert_p, struct rb_node **insert_parent, bool force); -bool __check_rb_tree_consistence(struct f2fs_sb_info *sbi, +bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi, struct rb_root *root); unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink); bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext); @@ -3262,9 +3337,9 @@ bool f2fs_lookup_extent_cache(struct inode *inode, pgoff_t pgofs, void f2fs_update_extent_cache(struct dnode_of_data *dn); void f2fs_update_extent_cache_range(struct dnode_of_data *dn, pgoff_t fofs, block_t blkaddr, unsigned int len); -void init_extent_cache_info(struct f2fs_sb_info *sbi); -int __init create_extent_cache(void); -void destroy_extent_cache(void); +void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi); +int __init f2fs_create_extent_cache(void); +void f2fs_destroy_extent_cache(void); /* * sysfs.c diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 40d03d58b390..1ada29893092 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -98,7 +98,8 @@ static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma, /* page is wholly or partially inside EOF */ if (((loff_t)(page->index + 1) << PAGE_SHIFT) > i_size_read(inode)) { - unsigned offset; + loff_t offset; + offset = i_size_read(inode) & ~PAGE_MASK; zero_user_segment(page, offset, PAGE_SIZE); } @@ -160,17 +161,18 @@ static inline enum cp_reason_type need_do_checkpoint(struct inode *inode) cp_reason = CP_SB_NEED_CP; else if (file_wrong_pino(inode)) cp_reason = CP_WRONG_PINO; - else if (!space_for_roll_forward(sbi)) + else if (!f2fs_space_for_roll_forward(sbi)) cp_reason = CP_NO_SPC_ROLL; - else if (!is_checkpointed_node(sbi, F2FS_I(inode)->i_pino)) + else if (!f2fs_is_checkpointed_node(sbi, F2FS_I(inode)->i_pino)) cp_reason = CP_NODE_NEED_CP; else if (test_opt(sbi, FASTBOOT)) cp_reason = CP_FASTBOOT_MODE; else if (F2FS_OPTION(sbi).active_logs == 2) cp_reason = CP_SPEC_LOG_NUM; else if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT && - need_dentry_mark(sbi, inode->i_ino) && - exist_written_data(sbi, F2FS_I(inode)->i_pino, TRANS_DIR_INO)) + f2fs_need_dentry_mark(sbi, inode->i_ino) && + f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino, + TRANS_DIR_INO)) cp_reason = CP_RECOVER_DIR; return cp_reason; @@ -181,7 +183,7 @@ static bool need_inode_page_update(struct f2fs_sb_info *sbi, nid_t ino) struct page *i = find_get_page(NODE_MAPPING(sbi), ino); bool ret = false; /* But we need to avoid that there are some inode updates */ - if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino)) + if ((i && PageDirty(i)) || f2fs_need_inode_block_update(sbi, ino)) ret = true; f2fs_put_page(i, 0); return ret; @@ -241,14 +243,14 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end, * if there is no written data, don't waste time to write recovery info. */ if (!is_inode_flag_set(inode, FI_APPEND_WRITE) && - !exist_written_data(sbi, ino, APPEND_INO)) { + !f2fs_exist_written_data(sbi, ino, APPEND_INO)) { /* it may call write_inode just prior to fsync */ if (need_inode_page_update(sbi, ino)) goto go_write; if (is_inode_flag_set(inode, FI_UPDATE_WRITE) || - exist_written_data(sbi, ino, UPDATE_INO)) + f2fs_exist_written_data(sbi, ino, UPDATE_INO)) goto flush_out; goto out; } @@ -275,7 +277,9 @@ go_write: goto out; } sync_nodes: - ret = fsync_node_pages(sbi, inode, &wbc, atomic); + atomic_inc(&sbi->wb_sync_req[NODE]); + ret = f2fs_fsync_node_pages(sbi, inode, &wbc, atomic); + atomic_dec(&sbi->wb_sync_req[NODE]); if (ret) goto out; @@ -285,7 +289,7 @@ sync_nodes: goto out; } - if (need_inode_block_update(sbi, ino)) { + if (f2fs_need_inode_block_update(sbi, ino)) { f2fs_mark_inode_dirty_sync(inode, true); f2fs_write_inode(inode, NULL); goto sync_nodes; @@ -300,21 +304,21 @@ sync_nodes: * given fsync mark. */ if (!atomic) { - ret = wait_on_node_pages_writeback(sbi, ino); + ret = f2fs_wait_on_node_pages_writeback(sbi, ino); if (ret) goto out; } /* once recovery info is written, don't need to tack this */ - remove_ino_entry(sbi, ino, APPEND_INO); + f2fs_remove_ino_entry(sbi, ino, APPEND_INO); clear_inode_flag(inode, FI_APPEND_WRITE); flush_out: if (!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER) ret = f2fs_issue_flush(sbi, inode->i_ino); if (!ret) { - remove_ino_entry(sbi, ino, UPDATE_INO); + f2fs_remove_ino_entry(sbi, ino, UPDATE_INO); clear_inode_flag(inode, FI_UPDATE_WRITE); - remove_ino_entry(sbi, ino, FLUSH_INO); + f2fs_remove_ino_entry(sbi, ino, FLUSH_INO); } f2fs_update_time(sbi, REQ_TIME); out: @@ -333,18 +337,19 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) static pgoff_t __get_first_dirty_index(struct address_space *mapping, pgoff_t pgofs, int whence) { - struct pagevec pvec; + struct page *page; int nr_pages; if (whence != SEEK_DATA) return 0; /* find first dirty page index */ - pagevec_init(&pvec, 0); - nr_pages = pagevec_lookup_tag(&pvec, mapping, &pgofs, - PAGECACHE_TAG_DIRTY, 1); - pgofs = nr_pages ? pvec.pages[0]->index : ULONG_MAX; - pagevec_release(&pvec); + nr_pages = find_get_pages_tag(mapping, &pgofs, PAGECACHE_TAG_DIRTY, + 1, &page); + if (!nr_pages) + return ULONG_MAX; + pgofs = page->index; + put_page(page); return pgofs; } @@ -354,7 +359,7 @@ static bool __found_offset(block_t blkaddr, pgoff_t dirty, pgoff_t pgofs, switch (whence) { case SEEK_DATA: if ((blkaddr == NEW_ADDR && dirty == pgofs) || - (blkaddr != NEW_ADDR && blkaddr != NULL_ADDR)) + is_valid_blkaddr(blkaddr)) return true; break; case SEEK_HOLE: @@ -394,13 +399,13 @@ static loff_t f2fs_seek_block(struct file *file, loff_t offset, int whence) for (; data_ofs < isize; data_ofs = (loff_t)pgofs << PAGE_SHIFT) { set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, pgofs, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, pgofs, LOOKUP_NODE); if (err && err != -ENOENT) { goto fail; } else if (err == -ENOENT) { /* direct node does not exists */ if (whence == SEEK_DATA) { - pgofs = get_next_page_offset(&dn, pgofs); + pgofs = f2fs_get_next_page_offset(&dn, pgofs); continue; } else { goto found; @@ -414,6 +419,7 @@ static loff_t f2fs_seek_block(struct file *file, loff_t offset, int whence) dn.ofs_in_node++, pgofs++, data_ofs = (loff_t)pgofs << PAGE_SHIFT) { block_t blkaddr; + blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); @@ -488,7 +494,7 @@ static int f2fs_file_open(struct inode *inode, struct file *filp) return dquot_file_open(inode, filp); } -void truncate_data_blocks_range(struct dnode_of_data *dn, int count) +void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct f2fs_node *raw_node; @@ -504,12 +510,13 @@ void truncate_data_blocks_range(struct dnode_of_data *dn, int count) for (; count > 0; count--, addr++, dn->ofs_in_node++) { block_t blkaddr = le32_to_cpu(*addr); + if (blkaddr == NULL_ADDR) continue; dn->data_blkaddr = NULL_ADDR; - set_data_blkaddr(dn); - invalidate_blocks(sbi, blkaddr); + f2fs_set_data_blkaddr(dn); + f2fs_invalidate_blocks(sbi, blkaddr); if (dn->ofs_in_node == 0 && IS_INODE(dn->node_page)) clear_inode_flag(dn->inode, FI_FIRST_BLOCK_WRITTEN); nr_free++; @@ -521,7 +528,7 @@ void truncate_data_blocks_range(struct dnode_of_data *dn, int count) * once we invalidate valid blkaddr in range [ofs, ofs + count], * we will invalidate all blkaddr in the whole range. */ - fofs = start_bidx_of_node(ofs_of_node(dn->node_page), + fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) + ofs; f2fs_update_extent_cache_range(dn, fofs, 0, len); dec_valid_block_count(sbi, dn->inode, nr_free); @@ -533,15 +540,15 @@ void truncate_data_blocks_range(struct dnode_of_data *dn, int count) dn->ofs_in_node, nr_free); } -void truncate_data_blocks(struct dnode_of_data *dn) +void f2fs_truncate_data_blocks(struct dnode_of_data *dn) { - truncate_data_blocks_range(dn, ADDRS_PER_BLOCK); + f2fs_truncate_data_blocks_range(dn, ADDRS_PER_BLOCK); } static int truncate_partial_data_page(struct inode *inode, u64 from, bool cache_only) { - unsigned offset = from & (PAGE_SIZE - 1); + loff_t offset = from & (PAGE_SIZE - 1); pgoff_t index = from >> PAGE_SHIFT; struct address_space *mapping = inode->i_mapping; struct page *page; @@ -557,7 +564,7 @@ static int truncate_partial_data_page(struct inode *inode, u64 from, return 0; } - page = get_lock_data_page(inode, index, true); + page = f2fs_get_lock_data_page(inode, index, true); if (IS_ERR(page)) return PTR_ERR(page) == -ENOENT ? 0 : PTR_ERR(page); truncate_out: @@ -572,7 +579,7 @@ truncate_out: return 0; } -int truncate_blocks(struct inode *inode, u64 from, bool lock) +int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct dnode_of_data dn; @@ -591,21 +598,21 @@ int truncate_blocks(struct inode *inode, u64 from, bool lock) if (lock) f2fs_lock_op(sbi); - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(ipage)) { err = PTR_ERR(ipage); goto out; } if (f2fs_has_inline_data(inode)) { - truncate_inline_inode(inode, ipage, from); + f2fs_truncate_inline_inode(inode, ipage, from); f2fs_put_page(ipage, 1); truncate_page = true; goto out; } set_new_dnode(&dn, inode, ipage, NULL, 0); - err = get_dnode_of_data(&dn, free_from, LOOKUP_NODE_RA); + err = f2fs_get_dnode_of_data(&dn, free_from, LOOKUP_NODE_RA); if (err) { if (err == -ENOENT) goto free_next; @@ -618,13 +625,13 @@ int truncate_blocks(struct inode *inode, u64 from, bool lock) f2fs_bug_on(sbi, count < 0); if (dn.ofs_in_node || IS_INODE(dn.node_page)) { - truncate_data_blocks_range(&dn, count); + f2fs_truncate_data_blocks_range(&dn, count); free_from += count; } f2fs_put_dnode(&dn); free_next: - err = truncate_inode_blocks(inode, free_from); + err = f2fs_truncate_inode_blocks(inode, free_from); out: if (lock) f2fs_unlock_op(sbi); @@ -663,7 +670,7 @@ int f2fs_truncate(struct inode *inode) return err; } - err = truncate_blocks(inode, i_size_read(inode), true); + err = f2fs_truncate_blocks(inode, i_size_read(inode), true); if (err) return err; @@ -689,16 +696,16 @@ int f2fs_getattr(struct vfsmount *mnt, stat->btime.tv_nsec = fi->i_crtime.tv_nsec; } - flags = fi->i_flags & (FS_FL_USER_VISIBLE | FS_PROJINHERIT_FL); - if (flags & FS_APPEND_FL) + flags = fi->i_flags & F2FS_FL_USER_VISIBLE; + if (flags & F2FS_APPEND_FL) stat->attributes |= STATX_ATTR_APPEND; - if (flags & FS_COMPR_FL) + if (flags & F2FS_COMPR_FL) stat->attributes |= STATX_ATTR_COMPRESSED; if (f2fs_encrypted_inode(inode)) stat->attributes |= STATX_ATTR_ENCRYPTED; - if (flags & FS_IMMUTABLE_FL) + if (flags & F2FS_IMMUTABLE_FL) stat->attributes |= STATX_ATTR_IMMUTABLE; - if (flags & FS_NODUMP_FL) + if (flags & F2FS_NODUMP_FL) stat->attributes |= STATX_ATTR_NODUMP; stat->attributes_mask |= (STATX_ATTR_APPEND | @@ -814,7 +821,7 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr) __setattr_copy(inode, attr); if (attr->ia_valid & ATTR_MODE) { - err = posix_acl_chmod(inode, get_inode_mode(inode)); + err = posix_acl_chmod(inode, f2fs_get_inode_mode(inode)); if (err || is_inode_flag_set(inode, FI_ACL_MODE)) { inode->i_mode = F2FS_I(inode)->i_acl_mode; clear_inode_flag(inode, FI_ACL_MODE); @@ -856,7 +863,7 @@ static int fill_zero(struct inode *inode, pgoff_t index, f2fs_balance_fs(sbi, true); f2fs_lock_op(sbi); - page = get_new_data_page(inode, NULL, index, false); + page = f2fs_get_new_data_page(inode, NULL, index, false); f2fs_unlock_op(sbi); if (IS_ERR(page)) @@ -869,7 +876,7 @@ static int fill_zero(struct inode *inode, pgoff_t index, return 0; } -int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) +int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) { int err; @@ -878,10 +885,11 @@ int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) pgoff_t end_offset, count; set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, pg_start, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, pg_start, LOOKUP_NODE); if (err) { if (err == -ENOENT) { - pg_start = get_next_page_offset(&dn, pg_start); + pg_start = f2fs_get_next_page_offset(&dn, + pg_start); continue; } return err; @@ -892,7 +900,7 @@ int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end) f2fs_bug_on(F2FS_I_SB(inode), count == 0 || count > end_offset); - truncate_data_blocks_range(&dn, count); + f2fs_truncate_data_blocks_range(&dn, count); f2fs_put_dnode(&dn); pg_start += count; @@ -948,7 +956,7 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len) blk_end - 1); f2fs_lock_op(sbi); - ret = truncate_hole(inode, pg_start, pg_end); + ret = f2fs_truncate_hole(inode, pg_start, pg_end); f2fs_unlock_op(sbi); up_write(&F2FS_I(inode)->i_mmap_sem); } @@ -966,7 +974,7 @@ static int __read_out_blkaddrs(struct inode *inode, block_t *blkaddr, next_dnode: set_new_dnode(&dn, inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, off, LOOKUP_NODE_RA); + ret = f2fs_get_dnode_of_data(&dn, off, LOOKUP_NODE_RA); if (ret && ret != -ENOENT) { return ret; } else if (ret == -ENOENT) { @@ -983,7 +991,7 @@ next_dnode: for (i = 0; i < done; i++, blkaddr++, do_replace++, dn.ofs_in_node++) { *blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); - if (!is_checkpointed_data(sbi, *blkaddr)) { + if (!f2fs_is_checkpointed_data(sbi, *blkaddr)) { if (test_opt(sbi, LFS)) { f2fs_put_dnode(&dn); @@ -1016,10 +1024,10 @@ static int __roll_back_blkaddrs(struct inode *inode, block_t *blkaddr, continue; set_new_dnode(&dn, inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, off + i, LOOKUP_NODE_RA); + ret = f2fs_get_dnode_of_data(&dn, off + i, LOOKUP_NODE_RA); if (ret) { dec_valid_block_count(sbi, inode, 1); - invalidate_blocks(sbi, *blkaddr); + f2fs_invalidate_blocks(sbi, *blkaddr); } else { f2fs_update_data_blkaddr(&dn, *blkaddr); } @@ -1049,18 +1057,18 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode, pgoff_t ilen; set_new_dnode(&dn, dst_inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, dst + i, ALLOC_NODE); + ret = f2fs_get_dnode_of_data(&dn, dst + i, ALLOC_NODE); if (ret) return ret; - get_node_info(sbi, dn.nid, &ni); + f2fs_get_node_info(sbi, dn.nid, &ni); ilen = min((pgoff_t) ADDRS_PER_PAGE(dn.node_page, dst_inode) - dn.ofs_in_node, len - i); do { dn.data_blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node); - truncate_data_blocks_range(&dn, 1); + f2fs_truncate_data_blocks_range(&dn, 1); if (do_replace[i]) { f2fs_i_blocks_write(src_inode, @@ -1083,10 +1091,11 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode, } else { struct page *psrc, *pdst; - psrc = get_lock_data_page(src_inode, src + i, true); + psrc = f2fs_get_lock_data_page(src_inode, + src + i, true); if (IS_ERR(psrc)) return PTR_ERR(psrc); - pdst = get_new_data_page(dst_inode, NULL, dst + i, + pdst = f2fs_get_new_data_page(dst_inode, NULL, dst + i, true); if (IS_ERR(pdst)) { f2fs_put_page(psrc, 1); @@ -1097,7 +1106,8 @@ static int __clone_blkaddrs(struct inode *src_inode, struct inode *dst_inode, f2fs_put_page(pdst, 1); f2fs_put_page(psrc, 1); - ret = truncate_hole(src_inode, src + i, src + i + 1); + ret = f2fs_truncate_hole(src_inode, + src + i, src + i + 1); if (ret) return ret; i++; @@ -1119,12 +1129,14 @@ static int __exchange_data_block(struct inode *src_inode, olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len); src_blkaddr = f2fs_kvzalloc(F2FS_I_SB(src_inode), - sizeof(block_t) * olen, GFP_KERNEL); + array_size(olen, sizeof(block_t)), + GFP_KERNEL); if (!src_blkaddr) return -ENOMEM; do_replace = f2fs_kvzalloc(F2FS_I_SB(src_inode), - sizeof(int) * olen, GFP_KERNEL); + array_size(olen, sizeof(int)), + GFP_KERNEL); if (!do_replace) { kvfree(src_blkaddr); return -ENOMEM; @@ -1150,7 +1162,7 @@ static int __exchange_data_block(struct inode *src_inode, return 0; roll_back: - __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, len); + __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, olen); kvfree(src_blkaddr); kvfree(do_replace); return ret; @@ -1193,7 +1205,7 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len) pg_end = (offset + len) >> PAGE_SHIFT; /* avoid gc operation during block exchange */ - down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); down_write(&F2FS_I(inode)->i_mmap_sem); /* write out all dirty pages from offset */ @@ -1214,12 +1226,12 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len) new_size = i_size_read(inode) - len; truncate_pagecache(inode, new_size); - ret = truncate_blocks(inode, new_size, true); + ret = f2fs_truncate_blocks(inode, new_size, true); if (!ret) f2fs_i_size_write(inode, new_size); out_unlock: up_write(&F2FS_I(inode)->i_mmap_sem); - up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); return ret; } @@ -1239,7 +1251,7 @@ static int f2fs_do_zero_range(struct dnode_of_data *dn, pgoff_t start, } dn->ofs_in_node = ofs_in_node; - ret = reserve_new_blocks(dn, count); + ret = f2fs_reserve_new_blocks(dn, count); if (ret) return ret; @@ -1248,7 +1260,7 @@ static int f2fs_do_zero_range(struct dnode_of_data *dn, pgoff_t start, dn->data_blkaddr = datablock_addr(dn->inode, dn->node_page, dn->ofs_in_node); /* - * reserve_new_blocks will not guarantee entire block + * f2fs_reserve_new_blocks will not guarantee entire block * allocation. */ if (dn->data_blkaddr == NULL_ADDR) { @@ -1256,9 +1268,9 @@ static int f2fs_do_zero_range(struct dnode_of_data *dn, pgoff_t start, break; } if (dn->data_blkaddr != NEW_ADDR) { - invalidate_blocks(sbi, dn->data_blkaddr); + f2fs_invalidate_blocks(sbi, dn->data_blkaddr); dn->data_blkaddr = NEW_ADDR; - set_data_blkaddr(dn); + f2fs_set_data_blkaddr(dn); } } @@ -1324,7 +1336,7 @@ static int f2fs_zero_range(struct inode *inode, loff_t offset, loff_t len, f2fs_lock_op(sbi); set_new_dnode(&dn, inode, NULL, NULL, 0); - ret = get_dnode_of_data(&dn, index, ALLOC_NODE); + ret = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE); if (ret) { f2fs_unlock_op(sbi); goto out; @@ -1395,10 +1407,10 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) f2fs_balance_fs(sbi, true); /* avoid gc operation during block exchange */ - down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); down_write(&F2FS_I(inode)->i_mmap_sem); - ret = truncate_blocks(inode, i_size_read(inode), true); + ret = f2fs_truncate_blocks(inode, i_size_read(inode), true); if (ret) goto out; @@ -1436,7 +1448,7 @@ static int f2fs_insert_range(struct inode *inode, loff_t offset, loff_t len) f2fs_i_size_write(inode, new_size); out: up_write(&F2FS_I(inode)->i_mmap_sem); - up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); return ret; } @@ -1479,7 +1491,7 @@ static int expand_inode_data(struct inode *inode, loff_t offset, last_off = map.m_lblk + map.m_len - 1; /* update new size to the failed position */ - new_size = (last_off == pg_end) ? offset + len: + new_size = (last_off == pg_end) ? offset + len : (loff_t)(last_off + 1) << PAGE_SHIFT; } else { new_size = ((loff_t)pg_end << PAGE_SHIFT) + off_end; @@ -1559,13 +1571,13 @@ static int f2fs_release_file(struct inode *inode, struct file *filp) /* some remained atomic pages should discarded */ if (f2fs_is_atomic_file(inode)) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); if (f2fs_is_volatile_file(inode)) { - clear_inode_flag(inode, FI_VOLATILE_FILE); - stat_dec_volatile_write(inode); set_inode_flag(inode, FI_DROP_CACHE); filemap_fdatawrite(inode->i_mapping); clear_inode_flag(inode, FI_DROP_CACHE); + clear_inode_flag(inode, FI_VOLATILE_FILE); + stat_dec_volatile_write(inode); } return 0; } @@ -1582,7 +1594,7 @@ static int f2fs_file_flush(struct file *file, fl_owner_t id) */ if (f2fs_is_atomic_file(inode) && F2FS_I(inode)->inmem_task == current) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); return 0; } @@ -1590,7 +1602,15 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg) { struct inode *inode = file_inode(filp); struct f2fs_inode_info *fi = F2FS_I(inode); - unsigned int flags = fi->i_flags & FS_FL_USER_VISIBLE; + unsigned int flags = fi->i_flags; + + if (file_is_encrypt(inode)) + flags |= F2FS_ENCRYPT_FL; + if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode)) + flags |= F2FS_INLINE_DATA_FL; + + flags &= F2FS_FL_USER_VISIBLE; + return put_user(flags, (int __user *)arg); } @@ -1624,15 +1644,15 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg) oldflags = fi->i_flags; - if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) { + if ((flags ^ oldflags) & (F2FS_APPEND_FL | F2FS_IMMUTABLE_FL)) { if (!capable(CAP_LINUX_IMMUTABLE)) { ret = -EPERM; goto unlock_out; } } - flags = flags & FS_FL_USER_MODIFIABLE; - flags |= oldflags & ~FS_FL_USER_MODIFIABLE; + flags = flags & (F2FS_FL_USER_MODIFIABLE); + flags |= oldflags & ~(F2FS_FL_USER_MODIFIABLE); fi->i_flags = flags; inode->i_ctime = current_time(inode); @@ -1668,6 +1688,8 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) inode_lock(inode); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + if (f2fs_is_atomic_file(inode)) goto out; @@ -1675,28 +1697,25 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) if (ret) goto out; - set_inode_flag(inode, FI_ATOMIC_FILE); - set_inode_flag(inode, FI_HOT_DATA); - f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); - if (!get_dirty_pages(inode)) - goto inc_stat; + goto skip_flush; f2fs_msg(F2FS_I_SB(inode)->sb, KERN_WARNING, "Unexpected flush for atomic writes: ino=%lu, npages=%u", inode->i_ino, get_dirty_pages(inode)); ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); - if (ret) { - clear_inode_flag(inode, FI_ATOMIC_FILE); - clear_inode_flag(inode, FI_HOT_DATA); + if (ret) goto out; - } +skip_flush: + set_inode_flag(inode, FI_ATOMIC_FILE); + clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); -inc_stat: F2FS_I(inode)->inmem_task = current; stat_inc_atomic_write(inode); stat_update_max_atomic_write(inode); out: + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; @@ -1716,27 +1735,33 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp) inode_lock(inode); - down_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); - if (f2fs_is_volatile_file(inode)) + if (f2fs_is_volatile_file(inode)) { + ret = -EINVAL; goto err_out; + } if (f2fs_is_atomic_file(inode)) { - ret = commit_inmem_pages(inode); + ret = f2fs_commit_inmem_pages(inode); if (ret) goto err_out; ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true); if (!ret) { clear_inode_flag(inode, FI_ATOMIC_FILE); - clear_inode_flag(inode, FI_HOT_DATA); + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0; stat_dec_atomic_write(inode); } } else { ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false); } err_out: - up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) { + clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + ret = -EINVAL; + } + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); inode_unlock(inode); mnt_drop_write_file(filp); return ret; @@ -1821,7 +1846,7 @@ static int f2fs_ioc_abort_volatile_write(struct file *filp) inode_lock(inode); if (f2fs_is_atomic_file(inode)) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); if (f2fs_is_volatile_file(inode)) { clear_inode_flag(inode, FI_VOLATILE_FILE); stat_dec_volatile_write(inode); @@ -1849,9 +1874,11 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) if (get_user(in, (__u32 __user *)arg)) return -EFAULT; - ret = mnt_want_write_file(filp); - if (ret) - return ret; + if (in != F2FS_GOING_DOWN_FULLSYNC) { + ret = mnt_want_write_file(filp); + if (ret) + return ret; + } switch (in) { case F2FS_GOING_DOWN_FULLSYNC: @@ -1876,7 +1903,7 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) f2fs_stop_checkpoint(sbi, false); break; case F2FS_GOING_DOWN_METAFLUSH: - sync_meta_pages(sbi, META, LONG_MAX, FS_META_IO); + f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_META_IO); f2fs_stop_checkpoint(sbi, false); break; default: @@ -1884,15 +1911,16 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg) goto out; } - stop_gc_thread(sbi); - stop_discard_thread(sbi); + f2fs_stop_gc_thread(sbi); + f2fs_stop_discard_thread(sbi); - drop_discard_cmd(sbi); + f2fs_drop_discard_cmd(sbi); clear_opt(sbi, DISCARD); f2fs_update_time(sbi, REQ_TIME); out: - mnt_drop_write_file(filp); + if (in != F2FS_GOING_DOWN_FULLSYNC) + mnt_drop_write_file(filp); return ret; } @@ -2051,15 +2079,15 @@ static int f2fs_ioc_gc_range(struct file *filp, unsigned long arg) if (f2fs_readonly(sbi->sb)) return -EROFS; + end = range.start + range.len; + if (range.start < MAIN_BLKADDR(sbi) || end >= MAX_BLKADDR(sbi)) { + return -EINVAL; + } + ret = mnt_want_write_file(filp); if (ret) return ret; - end = range.start + range.len; - if (range.start < MAIN_BLKADDR(sbi) || end >= MAX_BLKADDR(sbi)) { - ret = -EINVAL; - goto out; - } do_more: if (!range.sync) { if (!mutex_trylock(&sbi->gc_mutex)) { @@ -2079,7 +2107,7 @@ out: return ret; } -static int f2fs_ioc_write_checkpoint(struct file *filp, unsigned long arg) +static int f2fs_ioc_f2fs_write_checkpoint(struct file *filp, unsigned long arg) { struct inode *inode = file_inode(filp); struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -2108,7 +2136,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, struct inode *inode = file_inode(filp); struct f2fs_map_blocks map = { .m_next_extent = NULL, .m_seg_type = NO_CHECK_TYPE }; - struct extent_info ei = {0,0,0}; + struct extent_info ei = {0, 0, 0}; pgoff_t pg_start, pg_end, next_pgofs; unsigned int blk_per_seg = sbi->blocks_per_seg; unsigned int total = 0, sec_num; @@ -2117,7 +2145,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info *sbi, int err; /* if in-place-update policy is enabled, don't waste time here */ - if (should_update_inplace(inode, NULL)) + if (f2fs_should_update_inplace(inode, NULL)) return -EINVAL; pg_start = range->start >> PAGE_SHIFT; @@ -2212,7 +2240,7 @@ do_map: while (idx < map.m_lblk + map.m_len && cnt < blk_per_seg) { struct page *page; - page = get_lock_data_page(inode, idx, true); + page = f2fs_get_lock_data_page(inode, idx, true); if (IS_ERR(page)) { err = PTR_ERR(page); goto clear_out; @@ -2323,12 +2351,12 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, } inode_lock(src); - down_write(&F2FS_I(src)->dio_rwsem[WRITE]); + down_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); if (src != dst) { ret = -EBUSY; if (!inode_trylock(dst)) goto out; - if (!down_write_trylock(&F2FS_I(dst)->dio_rwsem[WRITE])) { + if (!down_write_trylock(&F2FS_I(dst)->i_gc_rwsem[WRITE])) { inode_unlock(dst); goto out; } @@ -2390,11 +2418,11 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in, f2fs_unlock_op(sbi); out_unlock: if (src != dst) { - up_write(&F2FS_I(dst)->dio_rwsem[WRITE]); + up_write(&F2FS_I(dst)->i_gc_rwsem[WRITE]); inode_unlock(dst); } out: - up_write(&F2FS_I(src)->dio_rwsem[WRITE]); + up_write(&F2FS_I(src)->i_gc_rwsem[WRITE]); inode_unlock(src); return ret; } @@ -2521,12 +2549,14 @@ int f2fs_pin_file_control(struct inode *inode, bool inc) /* Use i_gc_failures for normal file as a risk signal. */ if (inc) - f2fs_i_gc_failures_write(inode, fi->i_gc_failures + 1); + f2fs_i_gc_failures_write(inode, + fi->i_gc_failures[GC_FAILURE_PIN] + 1); - if (fi->i_gc_failures > sbi->gc_pin_file_threshold) { + if (fi->i_gc_failures[GC_FAILURE_PIN] > sbi->gc_pin_file_threshold) { f2fs_msg(sbi->sb, KERN_WARNING, "%s: Enable GC = ino %lx after %x GC trials\n", - __func__, inode->i_ino, fi->i_gc_failures); + __func__, inode->i_ino, + fi->i_gc_failures[GC_FAILURE_PIN]); clear_inode_flag(inode, FI_PIN_FILE); return -EAGAIN; } @@ -2557,14 +2587,14 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg) inode_lock(inode); - if (should_update_outplace(inode, NULL)) { + if (f2fs_should_update_outplace(inode, NULL)) { ret = -EINVAL; goto out; } if (!pin) { clear_inode_flag(inode, FI_PIN_FILE); - F2FS_I(inode)->i_gc_failures = 1; + F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] = 1; goto done; } @@ -2577,7 +2607,7 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg) goto out; set_inode_flag(inode, FI_PIN_FILE); - ret = F2FS_I(inode)->i_gc_failures; + ret = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]; done: f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); out: @@ -2592,7 +2622,7 @@ static int f2fs_ioc_get_pin_file(struct file *filp, unsigned long arg) __u32 pin = 0; if (is_inode_flag_set(inode, FI_PIN_FILE)) - pin = F2FS_I(inode)->i_gc_failures; + pin = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]; return put_user(pin, (u32 __user *)arg); } @@ -2616,9 +2646,9 @@ int f2fs_precache_extents(struct inode *inode) while (map.m_lblk < end) { map.m_len = end - map.m_lblk; - down_write(&fi->dio_rwsem[WRITE]); + down_write(&fi->i_gc_rwsem[WRITE]); err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_PRECACHE); - up_write(&fi->dio_rwsem[WRITE]); + up_write(&fi->i_gc_rwsem[WRITE]); if (err) return err; @@ -2670,7 +2700,7 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) case F2FS_IOC_GARBAGE_COLLECT_RANGE: return f2fs_ioc_gc_range(filp, arg); case F2FS_IOC_WRITE_CHECKPOINT: - return f2fs_ioc_write_checkpoint(filp, arg); + return f2fs_ioc_f2fs_write_checkpoint(filp, arg); case F2FS_IOC_DEFRAGMENT: return f2fs_ioc_defragment(filp, arg); case F2FS_IOC_MOVE_RANGE: @@ -2694,7 +2724,6 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); - struct blk_plug plug; ssize_t ret; if (unlikely(f2fs_cp_error(F2FS_I_SB(inode)))) @@ -2724,6 +2753,8 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) iov_iter_count(from)) || f2fs_has_inline_data(inode) || f2fs_force_buffered_io(inode, WRITE)) { + clear_inode_flag(inode, + FI_NO_PREALLOC); inode_unlock(inode); return -EAGAIN; } @@ -2739,9 +2770,7 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) return err; } } - blk_start_plug(&plug); ret = __generic_file_write_iter(iocb, from); - blk_finish_plug(&plug); clear_inode_flag(inode, FI_NO_PREALLOC); /* if we couldn't write data, we should deallocate blocks. */ diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index d28d31cbd7d2..772ef64d2035 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -76,7 +76,7 @@ static int gc_thread_func(void *data) * invalidated soon after by user update or deletion. * So, I'd like to wait some time to collect dirty segments. */ - if (gc_th->gc_urgent) { + if (sbi->gc_mode == GC_URGENT) { wait_ms = gc_th->urgent_sleep_time; mutex_lock(&sbi->gc_mutex); goto do_gc; @@ -114,7 +114,7 @@ next: return 0; } -int start_gc_thread(struct f2fs_sb_info *sbi) +int f2fs_start_gc_thread(struct f2fs_sb_info *sbi) { struct f2fs_gc_kthread *gc_th; dev_t dev = sbi->sb->s_bdev->bd_dev; @@ -131,8 +131,6 @@ int start_gc_thread(struct f2fs_sb_info *sbi) gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME; gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME; - gc_th->gc_idle = 0; - gc_th->gc_urgent = 0; gc_th->gc_wake= 0; sbi->gc_thread = gc_th; @@ -148,7 +146,7 @@ out: return err; } -void stop_gc_thread(struct f2fs_sb_info *sbi) +void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi) { struct f2fs_gc_kthread *gc_th = sbi->gc_thread; if (!gc_th) @@ -158,21 +156,19 @@ void stop_gc_thread(struct f2fs_sb_info *sbi) sbi->gc_thread = NULL; } -static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type) +static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type) { int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY; - if (!gc_th) - return gc_mode; - - if (gc_th->gc_idle) { - if (gc_th->gc_idle == 1) - gc_mode = GC_CB; - else if (gc_th->gc_idle == 2) - gc_mode = GC_GREEDY; - } - if (gc_th->gc_urgent) + switch (sbi->gc_mode) { + case GC_IDLE_CB: + gc_mode = GC_CB; + break; + case GC_IDLE_GREEDY: + case GC_URGENT: gc_mode = GC_GREEDY; + break; + } return gc_mode; } @@ -187,7 +183,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type, p->max_search = dirty_i->nr_dirty[type]; p->ofs_unit = 1; } else { - p->gc_mode = select_gc_type(sbi->gc_thread, gc_type); + p->gc_mode = select_gc_type(sbi, gc_type); p->dirty_segmap = dirty_i->dirty_segmap[DIRTY]; p->max_search = dirty_i->nr_dirty[DIRTY]; p->ofs_unit = sbi->segs_per_sec; @@ -195,7 +191,7 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type, /* we need to check every dirty segments in the FG_GC case */ if (gc_type != FG_GC && - (sbi->gc_thread && !sbi->gc_thread->gc_urgent) && + (sbi->gc_mode != GC_URGENT) && p->max_search > sbi->max_victim_search) p->max_search = sbi->max_victim_search; @@ -234,10 +230,6 @@ static unsigned int check_bg_victims(struct f2fs_sb_info *sbi) for_each_set_bit(secno, dirty_i->victim_secmap, MAIN_SECS(sbi)) { if (sec_usage_check(sbi, secno)) continue; - - if (no_fggc_candidate(sbi, secno)) - continue; - clear_bit(secno, dirty_i->victim_secmap); return GET_SEG_FROM_SEC(sbi, secno); } @@ -377,9 +369,6 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, goto next; if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; - if (gc_type == FG_GC && p.alloc_mode == LFS && - no_fggc_candidate(sbi, secno)) - goto next; cost = get_gc_cost(sbi, segno, &p); @@ -440,7 +429,7 @@ static void add_gc_inode(struct gc_inode_list *gc_list, struct inode *inode) iput(inode); return; } - new_ie = f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS); + new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab, GFP_NOFS); new_ie->inode = inode; f2fs_radix_tree_insert(&gc_list->iroot, inode->i_ino, new_ie); @@ -454,7 +443,7 @@ static void put_gc_inode(struct gc_inode_list *gc_list) radix_tree_delete(&gc_list->iroot, ie->inode->i_ino); iput(ie->inode); list_del(&ie->list); - kmem_cache_free(inode_entry_slab, ie); + kmem_cache_free(f2fs_inode_entry_slab, ie); } } @@ -484,12 +473,16 @@ static void gc_node_segment(struct f2fs_sb_info *sbi, block_t start_addr; int off; int phase = 0; + bool fggc = (gc_type == FG_GC); start_addr = START_BLOCK(sbi, segno); next_step: entry = sum; + if (fggc && phase == 2) + atomic_inc(&sbi->wb_sync_req[NODE]); + for (off = 0; off < sbi->blocks_per_seg; off++, entry++) { nid_t nid = le32_to_cpu(entry->nid); struct page *node_page; @@ -503,39 +496,42 @@ next_step: continue; if (phase == 0) { - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, META_NAT, true); continue; } if (phase == 1) { - ra_node_page(sbi, nid); + f2fs_ra_node_page(sbi, nid); continue; } /* phase == 2 */ - node_page = get_node_page(sbi, nid); + node_page = f2fs_get_node_page(sbi, nid); if (IS_ERR(node_page)) continue; - /* block may become invalid during get_node_page */ + /* block may become invalid during f2fs_get_node_page */ if (check_valid_map(sbi, segno, off) == 0) { f2fs_put_page(node_page, 1); continue; } - get_node_info(sbi, nid, &ni); + f2fs_get_node_info(sbi, nid, &ni); if (ni.blk_addr != start_addr + off) { f2fs_put_page(node_page, 1); continue; } - move_node_page(node_page, gc_type); + f2fs_move_node_page(node_page, gc_type); stat_inc_node_blk_count(sbi, 1, gc_type); } if (++phase < 3) goto next_step; + + if (fggc) + atomic_dec(&sbi->wb_sync_req[NODE]); } /* @@ -545,7 +541,7 @@ next_step: * as indirect or double indirect node blocks, are given, it must be a caller's * bug. */ -block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode) +block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode) { unsigned int indirect_blks = 2 * NIDS_PER_BLOCK + 4; unsigned int bidx; @@ -576,11 +572,11 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, nid = le32_to_cpu(sum->nid); ofs_in_node = le16_to_cpu(sum->ofs_in_node); - node_page = get_node_page(sbi, nid); + node_page = f2fs_get_node_page(sbi, nid); if (IS_ERR(node_page)) return false; - get_node_info(sbi, nid, dni); + f2fs_get_node_info(sbi, nid, dni); if (sum->version != dni->version) { f2fs_msg(sbi->sb, KERN_WARNING, @@ -603,7 +599,7 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, * This can be used to move blocks, aka LBAs, directly on disk. */ static void move_data_block(struct inode *inode, block_t bidx, - unsigned int segno, int off) + int gc_type, unsigned int segno, int off) { struct f2fs_io_info fio = { .sbi = F2FS_I_SB(inode), @@ -614,6 +610,7 @@ static void move_data_block(struct inode *inode, block_t bidx, .op_flags = REQ_SYNC, .encrypted_page = NULL, .in_list = false, + .retry = false, }; struct dnode_of_data dn; struct f2fs_summary sum; @@ -621,6 +618,7 @@ static void move_data_block(struct inode *inode, block_t bidx, struct page *page; block_t newaddr; int err; + bool lfs_mode = test_opt(fio.sbi, LFS); /* do not read out */ page = f2fs_grab_cache_page(inode->i_mapping, bidx, false); @@ -630,8 +628,11 @@ static void move_data_block(struct inode *inode, block_t bidx, if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode)) { + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++; + F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++; goto out; + } if (f2fs_is_pinned_file(inode)) { f2fs_pin_file_control(inode, true); @@ -639,7 +640,7 @@ static void move_data_block(struct inode *inode, block_t bidx, } set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, bidx, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, bidx, LOOKUP_NODE); if (err) goto out; @@ -654,14 +655,17 @@ static void move_data_block(struct inode *inode, block_t bidx, */ f2fs_wait_on_page_writeback(page, DATA, true); - get_node_info(fio.sbi, dn.nid, &ni); + f2fs_get_node_info(fio.sbi, dn.nid, &ni); set_summary(&sum, dn.nid, dn.ofs_in_node, ni.version); /* read page */ fio.page = page; fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr; - allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr, + if (lfs_mode) + down_write(&fio.sbi->io_order_lock); + + f2fs_allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr, &sum, CURSEG_COLD_DATA, NULL, false); fio.encrypted_page = f2fs_pagecache_get_page(META_MAPPING(fio.sbi), @@ -701,8 +705,8 @@ static void move_data_block(struct inode *inode, block_t bidx, fio.op = REQ_OP_WRITE; fio.op_flags = REQ_SYNC | REQ_NOIDLE; fio.new_blkaddr = newaddr; - err = f2fs_submit_page_write(&fio); - if (err) { + f2fs_submit_page_write(&fio); + if (fio.retry) { if (PageWriteback(fio.encrypted_page)) end_page_writeback(fio.encrypted_page); goto put_page_out; @@ -717,8 +721,10 @@ static void move_data_block(struct inode *inode, block_t bidx, put_page_out: f2fs_put_page(fio.encrypted_page, 1); recover_block: + if (lfs_mode) + up_write(&fio.sbi->io_order_lock); if (err) - __f2fs_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr, + f2fs_do_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr, true, true); put_out: f2fs_put_dnode(&dn); @@ -731,15 +737,18 @@ static void move_data_page(struct inode *inode, block_t bidx, int gc_type, { struct page *page; - page = get_lock_data_page(inode, bidx, true); + page = f2fs_get_lock_data_page(inode, bidx, true); if (IS_ERR(page)) return; if (!check_valid_map(F2FS_I_SB(inode), segno, off)) goto out; - if (f2fs_is_atomic_file(inode)) + if (f2fs_is_atomic_file(inode)) { + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++; + F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++; goto out; + } if (f2fs_is_pinned_file(inode)) { if (gc_type == FG_GC) f2fs_pin_file_control(inode, true); @@ -773,15 +782,20 @@ retry: f2fs_wait_on_page_writeback(page, DATA, true); if (clear_page_dirty_for_io(page)) { inode_dec_dirty_pages(inode); - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); } set_cold_data(page); - err = do_write_data_page(&fio); - if (err == -ENOMEM && is_dirty) { - congestion_wait(BLK_RW_ASYNC, HZ/50); - goto retry; + err = f2fs_do_write_data_page(&fio); + if (err) { + clear_cold_data(page); + if (err == -ENOMEM) { + congestion_wait(BLK_RW_ASYNC, HZ/50); + goto retry; + } + if (is_dirty) + set_page_dirty(page); } } out: @@ -825,13 +839,13 @@ next_step: continue; if (phase == 0) { - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1, META_NAT, true); continue; } if (phase == 1) { - ra_node_page(sbi, nid); + f2fs_ra_node_page(sbi, nid); continue; } @@ -840,7 +854,7 @@ next_step: continue; if (phase == 2) { - ra_node_page(sbi, dni.ino); + f2fs_ra_node_page(sbi, dni.ino); continue; } @@ -858,16 +872,16 @@ next_step: } if (!down_write_trylock( - &F2FS_I(inode)->dio_rwsem[WRITE])) { + &F2FS_I(inode)->i_gc_rwsem[WRITE])) { iput(inode); continue; } - start_bidx = start_bidx_of_node(nofs, inode); - data_page = get_read_data_page(inode, + start_bidx = f2fs_start_bidx_of_node(nofs, inode); + data_page = f2fs_get_read_data_page(inode, start_bidx + ofs_in_node, REQ_RAHEAD, true); - up_write(&F2FS_I(inode)->dio_rwsem[WRITE]); + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); if (IS_ERR(data_page)) { iput(inode); continue; @@ -885,11 +899,11 @@ next_step: bool locked = false; if (S_ISREG(inode->i_mode)) { - if (!down_write_trylock(&fi->dio_rwsem[READ])) + if (!down_write_trylock(&fi->i_gc_rwsem[READ])) continue; if (!down_write_trylock( - &fi->dio_rwsem[WRITE])) { - up_write(&fi->dio_rwsem[READ]); + &fi->i_gc_rwsem[WRITE])) { + up_write(&fi->i_gc_rwsem[READ]); continue; } locked = true; @@ -898,17 +912,18 @@ next_step: inode_dio_wait(inode); } - start_bidx = start_bidx_of_node(nofs, inode) + start_bidx = f2fs_start_bidx_of_node(nofs, inode) + ofs_in_node; if (f2fs_post_read_required(inode)) - move_data_block(inode, start_bidx, segno, off); + move_data_block(inode, start_bidx, gc_type, + segno, off); else move_data_page(inode, start_bidx, gc_type, segno, off); if (locked) { - up_write(&fi->dio_rwsem[WRITE]); - up_write(&fi->dio_rwsem[READ]); + up_write(&fi->i_gc_rwsem[WRITE]); + up_write(&fi->i_gc_rwsem[READ]); } stat_inc_data_blk_count(sbi, 1, gc_type); @@ -947,12 +962,12 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi, /* readahead multi ssa blocks those have contiguous address */ if (sbi->segs_per_sec > 1) - ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno), + f2fs_ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno), sbi->segs_per_sec, META_SSA, true); /* reference all summary page */ while (segno < end_segno) { - sum_page = get_sum_page(sbi, segno++); + sum_page = f2fs_get_sum_page(sbi, segno++); unlock_page(sum_page); } @@ -1018,6 +1033,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, .ilist = LIST_HEAD_INIT(gc_list.ilist), .iroot = RADIX_TREE_INIT(GFP_NOFS), }; + unsigned long long last_skipped = sbi->skipped_atomic_files[FG_GC]; + unsigned int skipped_round = 0, round = 0; trace_f2fs_gc_begin(sbi->sb, sync, background, get_pages(sbi, F2FS_DIRTY_NODES), @@ -1046,7 +1063,7 @@ gc_more: * secure free segments which doesn't need fggc any more. */ if (prefree_segments(sbi)) { - ret = write_checkpoint(sbi, &cpc); + ret = f2fs_write_checkpoint(sbi, &cpc); if (ret) goto stop; } @@ -1069,17 +1086,27 @@ gc_more: sec_freed++; total_freed += seg_freed; + if (gc_type == FG_GC) { + if (sbi->skipped_atomic_files[FG_GC] > last_skipped) + skipped_round++; + last_skipped = sbi->skipped_atomic_files[FG_GC]; + round++; + } + if (gc_type == FG_GC) sbi->cur_victim_sec = NULL_SEGNO; if (!sync) { if (has_not_enough_free_secs(sbi, sec_freed, 0)) { + if (skipped_round > MAX_SKIP_ATOMIC_COUNT && + skipped_round * 2 >= round) + f2fs_drop_inmem_pages_all(sbi, true); segno = NULL_SEGNO; goto gc_more; } if (gc_type == FG_GC) - ret = write_checkpoint(sbi, &cpc); + ret = f2fs_write_checkpoint(sbi, &cpc); } stop: SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0; @@ -1103,19 +1130,10 @@ stop: return ret; } -void build_gc_manager(struct f2fs_sb_info *sbi) +void f2fs_build_gc_manager(struct f2fs_sb_info *sbi) { - u64 main_count, resv_count, ovp_count; - DIRTY_I(sbi)->v_ops = &default_v_ops; - /* threshold of # of valid blocks in a section for victims of FG_GC */ - main_count = SM_I(sbi)->main_segments << sbi->log_blocks_per_seg; - resv_count = SM_I(sbi)->reserved_segments << sbi->log_blocks_per_seg; - ovp_count = SM_I(sbi)->ovp_segments << sbi->log_blocks_per_seg; - - sbi->fggc_threshold = div64_u64((main_count - ovp_count) * - BLKS_PER_SEC(sbi), (main_count - resv_count)); sbi->gc_pin_file_threshold = DEF_GC_FAILED_PINNED_FILES; /* give warm/cold data area from slower device */ diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h index b0045d4c8d1e..c8619e408009 100644 --- a/fs/f2fs/gc.h +++ b/fs/f2fs/gc.h @@ -36,8 +36,6 @@ struct f2fs_gc_kthread { unsigned int no_gc_sleep_time; /* for changing gc mode */ - unsigned int gc_idle; - unsigned int gc_urgent; unsigned int gc_wake; }; diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index ac951ee9b20b..b71d9f6783ea 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -43,7 +43,7 @@ bool f2fs_may_inline_dentry(struct inode *inode) return true; } -void read_inline_data(struct page *page, struct page *ipage) +void f2fs_do_read_inline_data(struct page *page, struct page *ipage) { struct inode *inode = page->mapping->host; void *src_addr, *dst_addr; @@ -65,7 +65,8 @@ void read_inline_data(struct page *page, struct page *ipage) SetPageUptodate(page); } -void truncate_inline_inode(struct inode *inode, struct page *ipage, u64 from) +void f2fs_truncate_inline_inode(struct inode *inode, + struct page *ipage, u64 from) { void *addr; @@ -97,7 +98,7 @@ int f2fs_read_inline_data(struct inode *inode, struct page *page) path, current->comm); } - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) { trace_android_fs_dataread_end(inode, page_offset(page), PAGE_SIZE); @@ -115,7 +116,7 @@ int f2fs_read_inline_data(struct inode *inode, struct page *page) if (page->index) zero_user_segment(page, 0, PAGE_SIZE); else - read_inline_data(page, ipage); + f2fs_do_read_inline_data(page, ipage); if (!PageUptodate(page)) SetPageUptodate(page); @@ -149,7 +150,7 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) f2fs_bug_on(F2FS_P_SB(page), PageWriteback(page)); - read_inline_data(page, dn->inode_page); + f2fs_do_read_inline_data(page, dn->inode_page); set_page_dirty(page); /* clear dirty state */ @@ -160,18 +161,18 @@ int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page) ClearPageError(page); fio.old_blkaddr = dn->data_blkaddr; set_inode_flag(dn->inode, FI_HOT_DATA); - write_data_page(dn, &fio); + f2fs_outplace_write_data(dn, &fio); f2fs_wait_on_page_writeback(page, DATA, true); if (dirty) { inode_dec_dirty_pages(dn->inode); - remove_dirty_inode(dn->inode); + f2fs_remove_dirty_inode(dn->inode); } /* this converted inline_data should be recovered. */ set_inode_flag(dn->inode, FI_APPEND_WRITE); /* clear inline data and flag after data writeback */ - truncate_inline_inode(dn->inode, dn->inode_page, 0); + f2fs_truncate_inline_inode(dn->inode, dn->inode_page, 0); clear_inline_node(dn->inode_page); clear_out: stat_dec_inline_inode(dn->inode); @@ -196,7 +197,7 @@ int f2fs_convert_inline_inode(struct inode *inode) f2fs_lock_op(sbi); - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(ipage)) { err = PTR_ERR(ipage); goto out; @@ -222,12 +223,10 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page) { void *src_addr, *dst_addr; struct dnode_of_data dn; - struct address_space *mapping = page_mapping(page); - unsigned long flags; int err; set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, 0, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, 0, LOOKUP_NODE); if (err) return err; @@ -245,10 +244,7 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page) kunmap_atomic(src_addr); set_page_dirty(dn.inode_page); - spin_lock_irqsave(&mapping->tree_lock, flags); - radix_tree_tag_clear(&mapping->page_tree, page_index(page), - PAGECACHE_TAG_DIRTY); - spin_unlock_irqrestore(&mapping->tree_lock, flags); + f2fs_clear_radix_tree_dirty_tag(page); set_inode_flag(inode, FI_APPEND_WRITE); set_inode_flag(inode, FI_DATA_EXIST); @@ -258,7 +254,7 @@ int f2fs_write_inline_data(struct inode *inode, struct page *page) return 0; } -bool recover_inline_data(struct inode *inode, struct page *npage) +bool f2fs_recover_inline_data(struct inode *inode, struct page *npage) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode *ri = NULL; @@ -279,7 +275,7 @@ bool recover_inline_data(struct inode *inode, struct page *npage) if (f2fs_has_inline_data(inode) && ri && (ri->i_inline & F2FS_INLINE_DATA)) { process_inline: - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); f2fs_bug_on(sbi, IS_ERR(ipage)); f2fs_wait_on_page_writeback(ipage, NODE, true); @@ -297,20 +293,20 @@ process_inline: } if (f2fs_has_inline_data(inode)) { - ipage = get_node_page(sbi, inode->i_ino); + ipage = f2fs_get_node_page(sbi, inode->i_ino); f2fs_bug_on(sbi, IS_ERR(ipage)); - truncate_inline_inode(inode, ipage, 0); + f2fs_truncate_inline_inode(inode, ipage, 0); clear_inode_flag(inode, FI_INLINE_DATA); f2fs_put_page(ipage, 1); } else if (ri && (ri->i_inline & F2FS_INLINE_DATA)) { - if (truncate_blocks(inode, 0, false)) + if (f2fs_truncate_blocks(inode, 0, false)) return false; goto process_inline; } return false; } -struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, +struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir, struct fscrypt_name *fname, struct page **res_page) { struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb); @@ -321,7 +317,7 @@ struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, void *inline_dentry; f2fs_hash_t namehash; - ipage = get_node_page(sbi, dir->i_ino); + ipage = f2fs_get_node_page(sbi, dir->i_ino); if (IS_ERR(ipage)) { *res_page = ipage; return NULL; @@ -332,7 +328,7 @@ struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, inline_dentry = inline_data_addr(dir, ipage); make_dentry_ptr_inline(dir, &d, inline_dentry); - de = find_target_dentry(fname, namehash, NULL, &d); + de = f2fs_find_target_dentry(fname, namehash, NULL, &d); unlock_page(ipage); if (de) *res_page = ipage; @@ -342,7 +338,7 @@ struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir, return de; } -int make_empty_inline_dir(struct inode *inode, struct inode *parent, +int f2fs_make_empty_inline_dir(struct inode *inode, struct inode *parent, struct page *ipage) { struct f2fs_dentry_ptr d; @@ -351,7 +347,7 @@ int make_empty_inline_dir(struct inode *inode, struct inode *parent, inline_dentry = inline_data_addr(inode, ipage); make_dentry_ptr_inline(inode, &d, inline_dentry); - do_make_empty_dir(inode, parent, &d); + f2fs_do_make_empty_dir(inode, parent, &d); set_page_dirty(ipage); @@ -386,7 +382,6 @@ static int f2fs_move_inline_dirents(struct inode *dir, struct page *ipage, goto out; f2fs_wait_on_page_writeback(page, DATA, true); - zero_user_segment(page, MAX_INLINE_DATA(dir), PAGE_SIZE); dentry_blk = page_address(page); @@ -410,7 +405,7 @@ static int f2fs_move_inline_dirents(struct inode *dir, struct page *ipage, set_page_dirty(page); /* clear inline dir and flag after data writeback */ - truncate_inline_inode(dir, ipage, 0); + f2fs_truncate_inline_inode(dir, ipage, 0); stat_dec_inline_dir(dir); clear_inode_flag(dir, FI_INLINE_DENTRY); @@ -453,7 +448,7 @@ static int f2fs_add_inline_entries(struct inode *dir, void *inline_dentry) new_name.len = le16_to_cpu(de->name_len); ino = le32_to_cpu(de->ino); - fake_mode = get_de_type(de) << S_SHIFT; + fake_mode = f2fs_get_de_type(de) << S_SHIFT; err = f2fs_add_regular_entry(dir, &new_name, NULL, NULL, ino, fake_mode); @@ -465,8 +460,8 @@ static int f2fs_add_inline_entries(struct inode *dir, void *inline_dentry) return 0; punch_dentry_pages: truncate_inode_pages(&dir->i_data, 0); - truncate_blocks(dir, 0, false); - remove_dirty_inode(dir); + f2fs_truncate_blocks(dir, 0, false); + f2fs_remove_dirty_inode(dir); return err; } @@ -484,7 +479,7 @@ static int f2fs_move_rehashed_dirents(struct inode *dir, struct page *ipage, } memcpy(backup_dentry, inline_dentry, MAX_INLINE_DATA(dir)); - truncate_inline_inode(dir, ipage, 0); + f2fs_truncate_inline_inode(dir, ipage, 0); unlock_page(ipage); @@ -533,14 +528,14 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, struct page *page = NULL; int err = 0; - ipage = get_node_page(sbi, dir->i_ino); + ipage = f2fs_get_node_page(sbi, dir->i_ino); if (IS_ERR(ipage)) return PTR_ERR(ipage); inline_dentry = inline_data_addr(dir, ipage); make_dentry_ptr_inline(dir, &d, inline_dentry); - bit_pos = room_for_filename(d.bitmap, slots, d.max); + bit_pos = f2fs_room_for_filename(d.bitmap, slots, d.max); if (bit_pos >= d.max) { err = f2fs_convert_inline_dir(dir, ipage, inline_dentry); if (err) @@ -551,7 +546,7 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, if (inode) { down_write(&F2FS_I(inode)->i_sem); - page = init_inode_metadata(inode, dir, new_name, + page = f2fs_init_inode_metadata(inode, dir, new_name, orig_name, ipage); if (IS_ERR(page)) { err = PTR_ERR(page); @@ -572,7 +567,7 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name, f2fs_put_page(page, 1); } - update_parent_metadata(dir, inode, 0); + f2fs_update_parent_metadata(dir, inode, 0); fail: if (inode) up_write(&F2FS_I(inode)->i_sem); @@ -618,7 +613,7 @@ bool f2fs_empty_inline_dir(struct inode *dir) void *inline_dentry; struct f2fs_dentry_ptr d; - ipage = get_node_page(sbi, dir->i_ino); + ipage = f2fs_get_node_page(sbi, dir->i_ino); if (IS_ERR(ipage)) return false; @@ -649,7 +644,7 @@ int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx, if (ctx->pos == d.max) return 0; - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) return PTR_ERR(ipage); @@ -675,7 +670,7 @@ int f2fs_inline_data_fiemap(struct inode *inode, struct page *ipage; int err = 0; - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) return PTR_ERR(ipage); @@ -691,7 +686,7 @@ int f2fs_inline_data_fiemap(struct inode *inode, ilen = start + len; ilen -= start; - get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni); + f2fs_get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni); byteaddr = (__u64)ni.blk_addr << inode->i_sb->s_blocksize_bits; byteaddr += (char *)inline_data_addr(inode, ipage) - (char *)F2FS_INODE(ipage); diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 51846fc54fbd..27e18b5cb459 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -36,15 +36,15 @@ void f2fs_set_inode_flags(struct inode *inode) unsigned int flags = F2FS_I(inode)->i_flags; unsigned int new_fl = 0; - if (flags & FS_SYNC_FL) + if (flags & F2FS_SYNC_FL) new_fl |= S_SYNC; - if (flags & FS_APPEND_FL) + if (flags & F2FS_APPEND_FL) new_fl |= S_APPEND; - if (flags & FS_IMMUTABLE_FL) + if (flags & F2FS_IMMUTABLE_FL) new_fl |= S_IMMUTABLE; - if (flags & FS_NOATIME_FL) + if (flags & F2FS_NOATIME_FL) new_fl |= S_NOATIME; - if (flags & FS_DIRSYNC_FL) + if (flags & F2FS_DIRSYNC_FL) new_fl |= S_DIRSYNC; if (f2fs_encrypted_inode(inode)) new_fl |= S_ENCRYPTED; @@ -72,7 +72,7 @@ static bool __written_first_block(struct f2fs_inode *ri) { block_t addr = le32_to_cpu(ri->i_addr[offset_in_addr(ri)]); - if (addr != NEW_ADDR && addr != NULL_ADDR) + if (is_valid_blkaddr(addr)) return true; return false; } @@ -117,7 +117,6 @@ static void __recover_inline_status(struct inode *inode, struct page *ipage) static bool f2fs_enable_inode_chksum(struct f2fs_sb_info *sbi, struct page *page) { struct f2fs_inode *ri = &F2FS_NODE(page)->i; - int extra_isize = le32_to_cpu(ri->i_extra_isize); if (!f2fs_sb_has_inode_chksum(sbi->sb)) return false; @@ -125,7 +124,8 @@ static bool f2fs_enable_inode_chksum(struct f2fs_sb_info *sbi, struct page *page if (!RAW_IS_INODE(F2FS_NODE(page)) || !(ri->i_inline & F2FS_EXTRA_ATTR)) return false; - if (!F2FS_FITS_IN_INODE(ri, extra_isize, i_inode_checksum)) + if (!F2FS_FITS_IN_INODE(ri, le16_to_cpu(ri->i_extra_isize), + i_inode_checksum)) return false; return true; @@ -185,6 +185,21 @@ void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page) ri->i_inode_checksum = cpu_to_le32(f2fs_inode_chksum(sbi, page)); } +static bool sanity_check_inode(struct inode *inode) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + + if (f2fs_sb_has_flexible_inline_xattr(sbi->sb) + && !f2fs_has_extra_attr(inode)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: corrupted inode ino=%lx, run fsck to fix.", + __func__, inode->i_ino); + return false; + } + return true; +} + static int do_read_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -194,14 +209,10 @@ static int do_read_inode(struct inode *inode) projid_t i_projid; /* Check if ino is within scope */ - if (check_nid_range(sbi, inode->i_ino)) { - f2fs_msg(inode->i_sb, KERN_ERR, "bad inode number: %lu", - (unsigned long) inode->i_ino); - WARN_ON(1); + if (f2fs_check_nid_range(sbi, inode->i_ino)) return -EINVAL; - } - node_page = get_node_page(sbi, inode->i_ino); + node_page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(node_page)) return PTR_ERR(node_page); @@ -221,8 +232,11 @@ static int do_read_inode(struct inode *inode) inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec); inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec); inode->i_generation = le32_to_cpu(ri->i_generation); - - fi->i_current_depth = le32_to_cpu(ri->i_current_depth); + if (S_ISDIR(inode->i_mode)) + fi->i_current_depth = le32_to_cpu(ri->i_current_depth); + else if (S_ISREG(inode->i_mode)) + fi->i_gc_failures[GC_FAILURE_PIN] = + le16_to_cpu(ri->i_gc_failures); fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid); fi->i_flags = le32_to_cpu(ri->i_flags); fi->flags = 0; @@ -239,7 +253,6 @@ static int do_read_inode(struct inode *inode) le16_to_cpu(ri->i_extra_isize) : 0; if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) { - f2fs_bug_on(sbi, !f2fs_has_extra_attr(inode)); fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size); } else if (f2fs_has_inline_xattr(inode) || f2fs_has_inline_dentry(inode)) { @@ -265,10 +278,10 @@ static int do_read_inode(struct inode *inode) if (__written_first_block(ri)) set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN); - if (!need_inode_block_update(sbi, inode->i_ino)) + if (!f2fs_need_inode_block_update(sbi, inode->i_ino)) fi->last_disk_size = inode->i_size; - if (fi->i_flags & FS_PROJINHERIT_FL) + if (fi->i_flags & F2FS_PROJINHERIT_FL) set_inode_flag(inode, FI_PROJ_INHERIT); if (f2fs_has_extra_attr(inode) && f2fs_sb_has_project_quota(sbi->sb) && @@ -317,13 +330,17 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) ret = do_read_inode(inode); if (ret) goto bad_inode; + if (!sanity_check_inode(inode)) { + ret = -EINVAL; + goto bad_inode; + } make_now: if (ino == F2FS_NODE_INO(sbi)) { inode->i_mapping->a_ops = &f2fs_node_aops; - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO); + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); } else if (ino == F2FS_META_INO(sbi)) { inode->i_mapping->a_ops = &f2fs_meta_aops; - mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO); + mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS); } else if (S_ISREG(inode->i_mode)) { inode->i_op = &f2fs_file_inode_operations; inode->i_fop = &f2fs_file_operations; @@ -373,7 +390,7 @@ retry: return inode; } -void update_inode(struct inode *inode, struct page *node_page) +void f2fs_update_inode(struct inode *inode, struct page *node_page) { struct f2fs_inode *ri; struct extent_tree *et = F2FS_I(inode)->extent_tree; @@ -408,7 +425,12 @@ void update_inode(struct inode *inode, struct page *node_page) ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec); ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec); ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec); - ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth); + if (S_ISDIR(inode->i_mode)) + ri->i_current_depth = + cpu_to_le32(F2FS_I(inode)->i_current_depth); + else if (S_ISREG(inode->i_mode)) + ri->i_gc_failures = + cpu_to_le16(F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]); ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid); ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags); ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino); @@ -454,12 +476,12 @@ void update_inode(struct inode *inode, struct page *node_page) F2FS_I(inode)->i_disk_time[3] = F2FS_I(inode)->i_crtime; } -void update_inode_page(struct inode *inode) +void f2fs_update_inode_page(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct page *node_page; retry: - node_page = get_node_page(sbi, inode->i_ino); + node_page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(node_page)) { int err = PTR_ERR(node_page); if (err == -ENOMEM) { @@ -470,7 +492,7 @@ retry: } return; } - update_inode(inode, node_page); + f2fs_update_inode(inode, node_page); f2fs_put_page(node_page, 1); } @@ -489,7 +511,7 @@ int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc) * We need to balance fs here to prevent from producing dirty node pages * during the urgent cleaning time when runing out of free sections. */ - update_inode_page(inode); + f2fs_update_inode_page(inode); if (wbc && wbc->nr_to_write) f2fs_balance_fs(sbi, true); return 0; @@ -506,7 +528,7 @@ void f2fs_evict_inode(struct inode *inode) /* some remained atomic pages should discarded */ if (f2fs_is_atomic_file(inode)) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); trace_f2fs_evict_inode(inode); truncate_inode_pages_final(&inode->i_data); @@ -516,7 +538,7 @@ void f2fs_evict_inode(struct inode *inode) goto out_clear; f2fs_bug_on(sbi, get_dirty_pages(inode)); - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); f2fs_destroy_extent_tree(inode); @@ -525,9 +547,9 @@ void f2fs_evict_inode(struct inode *inode) dquot_initialize(inode); - remove_ino_entry(sbi, inode->i_ino, APPEND_INO); - remove_ino_entry(sbi, inode->i_ino, UPDATE_INO); - remove_ino_entry(sbi, inode->i_ino, FLUSH_INO); + f2fs_remove_ino_entry(sbi, inode->i_ino, APPEND_INO); + f2fs_remove_ino_entry(sbi, inode->i_ino, UPDATE_INO); + f2fs_remove_ino_entry(sbi, inode->i_ino, FLUSH_INO); sb_start_intwrite(inode->i_sb); set_inode_flag(inode, FI_NO_ALLOC); @@ -544,7 +566,7 @@ retry: #endif if (!err) { f2fs_lock_op(sbi); - err = remove_inode_page(inode); + err = f2fs_remove_inode_page(inode); f2fs_unlock_op(sbi); if (err == -ENOENT) err = 0; @@ -557,7 +579,7 @@ retry: } if (err) - update_inode_page(inode); + f2fs_update_inode_page(inode); dquot_free_inode(inode); sb_end_intwrite(inode->i_sb); no_delete: @@ -580,16 +602,19 @@ no_delete: invalidate_mapping_pages(NODE_MAPPING(sbi), xnid, xnid); if (inode->i_nlink) { if (is_inode_flag_set(inode, FI_APPEND_WRITE)) - add_ino_entry(sbi, inode->i_ino, APPEND_INO); + f2fs_add_ino_entry(sbi, inode->i_ino, APPEND_INO); if (is_inode_flag_set(inode, FI_UPDATE_WRITE)) - add_ino_entry(sbi, inode->i_ino, UPDATE_INO); + f2fs_add_ino_entry(sbi, inode->i_ino, UPDATE_INO); } if (is_inode_flag_set(inode, FI_FREE_NID)) { - alloc_nid_failed(sbi, inode->i_ino); + f2fs_alloc_nid_failed(sbi, inode->i_ino); clear_inode_flag(inode, FI_FREE_NID); } else { - f2fs_bug_on(sbi, err && - !exist_written_data(sbi, inode->i_ino, ORPHAN_INO)); + /* + * If xattr nid is corrupted, we can reach out error condition, + * err & !f2fs_exist_written_data(sbi, inode->i_ino, ORPHAN_INO)). + * In that case, f2fs_check_nid_range() is enough to give a clue. + */ } out_clear: fscrypt_put_encryption_info(inode, NULL); @@ -597,7 +622,7 @@ out_clear: } /* caller should call f2fs_lock_op() */ -void handle_failed_inode(struct inode *inode) +void f2fs_handle_failed_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct node_info ni; @@ -612,7 +637,7 @@ void handle_failed_inode(struct inode *inode) * we must call this to avoid inode being remained as dirty, resulting * in a panic when flushing dirty inodes in gdirty_list. */ - update_inode_page(inode); + f2fs_update_inode_page(inode); f2fs_inode_synced(inode); /* don't make bad inode, since it becomes a regular file. */ @@ -623,18 +648,18 @@ void handle_failed_inode(struct inode *inode) * so we can prevent losing this orphan when encoutering checkpoint * and following suddenly power-off. */ - get_node_info(sbi, inode->i_ino, &ni); + f2fs_get_node_info(sbi, inode->i_ino, &ni); if (ni.blk_addr != NULL_ADDR) { - int err = acquire_orphan_inode(sbi); + int err = f2fs_acquire_orphan_inode(sbi); if (err) { set_sbi_flag(sbi, SBI_NEED_FSCK); f2fs_msg(sbi->sb, KERN_WARNING, "Too many orphan inodes, run fsck to fix."); } else { - add_orphan_inode(inode); + f2fs_add_orphan_inode(inode); } - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); } else { set_inode_flag(inode, FI_FREE_NID); } diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index 5ec20f077629..bdd0a7f230ca 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -37,7 +37,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) return ERR_PTR(-ENOMEM); f2fs_lock_op(sbi); - if (!alloc_nid(sbi, &ino)) { + if (!f2fs_alloc_nid(sbi, &ino)) { f2fs_unlock_op(sbi); err = -ENOSPC; goto fail; @@ -54,6 +54,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) F2FS_I(inode)->i_crtime = current_time(inode); inode->i_generation = sbi->s_next_generation++; + if (S_ISDIR(inode->i_mode)) + F2FS_I(inode)->i_current_depth = 1; + err = insert_inode_locked(inode); if (err) { err = -EINVAL; @@ -61,7 +64,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) } if (f2fs_sb_has_project_quota(sbi->sb) && - (F2FS_I(dir)->i_flags & FS_PROJINHERIT_FL)) + (F2FS_I(dir)->i_flags & F2FS_PROJINHERIT_FL)) F2FS_I(inode)->i_projid = F2FS_I(dir)->i_projid; else F2FS_I(inode)->i_projid = make_kprojid(&init_user_ns, @@ -116,9 +119,9 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode) f2fs_mask_flags(mode, F2FS_I(dir)->i_flags & F2FS_FL_INHERITED); if (S_ISDIR(inode->i_mode)) - F2FS_I(inode)->i_flags |= FS_INDEX_FL; + F2FS_I(inode)->i_flags |= F2FS_INDEX_FL; - if (F2FS_I(inode)->i_flags & FS_PROJINHERIT_FL) + if (F2FS_I(inode)->i_flags & F2FS_PROJINHERIT_FL) set_inode_flag(inode, FI_PROJ_INHERIT); trace_f2fs_new_inode(inode, 0); @@ -193,7 +196,7 @@ static inline void set_file_temperature(struct f2fs_sb_info *sbi, struct inode * up_read(&sbi->sb_lock); } -int update_extension_list(struct f2fs_sb_info *sbi, const char *name, +int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name, bool hot, bool set) { __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list; @@ -292,7 +295,7 @@ static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode, goto out; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, ino); + f2fs_alloc_nid_done(sbi, ino); d_instantiate(dentry, inode); unlock_new_inode(inode); @@ -303,7 +306,7 @@ static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode, f2fs_balance_fs(sbi, true); return 0; out: - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } @@ -398,7 +401,7 @@ static int __recover_dot_dentries(struct inode *dir, nid_t pino) err = PTR_ERR(page); goto out; } else { - err = __f2fs_add_link(dir, &dot, NULL, dir->i_ino, S_IFDIR); + err = f2fs_do_add_link(dir, &dot, NULL, dir->i_ino, S_IFDIR); if (err) goto out; } @@ -409,7 +412,7 @@ static int __recover_dot_dentries(struct inode *dir, nid_t pino) else if (IS_ERR(page)) err = PTR_ERR(page); else - err = __f2fs_add_link(dir, &dotdot, NULL, pino, S_IFDIR); + err = f2fs_do_add_link(dir, &dotdot, NULL, pino, S_IFDIR); out: if (!err) clear_inode_flag(dir, FI_INLINE_DOTS); @@ -521,7 +524,7 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry) f2fs_balance_fs(sbi, true); f2fs_lock_op(sbi); - err = acquire_orphan_inode(sbi); + err = f2fs_acquire_orphan_inode(sbi); if (err) { f2fs_unlock_op(sbi); f2fs_put_page(page, 0); @@ -583,9 +586,9 @@ static int f2fs_symlink(struct inode *dir, struct dentry *dentry, f2fs_lock_op(sbi); err = f2fs_add_link(dentry, inode); if (err) - goto out_handle_failed_inode; + goto out_f2fs_handle_failed_inode; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); err = fscrypt_encrypt_symlink(inode, symname, len, &disk_link); if (err) @@ -619,8 +622,8 @@ err_out: f2fs_balance_fs(sbi, true); goto out_free_encrypted_link; -out_handle_failed_inode: - handle_failed_inode(inode); +out_f2fs_handle_failed_inode: + f2fs_handle_failed_inode(inode); out_free_encrypted_link: if (disk_link.name != (unsigned char *)symname) kfree(disk_link.name); @@ -656,7 +659,7 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) goto out_fail; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); d_instantiate(dentry, inode); unlock_new_inode(inode); @@ -669,7 +672,7 @@ static int f2fs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) out_fail: clear_inode_flag(inode, FI_INC_LINK); - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } @@ -708,7 +711,7 @@ static int f2fs_mknod(struct inode *dir, struct dentry *dentry, goto out; f2fs_unlock_op(sbi); - alloc_nid_done(sbi, inode->i_ino); + f2fs_alloc_nid_done(sbi, inode->i_ino); d_instantiate(dentry, inode); unlock_new_inode(inode); @@ -719,7 +722,7 @@ static int f2fs_mknod(struct inode *dir, struct dentry *dentry, f2fs_balance_fs(sbi, true); return 0; out: - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } @@ -748,7 +751,7 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, } f2fs_lock_op(sbi); - err = acquire_orphan_inode(sbi); + err = f2fs_acquire_orphan_inode(sbi); if (err) goto out; @@ -760,8 +763,8 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, * add this non-linked tmpfile to orphan list, in this way we could * remove all unused data of tmpfile after abnormal power-off. */ - add_orphan_inode(inode); - alloc_nid_done(sbi, inode->i_ino); + f2fs_add_orphan_inode(inode); + f2fs_alloc_nid_done(sbi, inode->i_ino); if (whiteout) { f2fs_i_links_write(inode, false); @@ -777,9 +780,9 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry *dentry, return 0; release_out: - release_orphan_inode(sbi); + f2fs_release_orphan_inode(sbi); out: - handle_failed_inode(inode); + f2fs_handle_failed_inode(inode); return err; } @@ -886,7 +889,7 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, f2fs_lock_op(sbi); - err = acquire_orphan_inode(sbi); + err = f2fs_acquire_orphan_inode(sbi); if (err) goto put_out_dir; @@ -900,9 +903,9 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, up_write(&F2FS_I(new_inode)->i_sem); if (!new_inode->i_nlink) - add_orphan_inode(new_inode); + f2fs_add_orphan_inode(new_inode); else - release_orphan_inode(sbi); + f2fs_release_orphan_inode(sbi); } else { f2fs_balance_fs(sbi, true); @@ -970,8 +973,12 @@ static int f2fs_rename(struct inode *old_dir, struct dentry *old_dentry, f2fs_put_page(old_dir_page, 0); f2fs_i_links_write(old_dir, false); } - if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) - add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO); + if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) { + f2fs_add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO); + if (S_ISDIR(old_inode->i_mode)) + f2fs_add_ino_entry(sbi, old_inode->i_ino, + TRANS_DIR_INO); + } f2fs_unlock_op(sbi); @@ -1122,8 +1129,8 @@ static int f2fs_cross_rename(struct inode *old_dir, struct dentry *old_dentry, f2fs_mark_inode_dirty_sync(new_dir, false); if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) { - add_ino_entry(sbi, old_dir->i_ino, TRANS_DIR_INO); - add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO); + f2fs_add_ino_entry(sbi, old_dir->i_ino, TRANS_DIR_INO); + f2fs_add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO); } f2fs_unlock_op(sbi); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 16aee2a7b8a9..b72fac4766a9 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -23,13 +23,28 @@ #include "trace.h" #include <trace/events/f2fs.h> -#define on_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock) +#define on_f2fs_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock) static struct kmem_cache *nat_entry_slab; static struct kmem_cache *free_nid_slab; static struct kmem_cache *nat_entry_set_slab; -bool available_free_memory(struct f2fs_sb_info *sbi, int type) +/* + * Check whether the given nid is within node id range. + */ +int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid) +{ + if (unlikely(nid < F2FS_ROOT_INO(sbi) || nid >= NM_I(sbi)->max_nid)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: out-of-range nid=%x, run fsck to fix.", + __func__, nid); + return -EINVAL; + } + return 0; +} + +bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct sysinfo val; @@ -87,18 +102,10 @@ bool available_free_memory(struct f2fs_sb_info *sbi, int type) static void clear_node_page_dirty(struct page *page) { - struct address_space *mapping = page->mapping; - unsigned int long flags; - if (PageDirty(page)) { - spin_lock_irqsave(&mapping->tree_lock, flags); - radix_tree_tag_clear(&mapping->page_tree, - page_index(page), - PAGECACHE_TAG_DIRTY); - spin_unlock_irqrestore(&mapping->tree_lock, flags); - + f2fs_clear_radix_tree_dirty_tag(page); clear_page_dirty_for_io(page); - dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES); + dec_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES); } ClearPageUptodate(page); } @@ -106,7 +113,7 @@ static void clear_node_page_dirty(struct page *page) static struct page *get_current_nat_page(struct f2fs_sb_info *sbi, nid_t nid) { pgoff_t index = current_nat_addr(sbi, nid); - return get_meta_page(sbi, index); + return f2fs_get_meta_page(sbi, index); } static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid) @@ -123,8 +130,8 @@ static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid) dst_off = next_nat_addr(sbi, src_off); /* get current nat block page with lock */ - src_page = get_meta_page(sbi, src_off); - dst_page = grab_meta_page(sbi, dst_off); + src_page = f2fs_get_meta_page(sbi, src_off); + dst_page = f2fs_grab_meta_page(sbi, dst_off); f2fs_bug_on(sbi, PageDirty(src_page)); src_addr = page_address(src_page); @@ -260,7 +267,7 @@ static unsigned int __gang_lookup_nat_set(struct f2fs_nm_info *nm_i, start, nr); } -int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid) +int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct nat_entry *e; @@ -277,7 +284,7 @@ int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid) return need; } -bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid) +bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct nat_entry *e; @@ -291,7 +298,7 @@ bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid) return is_cp; } -bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino) +bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct nat_entry *e; @@ -364,8 +371,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, new_blkaddr == NULL_ADDR); f2fs_bug_on(sbi, nat_get_blkaddr(e) == NEW_ADDR && new_blkaddr == NEW_ADDR); - f2fs_bug_on(sbi, nat_get_blkaddr(e) != NEW_ADDR && - nat_get_blkaddr(e) != NULL_ADDR && + f2fs_bug_on(sbi, is_valid_blkaddr(nat_get_blkaddr(e)) && new_blkaddr == NEW_ADDR); /* increment version no as node is removed */ @@ -376,7 +382,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, /* change address */ nat_set_blkaddr(e, new_blkaddr); - if (new_blkaddr == NEW_ADDR || new_blkaddr == NULL_ADDR) + if (!is_valid_blkaddr(new_blkaddr)) set_nat_flag(e, IS_CHECKPOINTED, false); __set_nat_cache_dirty(nm_i, e); @@ -391,7 +397,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, up_write(&nm_i->nat_tree_lock); } -int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) +int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) { struct f2fs_nm_info *nm_i = NM_I(sbi); int nr = nr_shrink; @@ -413,7 +419,8 @@ int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink) /* * This function always returns success */ -void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni) +void f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid, + struct node_info *ni) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA); @@ -443,7 +450,7 @@ void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni) /* Check current segment summary */ down_read(&curseg->journal_rwsem); - i = lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0); + i = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0); if (i >= 0) { ne = nat_in_journal(journal, i); node_info_from_raw_nat(ni, &ne); @@ -458,7 +465,7 @@ void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni) index = current_nat_addr(sbi, nid); up_read(&nm_i->nat_tree_lock); - page = get_meta_page(sbi, index); + page = f2fs_get_meta_page(sbi, index); nat_blk = (struct f2fs_nat_block *)page_address(page); ne = nat_blk->entries[nid - start_nid]; node_info_from_raw_nat(ni, &ne); @@ -471,7 +478,7 @@ cache: /* * readahead MAX_RA_NODE number of node pages. */ -static void ra_node_pages(struct page *parent, int start, int n) +static void f2fs_ra_node_pages(struct page *parent, int start, int n) { struct f2fs_sb_info *sbi = F2FS_P_SB(parent); struct blk_plug plug; @@ -485,13 +492,13 @@ static void ra_node_pages(struct page *parent, int start, int n) end = min(end, NIDS_PER_BLOCK); for (i = start; i < end; i++) { nid = get_nid(parent, i, false); - ra_node_page(sbi, nid); + f2fs_ra_node_page(sbi, nid); } blk_finish_plug(&plug); } -pgoff_t get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs) +pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs) { const long direct_index = ADDRS_PER_INODE(dn->inode); const long direct_blks = ADDRS_PER_BLOCK; @@ -606,7 +613,7 @@ got: * f2fs_unlock_op() only if ro is not set RDONLY_NODE. * In the case of RDONLY_NODE, we don't need to care about mutex. */ -int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) +int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct page *npage[4]; @@ -625,7 +632,7 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) npage[0] = dn->inode_page; if (!npage[0]) { - npage[0] = get_node_page(sbi, nids[0]); + npage[0] = f2fs_get_node_page(sbi, nids[0]); if (IS_ERR(npage[0])) return PTR_ERR(npage[0]); } @@ -649,24 +656,24 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) if (!nids[i] && mode == ALLOC_NODE) { /* alloc new node */ - if (!alloc_nid(sbi, &(nids[i]))) { + if (!f2fs_alloc_nid(sbi, &(nids[i]))) { err = -ENOSPC; goto release_pages; } dn->nid = nids[i]; - npage[i] = new_node_page(dn, noffset[i]); + npage[i] = f2fs_new_node_page(dn, noffset[i]); if (IS_ERR(npage[i])) { - alloc_nid_failed(sbi, nids[i]); + f2fs_alloc_nid_failed(sbi, nids[i]); err = PTR_ERR(npage[i]); goto release_pages; } set_nid(parent, offset[i - 1], nids[i], i == 1); - alloc_nid_done(sbi, nids[i]); + f2fs_alloc_nid_done(sbi, nids[i]); done = true; } else if (mode == LOOKUP_NODE_RA && i == level && level > 1) { - npage[i] = get_node_page_ra(parent, offset[i - 1]); + npage[i] = f2fs_get_node_page_ra(parent, offset[i - 1]); if (IS_ERR(npage[i])) { err = PTR_ERR(npage[i]); goto release_pages; @@ -681,7 +688,7 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode) } if (!done) { - npage[i] = get_node_page(sbi, nids[i]); + npage[i] = f2fs_get_node_page(sbi, nids[i]); if (IS_ERR(npage[i])) { err = PTR_ERR(npage[i]); f2fs_put_page(npage[0], 0); @@ -720,15 +727,15 @@ static void truncate_node(struct dnode_of_data *dn) struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct node_info ni; - get_node_info(sbi, dn->nid, &ni); + f2fs_get_node_info(sbi, dn->nid, &ni); /* Deallocate node address */ - invalidate_blocks(sbi, ni.blk_addr); + f2fs_invalidate_blocks(sbi, ni.blk_addr); dec_valid_node_count(sbi, dn->inode, dn->nid == dn->inode->i_ino); set_node_addr(sbi, &ni, NULL_ADDR, false); if (dn->nid == dn->inode->i_ino) { - remove_orphan_inode(sbi, dn->nid); + f2fs_remove_orphan_inode(sbi, dn->nid); dec_valid_inode_count(sbi); f2fs_inode_synced(dn->inode); } @@ -753,7 +760,7 @@ static int truncate_dnode(struct dnode_of_data *dn) return 1; /* get direct node */ - page = get_node_page(F2FS_I_SB(dn->inode), dn->nid); + page = f2fs_get_node_page(F2FS_I_SB(dn->inode), dn->nid); if (IS_ERR(page) && PTR_ERR(page) == -ENOENT) return 1; else if (IS_ERR(page)) @@ -762,7 +769,7 @@ static int truncate_dnode(struct dnode_of_data *dn) /* Make dnode_of_data for parameter */ dn->node_page = page; dn->ofs_in_node = 0; - truncate_data_blocks(dn); + f2fs_truncate_data_blocks(dn); truncate_node(dn); return 1; } @@ -783,13 +790,13 @@ static int truncate_nodes(struct dnode_of_data *dn, unsigned int nofs, trace_f2fs_truncate_nodes_enter(dn->inode, dn->nid, dn->data_blkaddr); - page = get_node_page(F2FS_I_SB(dn->inode), dn->nid); + page = f2fs_get_node_page(F2FS_I_SB(dn->inode), dn->nid); if (IS_ERR(page)) { trace_f2fs_truncate_nodes_exit(dn->inode, PTR_ERR(page)); return PTR_ERR(page); } - ra_node_pages(page, ofs, NIDS_PER_BLOCK); + f2fs_ra_node_pages(page, ofs, NIDS_PER_BLOCK); rn = F2FS_NODE(page); if (depth < 3) { @@ -859,7 +866,7 @@ static int truncate_partial_nodes(struct dnode_of_data *dn, /* get indirect nodes in the path */ for (i = 0; i < idx + 1; i++) { /* reference count'll be increased */ - pages[i] = get_node_page(F2FS_I_SB(dn->inode), nid[i]); + pages[i] = f2fs_get_node_page(F2FS_I_SB(dn->inode), nid[i]); if (IS_ERR(pages[i])) { err = PTR_ERR(pages[i]); idx = i - 1; @@ -868,7 +875,7 @@ static int truncate_partial_nodes(struct dnode_of_data *dn, nid[i + 1] = get_nid(pages[i], offset[i + 1], false); } - ra_node_pages(pages[idx], offset[idx + 1], NIDS_PER_BLOCK); + f2fs_ra_node_pages(pages[idx], offset[idx + 1], NIDS_PER_BLOCK); /* free direct nodes linked to a partial indirect node */ for (i = offset[idx + 1]; i < NIDS_PER_BLOCK; i++) { @@ -905,7 +912,7 @@ fail: /* * All the block addresses of data and nodes should be nullified. */ -int truncate_inode_blocks(struct inode *inode, pgoff_t from) +int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); int err = 0, cont = 1; @@ -921,7 +928,7 @@ int truncate_inode_blocks(struct inode *inode, pgoff_t from) if (level < 0) return level; - page = get_node_page(sbi, inode->i_ino); + page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(page)) { trace_f2fs_truncate_inode_blocks_exit(inode, PTR_ERR(page)); return PTR_ERR(page); @@ -1001,7 +1008,7 @@ fail: } /* caller must lock inode page */ -int truncate_xattr_node(struct inode *inode) +int f2fs_truncate_xattr_node(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); nid_t nid = F2FS_I(inode)->i_xattr_nid; @@ -1011,7 +1018,7 @@ int truncate_xattr_node(struct inode *inode) if (!nid) return 0; - npage = get_node_page(sbi, nid); + npage = f2fs_get_node_page(sbi, nid); if (IS_ERR(npage)) return PTR_ERR(npage); @@ -1026,17 +1033,17 @@ int truncate_xattr_node(struct inode *inode) * Caller should grab and release a rwsem by calling f2fs_lock_op() and * f2fs_unlock_op(). */ -int remove_inode_page(struct inode *inode) +int f2fs_remove_inode_page(struct inode *inode) { struct dnode_of_data dn; int err; set_new_dnode(&dn, inode, NULL, NULL, inode->i_ino); - err = get_dnode_of_data(&dn, 0, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, 0, LOOKUP_NODE); if (err) return err; - err = truncate_xattr_node(inode); + err = f2fs_truncate_xattr_node(inode); if (err) { f2fs_put_dnode(&dn); return err; @@ -1045,7 +1052,7 @@ int remove_inode_page(struct inode *inode) /* remove potential inline_data blocks */ if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) - truncate_data_blocks_range(&dn, 1); + f2fs_truncate_data_blocks_range(&dn, 1); /* 0 is possible, after f2fs_new_inode() has failed */ f2fs_bug_on(F2FS_I_SB(inode), @@ -1056,7 +1063,7 @@ int remove_inode_page(struct inode *inode) return 0; } -struct page *new_inode_page(struct inode *inode) +struct page *f2fs_new_inode_page(struct inode *inode) { struct dnode_of_data dn; @@ -1064,10 +1071,10 @@ struct page *new_inode_page(struct inode *inode) set_new_dnode(&dn, inode, NULL, NULL, inode->i_ino); /* caller should f2fs_put_page(page, 1); */ - return new_node_page(&dn, 0); + return f2fs_new_node_page(&dn, 0); } -struct page *new_node_page(struct dnode_of_data *dn, unsigned int ofs) +struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs) { struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode); struct node_info new_ni; @@ -1085,7 +1092,7 @@ struct page *new_node_page(struct dnode_of_data *dn, unsigned int ofs) goto fail; #ifdef CONFIG_F2FS_CHECK_FS - get_node_info(sbi, dn->nid, &new_ni); + f2fs_get_node_info(sbi, dn->nid, &new_ni); f2fs_bug_on(sbi, new_ni.blk_addr != NULL_ADDR); #endif new_ni.nid = dn->nid; @@ -1137,7 +1144,7 @@ static int read_node_page(struct page *page, int op_flags) if (PageUptodate(page)) return LOCKED_PAGE; - get_node_info(sbi, page->index, &ni); + f2fs_get_node_info(sbi, page->index, &ni); if (unlikely(ni.blk_addr == NULL_ADDR)) { ClearPageUptodate(page); @@ -1151,14 +1158,15 @@ static int read_node_page(struct page *page, int op_flags) /* * Readahead a node page */ -void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid) +void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid) { struct page *apage; int err; if (!nid) return; - f2fs_bug_on(sbi, check_nid_range(sbi, nid)); + if (f2fs_check_nid_range(sbi, nid)) + return; rcu_read_lock(); apage = radix_tree_lookup(&NODE_MAPPING(sbi)->page_tree, nid); @@ -1182,7 +1190,8 @@ static struct page *__get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid, if (!nid) return ERR_PTR(-ENOENT); - f2fs_bug_on(sbi, check_nid_range(sbi, nid)); + if (f2fs_check_nid_range(sbi, nid)) + return ERR_PTR(-EINVAL); repeat: page = f2fs_grab_cache_page(NODE_MAPPING(sbi), nid, false); if (!page) @@ -1198,7 +1207,7 @@ repeat: } if (parent) - ra_node_pages(parent, start + 1, MAX_RA_NODE); + f2fs_ra_node_pages(parent, start + 1, MAX_RA_NODE); lock_page(page); @@ -1232,12 +1241,12 @@ out_err: return page; } -struct page *get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid) +struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid) { return __get_node_page(sbi, nid, NULL, 0); } -struct page *get_node_page_ra(struct page *parent, int start) +struct page *f2fs_get_node_page_ra(struct page *parent, int start) { struct f2fs_sb_info *sbi = F2FS_P_SB(parent); nid_t nid = get_nid(parent, start, false); @@ -1272,7 +1281,7 @@ static void flush_inline_data(struct f2fs_sb_info *sbi, nid_t ino) ret = f2fs_write_inline_data(inode, page); inode_dec_dirty_pages(inode); - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); if (ret) set_page_dirty(page); page_out: @@ -1283,21 +1292,17 @@ iput_out: static struct page *last_fsync_dnode(struct f2fs_sb_info *sbi, nid_t ino) { - pgoff_t index, end; + pgoff_t index; struct pagevec pvec; struct page *last_page = NULL; + int nr_pages; pagevec_init(&pvec, 0); index = 0; - end = ULONG_MAX; - - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) - break; + + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, + PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; @@ -1363,11 +1368,8 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, trace_f2fs_writepage(page, NODE); - if (unlikely(f2fs_cp_error(sbi))) { - dec_page_count(sbi, F2FS_DIRTY_NODES); - unlock_page(page); - return 0; - } + if (unlikely(f2fs_cp_error(sbi))) + goto redirty_out; if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) goto redirty_out; @@ -1383,7 +1385,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, down_read(&sbi->node_write); } - get_node_info(sbi, nid, &ni); + f2fs_get_node_info(sbi, nid, &ni); /* This page is already truncated */ if (unlikely(ni.blk_addr == NULL_ADDR)) { @@ -1400,7 +1402,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, set_page_writeback(page); ClearPageError(page); fio.old_blkaddr = ni.blk_addr; - write_node_page(nid, &fio); + f2fs_do_write_node_page(nid, &fio); set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page)); dec_page_count(sbi, F2FS_DIRTY_NODES); up_read(&sbi->node_write); @@ -1429,7 +1431,7 @@ redirty_out: return AOP_WRITEPAGE_ACTIVATE; } -void move_node_page(struct page *node_page, int gc_type) +void f2fs_move_node_page(struct page *node_page, int gc_type) { if (gc_type == FG_GC) { struct writeback_control wbc = { @@ -1466,16 +1468,17 @@ static int f2fs_write_node_page(struct page *page, return __write_node_page(page, false, NULL, wbc, false, FS_NODE_IO); } -int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, +int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, struct writeback_control *wbc, bool atomic) { - pgoff_t index, end; + pgoff_t index; pgoff_t last_idx = ULONG_MAX; struct pagevec pvec; int ret = 0; struct page *last_page = NULL; bool marked = false; nid_t ino = inode->i_ino; + int nr_pages; if (atomic) { last_page = last_fsync_dnode(sbi, ino); @@ -1485,15 +1488,10 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode, retry: pagevec_init(&pvec, 0); index = 0; - end = ULONG_MAX; - - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) - break; + + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, + PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; @@ -1537,9 +1535,9 @@ continue_unlock: if (IS_INODE(page)) { if (is_inode_flag_set(inode, FI_DIRTY_INODE)) - update_inode(inode, page); + f2fs_update_inode(inode, page); set_dentry_mark(page, - need_dentry_mark(sbi, ino)); + f2fs_need_dentry_mark(sbi, ino)); } /* may be written by other thread */ if (!PageDirty(page)) @@ -1589,33 +1587,37 @@ out: return ret ? -EIO: 0; } -int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc, +int f2fs_sync_node_pages(struct f2fs_sb_info *sbi, + struct writeback_control *wbc, bool do_balance, enum iostat_type io_type) { - pgoff_t index, end; + pgoff_t index; struct pagevec pvec; int step = 0; int nwritten = 0; int ret = 0; + int nr_pages, done = 0; pagevec_init(&pvec, 0); next_step: index = 0; - end = ULONG_MAX; - - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_DIRTY, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) - break; + + while (!done && (nr_pages = pagevec_lookup_tag(&pvec, + NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; bool submitted = false; + /* give a priority to WB_SYNC threads */ + if (atomic_read(&sbi->wb_sync_req[NODE]) && + wbc->sync_mode == WB_SYNC_NONE) { + done = 1; + break; + } + /* * flushing sequence with step: * 0. indirect nodes @@ -1694,29 +1696,22 @@ continue_unlock: return ret; } -int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino) +int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino) { - pgoff_t index = 0, end = ULONG_MAX; + pgoff_t index = 0; struct pagevec pvec; int ret2 = 0, ret = 0; + int nr_pages; pagevec_init(&pvec, 0); - while (index <= end) { - int i, nr_pages; - nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, - PAGECACHE_TAG_WRITEBACK, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); - if (nr_pages == 0) - break; + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index, + PAGECACHE_TAG_WRITEBACK))) { + int i; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; - /* until radix tree lookup accepts end_index */ - if (unlikely(page->index > end)) - continue; - if (ino && ino_of_node(page) == ino) { f2fs_wait_on_page_writeback(page, NODE, true); if (TestClearPageError(page)) @@ -1753,14 +1748,21 @@ static int f2fs_write_node_pages(struct address_space *mapping, if (get_pages(sbi, F2FS_DIRTY_NODES) < nr_pages_to_skip(sbi, NODE)) goto skip_write; + if (wbc->sync_mode == WB_SYNC_ALL) + atomic_inc(&sbi->wb_sync_req[NODE]); + else if (atomic_read(&sbi->wb_sync_req[NODE])) + goto skip_write; + trace_f2fs_writepages(mapping->host, wbc, NODE); diff = nr_pages_to_write(sbi, NODE, wbc); - wbc->sync_mode = WB_SYNC_NONE; blk_start_plug(&plug); - sync_node_pages(sbi, wbc, true, FS_NODE_IO); + f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO); blk_finish_plug(&plug); wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff); + + if (wbc->sync_mode == WB_SYNC_ALL) + atomic_dec(&sbi->wb_sync_req[NODE]); return 0; skip_write: @@ -1906,20 +1908,20 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, * Thread A Thread B * - f2fs_create * - f2fs_new_inode - * - alloc_nid + * - f2fs_alloc_nid * - __insert_nid_to_list(PREALLOC_NID) * - f2fs_balance_fs_bg - * - build_free_nids - * - __build_free_nids + * - f2fs_build_free_nids + * - __f2fs_build_free_nids * - scan_nat_page * - add_free_nid * - __lookup_nat_cache * - f2fs_add_link - * - init_inode_metadata - * - new_inode_page - * - new_node_page + * - f2fs_init_inode_metadata + * - f2fs_new_inode_page + * - f2fs_new_node_page * - set_node_addr - * - alloc_nid_done + * - f2fs_alloc_nid_done * - __remove_nid_from_list(PREALLOC_NID) * - __insert_nid_to_list(FREE_NID) */ @@ -2051,7 +2053,8 @@ out: up_read(&nm_i->nat_tree_lock); } -static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) +static void __f2fs_build_free_nids(struct f2fs_sb_info *sbi, + bool sync, bool mount) { struct f2fs_nm_info *nm_i = NM_I(sbi); int i = 0; @@ -2064,7 +2067,7 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK) return; - if (!sync && !available_free_memory(sbi, FREE_NIDS)) + if (!sync && !f2fs_available_free_memory(sbi, FREE_NIDS)) return; if (!mount) { @@ -2076,7 +2079,7 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) } /* readahead nat pages to be scanned */ - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES, + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES, META_NAT, true); down_read(&nm_i->nat_tree_lock); @@ -2106,14 +2109,14 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) up_read(&nm_i->nat_tree_lock); - ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nm_i->next_scan_nid), + f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nm_i->next_scan_nid), nm_i->ra_nid_pages, META_NAT, false); } -void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) +void f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) { mutex_lock(&NM_I(sbi)->build_lock); - __build_free_nids(sbi, sync, mount); + __f2fs_build_free_nids(sbi, sync, mount); mutex_unlock(&NM_I(sbi)->build_lock); } @@ -2122,7 +2125,7 @@ void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount) * from second parameter of this function. * The returned nid could be used ino as well as nid when inode is created. */ -bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid) +bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i = NULL; @@ -2140,8 +2143,8 @@ retry: return false; } - /* We should not use stale free nids created by build_free_nids */ - if (nm_i->nid_cnt[FREE_NID] && !on_build_free_nids(nm_i)) { + /* We should not use stale free nids created by f2fs_build_free_nids */ + if (nm_i->nid_cnt[FREE_NID] && !on_f2fs_build_free_nids(nm_i)) { f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list)); i = list_first_entry(&nm_i->free_nid_list, struct free_nid, list); @@ -2158,14 +2161,14 @@ retry: spin_unlock(&nm_i->nid_list_lock); /* Let's scan nat pages and its caches to get free nids */ - build_free_nids(sbi, true, false); + f2fs_build_free_nids(sbi, true, false); goto retry; } /* - * alloc_nid() should be called prior to this function. + * f2fs_alloc_nid() should be called prior to this function. */ -void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid) +void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i; @@ -2180,9 +2183,9 @@ void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid) } /* - * alloc_nid() should be called prior to this function. + * f2fs_alloc_nid() should be called prior to this function. */ -void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid) +void f2fs_alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i; @@ -2195,7 +2198,7 @@ void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid) i = __lookup_free_nid_list(nm_i, nid); f2fs_bug_on(sbi, !i); - if (!available_free_memory(sbi, FREE_NIDS)) { + if (!f2fs_available_free_memory(sbi, FREE_NIDS)) { __remove_free_nid(sbi, i, PREALLOC_NID); need_free = true; } else { @@ -2212,7 +2215,7 @@ void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid) kmem_cache_free(free_nid_slab, i); } -int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink) +int f2fs_try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i, *next; @@ -2240,14 +2243,14 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink) return nr - nr_shrink; } -void recover_inline_xattr(struct inode *inode, struct page *page) +void f2fs_recover_inline_xattr(struct inode *inode, struct page *page) { void *src_addr, *dst_addr; size_t inline_size; struct page *ipage; struct f2fs_inode *ri; - ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino); + ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); f2fs_bug_on(F2FS_I_SB(inode), IS_ERR(ipage)); ri = F2FS_INODE(page); @@ -2265,11 +2268,11 @@ void recover_inline_xattr(struct inode *inode, struct page *page) f2fs_wait_on_page_writeback(ipage, NODE, true); memcpy(dst_addr, src_addr, inline_size); update_inode: - update_inode(inode, ipage); + f2fs_update_inode(inode, ipage); f2fs_put_page(ipage, 1); } -int recover_xattr_data(struct inode *inode, struct page *page) +int f2fs_recover_xattr_data(struct inode *inode, struct page *page) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); nid_t prev_xnid = F2FS_I(inode)->i_xattr_nid; @@ -2282,25 +2285,25 @@ int recover_xattr_data(struct inode *inode, struct page *page) goto recover_xnid; /* 1: invalidate the previous xattr nid */ - get_node_info(sbi, prev_xnid, &ni); - invalidate_blocks(sbi, ni.blk_addr); + f2fs_get_node_info(sbi, prev_xnid, &ni); + f2fs_invalidate_blocks(sbi, ni.blk_addr); dec_valid_node_count(sbi, inode, false); set_node_addr(sbi, &ni, NULL_ADDR, false); recover_xnid: /* 2: update xattr nid in inode */ - if (!alloc_nid(sbi, &new_xnid)) + if (!f2fs_alloc_nid(sbi, &new_xnid)) return -ENOSPC; set_new_dnode(&dn, inode, NULL, NULL, new_xnid); - xpage = new_node_page(&dn, XATTR_NODE_OFFSET); + xpage = f2fs_new_node_page(&dn, XATTR_NODE_OFFSET); if (IS_ERR(xpage)) { - alloc_nid_failed(sbi, new_xnid); + f2fs_alloc_nid_failed(sbi, new_xnid); return PTR_ERR(xpage); } - alloc_nid_done(sbi, new_xnid); - update_inode_page(inode); + f2fs_alloc_nid_done(sbi, new_xnid); + f2fs_update_inode_page(inode); /* 3: update and set xattr node page dirty */ memcpy(F2FS_NODE(xpage), F2FS_NODE(page), VALID_XATTR_BLOCK_SIZE); @@ -2311,14 +2314,14 @@ recover_xnid: return 0; } -int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) +int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page) { struct f2fs_inode *src, *dst; nid_t ino = ino_of_node(page); struct node_info old_ni, new_ni; struct page *ipage; - get_node_info(sbi, ino, &old_ni); + f2fs_get_node_info(sbi, ino, &old_ni); if (unlikely(old_ni.blk_addr != NULL_ADDR)) return -EINVAL; @@ -2372,7 +2375,7 @@ retry: return 0; } -void restore_node_summary(struct f2fs_sb_info *sbi, +void f2fs_restore_node_summary(struct f2fs_sb_info *sbi, unsigned int segno, struct f2fs_summary_block *sum) { struct f2fs_node *rn; @@ -2389,10 +2392,10 @@ void restore_node_summary(struct f2fs_sb_info *sbi, nrpages = min(last_offset - i, BIO_MAX_PAGES); /* readahead node pages */ - ra_meta_pages(sbi, addr, nrpages, META_POR, true); + f2fs_ra_meta_pages(sbi, addr, nrpages, META_POR, true); for (idx = addr; idx < addr + nrpages; idx++) { - struct page *page = get_tmp_page(sbi, idx); + struct page *page = f2fs_get_tmp_page(sbi, idx); rn = F2FS_NODE(page); sum_entry->nid = rn->footer.nid; @@ -2534,7 +2537,7 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi, f2fs_bug_on(sbi, nat_get_blkaddr(ne) == NEW_ADDR); if (to_journal) { - offset = lookup_journal_in_cursum(journal, + offset = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 1); f2fs_bug_on(sbi, offset < 0); raw_ne = &nat_in_journal(journal, offset); @@ -2571,7 +2574,7 @@ static void __flush_nat_entry_set(struct f2fs_sb_info *sbi, /* * This function is called during the checkpointing process. */ -void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc) +void f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA); @@ -2634,7 +2637,7 @@ static int __get_nat_bitmaps(struct f2fs_sb_info *sbi) nat_bits_addr = __start_cp_addr(sbi) + sbi->blocks_per_seg - nm_i->nat_bits_blocks; for (i = 0; i < nm_i->nat_bits_blocks; i++) { - struct page *page = get_meta_page(sbi, nat_bits_addr++); + struct page *page = f2fs_get_meta_page(sbi, nat_bits_addr++); memcpy(nm_i->nat_bits + (i << F2FS_BLKSIZE_BITS), page_address(page), F2FS_BLKSIZE); @@ -2753,8 +2756,10 @@ static int init_free_nid_cache(struct f2fs_sb_info *sbi) struct f2fs_nm_info *nm_i = NM_I(sbi); int i; - nm_i->free_nid_bitmap = f2fs_kzalloc(sbi, nm_i->nat_blocks * - sizeof(unsigned char *), GFP_KERNEL); + nm_i->free_nid_bitmap = + f2fs_kzalloc(sbi, array_size(sizeof(unsigned char *), + nm_i->nat_blocks), + GFP_KERNEL); if (!nm_i->free_nid_bitmap) return -ENOMEM; @@ -2770,14 +2775,16 @@ static int init_free_nid_cache(struct f2fs_sb_info *sbi) if (!nm_i->nat_block_bitmap) return -ENOMEM; - nm_i->free_nid_count = f2fs_kvzalloc(sbi, nm_i->nat_blocks * - sizeof(unsigned short), GFP_KERNEL); + nm_i->free_nid_count = + f2fs_kvzalloc(sbi, array_size(sizeof(unsigned short), + nm_i->nat_blocks), + GFP_KERNEL); if (!nm_i->free_nid_count) return -ENOMEM; return 0; } -int build_node_manager(struct f2fs_sb_info *sbi) +int f2fs_build_node_manager(struct f2fs_sb_info *sbi) { int err; @@ -2797,11 +2804,11 @@ int build_node_manager(struct f2fs_sb_info *sbi) /* load free nid status from nat_bits table */ load_free_nid_bitmap(sbi); - build_free_nids(sbi, true, true); + f2fs_build_free_nids(sbi, true, true); return 0; } -void destroy_node_manager(struct f2fs_sb_info *sbi) +void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi) { struct f2fs_nm_info *nm_i = NM_I(sbi); struct free_nid *i, *next_i; @@ -2873,7 +2880,7 @@ void destroy_node_manager(struct f2fs_sb_info *sbi) kfree(nm_i); } -int __init create_node_manager_caches(void) +int __init f2fs_create_node_manager_caches(void) { nat_entry_slab = f2fs_kmem_cache_create("nat_entry", sizeof(struct nat_entry)); @@ -2899,7 +2906,7 @@ fail: return -ENOMEM; } -void destroy_node_manager_caches(void) +void f2fs_destroy_node_manager_caches(void) { kmem_cache_destroy(nat_entry_set_slab); kmem_cache_destroy(free_nid_slab); diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index 4ddc2262baf1..daf81d416b89 100644 --- a/fs/f2fs/recovery.c +++ b/fs/f2fs/recovery.c @@ -47,7 +47,7 @@ static struct kmem_cache *fsync_entry_slab; -bool space_for_roll_forward(struct f2fs_sb_info *sbi) +bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi) { s64 nalloc = percpu_counter_sum_positive(&sbi->alloc_valid_block_count); @@ -162,7 +162,7 @@ retry: goto out_put; } - err = acquire_orphan_inode(F2FS_I_SB(inode)); + err = f2fs_acquire_orphan_inode(F2FS_I_SB(inode)); if (err) { iput(einode); goto out_put; @@ -173,7 +173,7 @@ retry: } else if (IS_ERR(page)) { err = PTR_ERR(page); } else { - err = __f2fs_do_add_link(dir, &fname, inode, + err = f2fs_add_dentry(dir, &fname, inode, inode->i_ino, inode->i_mode); } if (err == -ENOMEM) @@ -204,8 +204,6 @@ static void recover_inline_flags(struct inode *inode, struct f2fs_inode *ri) set_inode_flag(inode, FI_DATA_EXIST); else clear_inode_flag(inode, FI_DATA_EXIST); - if (!(ri->i_inline & F2FS_INLINE_DOTS)) - clear_inode_flag(inode, FI_INLINE_DOTS); } static void recover_inode(struct inode *inode, struct page *page) @@ -254,10 +252,10 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head, while (1) { struct fsync_inode_entry *entry; - if (!is_valid_blkaddr(sbi, blkaddr, META_POR)) + if (!f2fs_is_valid_meta_blkaddr(sbi, blkaddr, META_POR)) return 0; - page = get_tmp_page(sbi, blkaddr); + page = f2fs_get_tmp_page(sbi, blkaddr); if (!is_recoverable_dnode(page)) break; @@ -271,7 +269,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head, if (!check_only && IS_INODE(page) && is_dent_dnode(page)) { - err = recover_inode_page(sbi, page); + err = f2fs_recover_inode_page(sbi, page); if (err) break; quota_inode = true; @@ -312,7 +310,7 @@ next: blkaddr = next_blkaddr_of_node(page); f2fs_put_page(page, 1); - ra_meta_pages_cond(sbi, blkaddr); + f2fs_ra_meta_pages_cond(sbi, blkaddr); } f2fs_put_page(page, 1); return err; @@ -355,7 +353,7 @@ static int check_index_in_prev_nodes(struct f2fs_sb_info *sbi, } } - sum_page = get_sum_page(sbi, segno); + sum_page = f2fs_get_sum_page(sbi, segno); sum_node = (struct f2fs_summary_block *)page_address(sum_page); sum = sum_node->entries[blkoff]; f2fs_put_page(sum_page, 1); @@ -375,7 +373,7 @@ got_it: } /* Get the node page */ - node_page = get_node_page(sbi, nid); + node_page = f2fs_get_node_page(sbi, nid); if (IS_ERR(node_page)) return PTR_ERR(node_page); @@ -400,7 +398,8 @@ got_it: inode = dn->inode; } - bidx = start_bidx_of_node(offset, inode) + le16_to_cpu(sum.ofs_in_node); + bidx = f2fs_start_bidx_of_node(offset, inode) + + le16_to_cpu(sum.ofs_in_node); /* * if inode page is locked, unlock temporarily, but its reference @@ -410,11 +409,11 @@ got_it: unlock_page(dn->inode_page); set_new_dnode(&tdn, inode, NULL, NULL, 0); - if (get_dnode_of_data(&tdn, bidx, LOOKUP_NODE)) + if (f2fs_get_dnode_of_data(&tdn, bidx, LOOKUP_NODE)) goto out; if (tdn.data_blkaddr == blkaddr) - truncate_data_blocks_range(&tdn, 1); + f2fs_truncate_data_blocks_range(&tdn, 1); f2fs_put_dnode(&tdn); out: @@ -427,7 +426,7 @@ out: truncate_out: if (datablock_addr(tdn.inode, tdn.node_page, tdn.ofs_in_node) == blkaddr) - truncate_data_blocks_range(&tdn, 1); + f2fs_truncate_data_blocks_range(&tdn, 1); if (dn->inode->i_ino == nid && !dn->inode_page_locked) unlock_page(dn->inode_page); return 0; @@ -443,25 +442,25 @@ static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode, /* step 1: recover xattr */ if (IS_INODE(page)) { - recover_inline_xattr(inode, page); + f2fs_recover_inline_xattr(inode, page); } else if (f2fs_has_xattr_block(ofs_of_node(page))) { - err = recover_xattr_data(inode, page); + err = f2fs_recover_xattr_data(inode, page); if (!err) recovered++; goto out; } /* step 2: recover inline data */ - if (recover_inline_data(inode, page)) + if (f2fs_recover_inline_data(inode, page)) goto out; /* step 3: recover data indices */ - start = start_bidx_of_node(ofs_of_node(page), inode); + start = f2fs_start_bidx_of_node(ofs_of_node(page), inode); end = start + ADDRS_PER_PAGE(page, inode); set_new_dnode(&dn, inode, NULL, NULL, 0); retry_dn: - err = get_dnode_of_data(&dn, start, ALLOC_NODE); + err = f2fs_get_dnode_of_data(&dn, start, ALLOC_NODE); if (err) { if (err == -ENOMEM) { congestion_wait(BLK_RW_ASYNC, HZ/50); @@ -472,7 +471,7 @@ retry_dn: f2fs_wait_on_page_writeback(dn.node_page, NODE, true); - get_node_info(sbi, dn.nid, &ni); + f2fs_get_node_info(sbi, dn.nid, &ni); f2fs_bug_on(sbi, ni.ino != ino_of_node(page)); f2fs_bug_on(sbi, ofs_of_node(dn.node_page) != ofs_of_node(page)); @@ -488,7 +487,7 @@ retry_dn: /* dest is invalid, just invalidate src block */ if (dest == NULL_ADDR) { - truncate_data_blocks_range(&dn, 1); + f2fs_truncate_data_blocks_range(&dn, 1); continue; } @@ -502,19 +501,19 @@ retry_dn: * and then reserve one new block in dnode page. */ if (dest == NEW_ADDR) { - truncate_data_blocks_range(&dn, 1); - reserve_new_block(&dn); + f2fs_truncate_data_blocks_range(&dn, 1); + f2fs_reserve_new_block(&dn); continue; } /* dest is valid block, try to recover from src to dest */ - if (is_valid_blkaddr(sbi, dest, META_POR)) { + if (f2fs_is_valid_meta_blkaddr(sbi, dest, META_POR)) { if (src == NULL_ADDR) { - err = reserve_new_block(&dn); + err = f2fs_reserve_new_block(&dn); #ifdef CONFIG_F2FS_FAULT_INJECTION while (err) - err = reserve_new_block(&dn); + err = f2fs_reserve_new_block(&dn); #endif /* We should not get -ENOSPC */ f2fs_bug_on(sbi, err); @@ -569,12 +568,12 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list, while (1) { struct fsync_inode_entry *entry; - if (!is_valid_blkaddr(sbi, blkaddr, META_POR)) + if (!f2fs_is_valid_meta_blkaddr(sbi, blkaddr, META_POR)) break; - ra_meta_pages_cond(sbi, blkaddr); + f2fs_ra_meta_pages_cond(sbi, blkaddr); - page = get_tmp_page(sbi, blkaddr); + page = f2fs_get_tmp_page(sbi, blkaddr); if (!is_recoverable_dnode(page)) { f2fs_put_page(page, 1); @@ -612,11 +611,11 @@ next: f2fs_put_page(page, 1); } if (!err) - allocate_new_segments(sbi); + f2fs_allocate_new_segments(sbi); return err; } -int recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only) +int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only) { struct list_head inode_list; struct list_head dir_list; @@ -691,7 +690,7 @@ skip: struct cp_control cpc = { .reason = CP_RECOVERY, }; - err = write_checkpoint(sbi, &cpc); + err = f2fs_write_checkpoint(sbi, &cpc); } kmem_cache_destroy(fsync_entry_slab); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index a02d5c1a7ed2..3d0c42ef0474 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -169,7 +169,7 @@ found: return result - size + __reverse_ffz(tmp); } -bool need_SSR(struct f2fs_sb_info *sbi) +bool f2fs_need_SSR(struct f2fs_sb_info *sbi) { int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES); int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS); @@ -177,14 +177,14 @@ bool need_SSR(struct f2fs_sb_info *sbi) if (test_opt(sbi, LFS)) return false; - if (sbi->gc_thread && sbi->gc_thread->gc_urgent) + if (sbi->gc_mode == GC_URGENT) return true; return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs + SM_I(sbi)->min_ssr_sections + reserved_sections(sbi)); } -void register_inmem_page(struct inode *inode, struct page *page) +void f2fs_register_inmem_page(struct inode *inode, struct page *page) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode_info *fi = F2FS_I(inode); @@ -230,6 +230,8 @@ static int __revoke_inmem_pages(struct inode *inode, lock_page(page); + f2fs_wait_on_page_writeback(page, DATA, true); + if (recover) { struct dnode_of_data dn; struct node_info ni; @@ -237,7 +239,8 @@ static int __revoke_inmem_pages(struct inode *inode, trace_f2fs_commit_inmem_page(page, INMEM_REVOKE); retry: set_new_dnode(&dn, inode, NULL, NULL, 0); - err = get_dnode_of_data(&dn, page->index, LOOKUP_NODE); + err = f2fs_get_dnode_of_data(&dn, page->index, + LOOKUP_NODE); if (err) { if (err == -ENOMEM) { congestion_wait(BLK_RW_ASYNC, HZ/50); @@ -247,9 +250,9 @@ retry: err = -EAGAIN; goto next; } - get_node_info(sbi, dn.nid, &ni); + f2fs_get_node_info(sbi, dn.nid, &ni); if (cur->old_addr == NEW_ADDR) { - invalidate_blocks(sbi, dn.data_blkaddr); + f2fs_invalidate_blocks(sbi, dn.data_blkaddr); f2fs_update_data_blkaddr(&dn, NEW_ADDR); } else f2fs_replace_block(sbi, &dn, dn.data_blkaddr, @@ -271,7 +274,7 @@ next: return err; } -void drop_inmem_pages_all(struct f2fs_sb_info *sbi) +void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure) { struct list_head *head = &sbi->inode_list[ATOMIC_FILE]; struct inode *inode; @@ -287,15 +290,23 @@ next: spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); if (inode) { - drop_inmem_pages(inode); + if (gc_failure) { + if (fi->i_gc_failures[GC_FAILURE_ATOMIC]) + goto drop; + goto skip; + } +drop: + set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + f2fs_drop_inmem_pages(inode); iput(inode); } +skip: congestion_wait(BLK_RW_ASYNC, HZ/50); cond_resched(); goto next; } -void drop_inmem_pages(struct inode *inode) +void f2fs_drop_inmem_pages(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode_info *fi = F2FS_I(inode); @@ -309,11 +320,11 @@ void drop_inmem_pages(struct inode *inode) mutex_unlock(&fi->inmem_lock); clear_inode_flag(inode, FI_ATOMIC_FILE); - clear_inode_flag(inode, FI_HOT_DATA); + fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0; stat_dec_atomic_write(inode); } -void drop_inmem_page(struct inode *inode, struct page *page) +void f2fs_drop_inmem_page(struct inode *inode, struct page *page) { struct f2fs_inode_info *fi = F2FS_I(inode); struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -328,7 +339,7 @@ void drop_inmem_page(struct inode *inode, struct page *page) break; } - f2fs_bug_on(sbi, !cur || cur->page != page); + f2fs_bug_on(sbi, list_empty(head) || cur->page != page); list_del(&cur->list); mutex_unlock(&fi->inmem_lock); @@ -343,8 +354,7 @@ void drop_inmem_page(struct inode *inode, struct page *page) trace_f2fs_commit_inmem_page(page, INMEM_INVALIDATE); } -static int __commit_inmem_pages(struct inode *inode, - struct list_head *revoke_list) +static int __f2fs_commit_inmem_pages(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode_info *fi = F2FS_I(inode); @@ -357,9 +367,12 @@ static int __commit_inmem_pages(struct inode *inode, .op_flags = REQ_SYNC | REQ_PRIO, .io_type = FS_DATA_IO, }; + struct list_head revoke_list; pgoff_t last_idx = ULONG_MAX; int err = 0; + INIT_LIST_HEAD(&revoke_list); + list_for_each_entry_safe(cur, tmp, &fi->inmem_pages, list) { struct page *page = cur->page; @@ -371,14 +384,14 @@ static int __commit_inmem_pages(struct inode *inode, f2fs_wait_on_page_writeback(page, DATA, true); if (clear_page_dirty_for_io(page)) { inode_dec_dirty_pages(inode); - remove_dirty_inode(inode); + f2fs_remove_dirty_inode(inode); } retry: fio.page = page; fio.old_blkaddr = NULL_ADDR; fio.encrypted_page = NULL; fio.need_lock = LOCK_DONE; - err = do_write_data_page(&fio); + err = f2fs_do_write_data_page(&fio); if (err) { if (err == -ENOMEM) { congestion_wait(BLK_RW_ASYNC, HZ/50); @@ -393,50 +406,46 @@ retry: last_idx = page->index; } unlock_page(page); - list_move_tail(&cur->list, revoke_list); + list_move_tail(&cur->list, &revoke_list); } if (last_idx != ULONG_MAX) f2fs_submit_merged_write_cond(sbi, inode, 0, last_idx, DATA); - if (!err) - __revoke_inmem_pages(inode, revoke_list, false, false); + if (err) { + /* + * try to revoke all committed pages, but still we could fail + * due to no memory or other reason, if that happened, EAGAIN + * will be returned, which means in such case, transaction is + * already not integrity, caller should use journal to do the + * recovery or rewrite & commit last transaction. For other + * error number, revoking was done by filesystem itself. + */ + err = __revoke_inmem_pages(inode, &revoke_list, false, true); + + /* drop all uncommitted pages */ + __revoke_inmem_pages(inode, &fi->inmem_pages, true, false); + } else { + __revoke_inmem_pages(inode, &revoke_list, false, false); + } return err; } -int commit_inmem_pages(struct inode *inode) +int f2fs_commit_inmem_pages(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_inode_info *fi = F2FS_I(inode); - struct list_head revoke_list; int err; - INIT_LIST_HEAD(&revoke_list); f2fs_balance_fs(sbi, true); f2fs_lock_op(sbi); set_inode_flag(inode, FI_ATOMIC_COMMIT); mutex_lock(&fi->inmem_lock); - err = __commit_inmem_pages(inode, &revoke_list); - if (err) { - int ret; - /* - * try to revoke all committed pages, but still we could fail - * due to no memory or other reason, if that happened, EAGAIN - * will be returned, which means in such case, transaction is - * already not integrity, caller should use journal to do the - * recovery or rewrite & commit last transaction. For other - * error number, revoking was done by filesystem itself. - */ - ret = __revoke_inmem_pages(inode, &revoke_list, false, true); - if (ret) - err = ret; + err = __f2fs_commit_inmem_pages(inode); - /* drop all uncommitted pages */ - __revoke_inmem_pages(inode, &fi->inmem_pages, true, false); - } spin_lock(&sbi->inode_lock[ATOMIC_FILE]); if (!list_empty(&fi->inmem_ilist)) list_del_init(&fi->inmem_ilist); @@ -478,25 +487,28 @@ void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need) void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi) { + if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) + return; + /* try to shrink extent cache when there is no enough memory */ - if (!available_free_memory(sbi, EXTENT_CACHE)) + if (!f2fs_available_free_memory(sbi, EXTENT_CACHE)) f2fs_shrink_extent_tree(sbi, EXTENT_CACHE_SHRINK_NUMBER); /* check the # of cached NAT entries */ - if (!available_free_memory(sbi, NAT_ENTRIES)) - try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK); + if (!f2fs_available_free_memory(sbi, NAT_ENTRIES)) + f2fs_try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK); - if (!available_free_memory(sbi, FREE_NIDS)) - try_to_free_nids(sbi, MAX_FREE_NIDS); + if (!f2fs_available_free_memory(sbi, FREE_NIDS)) + f2fs_try_to_free_nids(sbi, MAX_FREE_NIDS); else - build_free_nids(sbi, false, false); + f2fs_build_free_nids(sbi, false, false); if (!is_idle(sbi) && !excess_dirty_nats(sbi)) return; /* checkpoint is the only way to shrink partial cached entries */ - if (!available_free_memory(sbi, NAT_ENTRIES) || - !available_free_memory(sbi, INO_ENTRIES) || + if (!f2fs_available_free_memory(sbi, NAT_ENTRIES) || + !f2fs_available_free_memory(sbi, INO_ENTRIES) || excess_prefree_segs(sbi) || excess_dirty_nats(sbi) || f2fs_time_over(sbi, CP_TIME)) { @@ -504,7 +516,7 @@ void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi) struct blk_plug plug; blk_start_plug(&plug); - sync_dirty_inodes(sbi, FILE_INODE); + f2fs_sync_dirty_inodes(sbi, FILE_INODE); blk_finish_plug(&plug); } f2fs_sync_fs(sbi->sb, true); @@ -537,7 +549,7 @@ static int submit_flush_wait(struct f2fs_sb_info *sbi, nid_t ino) return __submit_flush_wait(sbi, sbi->sb->s_bdev); for (i = 0; i < sbi->s_ndevs; i++) { - if (!is_dirty_device(sbi, ino, i, FLUSH_INO)) + if (!f2fs_is_dirty_device(sbi, ino, i, FLUSH_INO)) continue; ret = __submit_flush_wait(sbi, FDEV(i).bdev); if (ret) @@ -648,7 +660,7 @@ int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino) return cmd.ret; } -int create_flush_cmd_control(struct f2fs_sb_info *sbi) +int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi) { dev_t dev = sbi->sb->s_bdev->bd_dev; struct flush_cmd_control *fcc; @@ -685,7 +697,7 @@ init_thread: return err; } -void destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free) +void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free) { struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info; @@ -1010,6 +1022,7 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, if (discard_type == DPOLICY_BG) { dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME; + dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME; dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME; dpolicy->io_aware = true; dpolicy->sync = false; @@ -1019,11 +1032,13 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, } } else if (discard_type == DPOLICY_FORCE) { dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME; + dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME; dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME; dpolicy->io_aware = false; } else if (discard_type == DPOLICY_FSTRIM) { dpolicy->io_aware = false; } else if (discard_type == DPOLICY_UMOUNT) { + dpolicy->max_requests = UINT_MAX; dpolicy->io_aware = false; } } @@ -1043,6 +1058,9 @@ static void __submit_discard_cmd(struct f2fs_sb_info *sbi, if (dc->state != D_PREP) return; + if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) + return; + trace_f2fs_issue_discard(dc->bdev, dc->start, dc->len); dc->error = __blkdev_issue_discard(dc->bdev, @@ -1085,7 +1103,7 @@ static struct discard_cmd *__insert_discard_tree(struct f2fs_sb_info *sbi, goto do_insert; } - p = __lookup_rb_tree_for_insert(sbi, &dcc->root, &parent, lstart); + p = f2fs_lookup_rb_tree_for_insert(sbi, &dcc->root, &parent, lstart); do_insert: dc = __attach_discard_cmd(sbi, bdev, lstart, start, len, parent, p); if (!dc) @@ -1150,7 +1168,7 @@ static void __update_discard_tree_range(struct f2fs_sb_info *sbi, mutex_lock(&dcc->cmd_lock); - dc = (struct discard_cmd *)__lookup_rb_tree_ret(&dcc->root, + dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root, NULL, lstart, (struct rb_entry **)&prev_dc, (struct rb_entry **)&next_dc, @@ -1261,7 +1279,8 @@ static int __issue_discard_cmd(struct f2fs_sb_info *sbi, mutex_lock(&dcc->cmd_lock); if (list_empty(pend_list)) goto next; - f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, &dcc->root)); + f2fs_bug_on(sbi, + !f2fs_check_rb_tree_consistence(sbi, &dcc->root)); blk_start_plug(&plug); list_for_each_entry_safe(dc, tmp, pend_list, list) { f2fs_bug_on(sbi, dc->state != D_PREP); @@ -1314,7 +1333,7 @@ static bool __drop_discard_cmd(struct f2fs_sb_info *sbi) return dropped; } -void drop_discard_cmd(struct f2fs_sb_info *sbi) +void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi) { __drop_discard_cmd(sbi); } @@ -1405,7 +1424,8 @@ static void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr) bool need_wait = false; mutex_lock(&dcc->cmd_lock); - dc = (struct discard_cmd *)__lookup_rb_tree(&dcc->root, NULL, blkaddr); + dc = (struct discard_cmd *)f2fs_lookup_rb_tree(&dcc->root, + NULL, blkaddr); if (dc) { if (dc->state == D_PREP) { __punch_discard_cmd(sbi, dc, blkaddr); @@ -1420,7 +1440,7 @@ static void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr) __wait_one_discard_bio(sbi, dc); } -void stop_discard_thread(struct f2fs_sb_info *sbi) +void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi) { struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; @@ -1468,25 +1488,32 @@ static int issue_discard_thread(void *data) kthread_should_stop() || freezing(current) || dcc->discard_wake, msecs_to_jiffies(wait_ms)); + + if (dcc->discard_wake) + dcc->discard_wake = 0; + if (try_to_freeze()) continue; if (f2fs_readonly(sbi->sb)) continue; if (kthread_should_stop()) return 0; + if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) { + wait_ms = dpolicy.max_interval; + continue; + } - if (dcc->discard_wake) - dcc->discard_wake = 0; - - if (sbi->gc_thread && sbi->gc_thread->gc_urgent) + if (sbi->gc_mode == GC_URGENT) __init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1); sb_start_intwrite(sbi->sb); issued = __issue_discard_cmd(sbi, &dpolicy); - if (issued) { + if (issued > 0) { __wait_all_discard_cmd(sbi, &dpolicy); wait_ms = dpolicy.min_interval; + } else if (issued == -1){ + wait_ms = dpolicy.mid_interval; } else { wait_ms = dpolicy.max_interval; } @@ -1655,20 +1682,24 @@ static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc, return false; } -void release_discard_addrs(struct f2fs_sb_info *sbi) +static void release_discard_addr(struct discard_entry *entry) +{ + list_del(&entry->list); + kmem_cache_free(discard_entry_slab, entry); +} + +void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi) { struct list_head *head = &(SM_I(sbi)->dcc_info->entry_list); struct discard_entry *entry, *this; /* drop caches */ - list_for_each_entry_safe(entry, this, head, list) { - list_del(&entry->list); - kmem_cache_free(discard_entry_slab, entry); - } + list_for_each_entry_safe(entry, this, head, list) + release_discard_addr(entry); } /* - * Should call clear_prefree_segments after checkpoint is done. + * Should call f2fs_clear_prefree_segments after checkpoint is done. */ static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi) { @@ -1681,7 +1712,8 @@ static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi) mutex_unlock(&dirty_i->seglist_lock); } -void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc) +void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi, + struct cp_control *cpc) { struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; struct list_head *head = &dcc->entry_list; @@ -1764,9 +1796,8 @@ skip: if (cur_pos < sbi->blocks_per_seg) goto find_next; - list_del(&entry->list); + release_discard_addr(entry); dcc->nr_discards -= total_len; - kmem_cache_free(discard_entry_slab, entry); } wake_up_discard_thread(sbi, false); @@ -1824,7 +1855,7 @@ static void destroy_discard_cmd_control(struct f2fs_sb_info *sbi) if (!dcc) return; - stop_discard_thread(sbi); + f2fs_stop_discard_thread(sbi); kfree(dcc); SM_I(sbi)->dcc_info = NULL; @@ -1871,8 +1902,9 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del) (new_vblocks > sbi->blocks_per_seg))); se->valid_blocks = new_vblocks; - se->mtime = get_mtime(sbi); - SIT_I(sbi)->max_mtime = se->mtime; + se->mtime = get_mtime(sbi, false); + if (se->mtime > SIT_I(sbi)->max_mtime) + SIT_I(sbi)->max_mtime = se->mtime; /* Update valid block bitmap */ if (del > 0) { @@ -1940,7 +1972,7 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del) get_sec_entry(sbi, segno)->valid_blocks += del; } -void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr) +void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr) { unsigned int segno = GET_SEGNO(sbi, addr); struct sit_info *sit_i = SIT_I(sbi); @@ -1960,14 +1992,14 @@ void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr) up_write(&sit_i->sentry_lock); } -bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr) +bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr) { struct sit_info *sit_i = SIT_I(sbi); unsigned int segno, offset; struct seg_entry *se; bool is_cp = false; - if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) + if (!is_valid_blkaddr(blkaddr)) return true; down_read(&sit_i->sentry_lock); @@ -1999,7 +2031,7 @@ static void __add_sum_entry(struct f2fs_sb_info *sbi, int type, /* * Calculate the number of current summary pages for writing */ -int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra) +int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra) { int valid_sum_count = 0; int i, sum_in_page; @@ -2029,14 +2061,15 @@ int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra) /* * Caller should put this summary page */ -struct page *get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno) +struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno) { - return get_meta_page(sbi, GET_SUM_BLOCK(sbi, segno)); + return f2fs_get_meta_page(sbi, GET_SUM_BLOCK(sbi, segno)); } -void update_meta_page(struct f2fs_sb_info *sbi, void *src, block_t blk_addr) +void f2fs_update_meta_page(struct f2fs_sb_info *sbi, + void *src, block_t blk_addr) { - struct page *page = grab_meta_page(sbi, blk_addr); + struct page *page = f2fs_grab_meta_page(sbi, blk_addr); memcpy(page_address(page), src, PAGE_SIZE); set_page_dirty(page); @@ -2046,18 +2079,19 @@ void update_meta_page(struct f2fs_sb_info *sbi, void *src, block_t blk_addr) static void write_sum_page(struct f2fs_sb_info *sbi, struct f2fs_summary_block *sum_blk, block_t blk_addr) { - update_meta_page(sbi, (void *)sum_blk, blk_addr); + f2fs_update_meta_page(sbi, (void *)sum_blk, blk_addr); } static void write_current_sum_page(struct f2fs_sb_info *sbi, int type, block_t blk_addr) { struct curseg_info *curseg = CURSEG_I(sbi, type); - struct page *page = grab_meta_page(sbi, blk_addr); + struct page *page = f2fs_grab_meta_page(sbi, blk_addr); struct f2fs_summary_block *src = curseg->sum_blk; struct f2fs_summary_block *dst; dst = (struct f2fs_summary_block *)page_address(page); + memset(dst, 0, PAGE_SIZE); mutex_lock(&curseg->curseg_mutex); @@ -2297,7 +2331,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type) curseg->alloc_type = SSR; __next_free_blkoff(sbi, curseg, 0); - sum_page = get_sum_page(sbi, new_segno); + sum_page = f2fs_get_sum_page(sbi, new_segno); sum_node = (struct f2fs_summary_block *)page_address(sum_page); memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE); f2fs_put_page(sum_page, 1); @@ -2311,7 +2345,7 @@ static int get_ssr_segment(struct f2fs_sb_info *sbi, int type) int i, cnt; bool reversed = false; - /* need_SSR() already forces to do this */ + /* f2fs_need_SSR() already forces to do this */ if (v_ops->get_victim(sbi, &segno, BG_GC, type, SSR)) { curseg->next_segno = segno; return 1; @@ -2363,7 +2397,7 @@ static void allocate_segment_by_default(struct f2fs_sb_info *sbi, new_curseg(sbi, type, false); else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, type)) new_curseg(sbi, type, false); - else if (need_SSR(sbi) && get_ssr_segment(sbi, type)) + else if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type)) change_curseg(sbi, type); else new_curseg(sbi, type, false); @@ -2371,7 +2405,7 @@ static void allocate_segment_by_default(struct f2fs_sb_info *sbi, stat_inc_seg_type(sbi, curseg); } -void allocate_new_segments(struct f2fs_sb_info *sbi) +void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi) { struct curseg_info *curseg; unsigned int old_segno; @@ -2393,7 +2427,8 @@ static const struct segment_allocation default_salloc_ops = { .allocate_segment = allocate_segment_by_default, }; -bool exist_trim_candidates(struct f2fs_sb_info *sbi, struct cp_control *cpc) +bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi, + struct cp_control *cpc) { __u64 trim_start = cpc->trim_start; bool has_candidate = false; @@ -2426,9 +2461,9 @@ next: issued = 0; mutex_lock(&dcc->cmd_lock); - f2fs_bug_on(sbi, !__check_rb_tree_consistence(sbi, &dcc->root)); + f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi, &dcc->root)); - dc = (struct discard_cmd *)__lookup_rb_tree_ret(&dcc->root, + dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root, NULL, start, (struct rb_entry **)&prev_dc, (struct rb_entry **)&next_dc, @@ -2487,12 +2522,12 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range) return -EINVAL; if (end <= MAIN_BLKADDR(sbi)) - goto out; + return -EINVAL; if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) { f2fs_msg(sbi->sb, KERN_WARNING, "Found FS corruption, run fsck to fix."); - goto out; + return -EIO; } /* start/end segment number in main_area */ @@ -2509,7 +2544,7 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range) goto out; mutex_lock(&sbi->gc_mutex); - err = write_checkpoint(sbi, &cpc); + err = f2fs_write_checkpoint(sbi, &cpc); mutex_unlock(&sbi->gc_mutex); if (err) goto out; @@ -2543,7 +2578,7 @@ static bool __has_curseg_space(struct f2fs_sb_info *sbi, int type) return false; } -int rw_hint_to_seg_type(enum rw_hint hint) +int f2fs_rw_hint_to_seg_type(enum rw_hint hint) { switch (hint) { case WRITE_LIFE_SHORT: @@ -2616,7 +2651,7 @@ int rw_hint_to_seg_type(enum rw_hint hint) * WRITE_LIFE_LONG " WRITE_LIFE_LONG */ -enum rw_hint io_type_to_rw_hint(struct f2fs_sb_info *sbi, +enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi, enum page_type type, enum temp_type temp) { if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER) { @@ -2683,9 +2718,11 @@ static int __get_segment_type_6(struct f2fs_io_info *fio) if (is_cold_data(fio->page) || file_is_cold(inode)) return CURSEG_COLD_DATA; if (file_is_hot(inode) || - is_inode_flag_set(inode, FI_HOT_DATA)) + is_inode_flag_set(inode, FI_HOT_DATA) || + is_inode_flag_set(inode, FI_ATOMIC_FILE) || + is_inode_flag_set(inode, FI_VOLATILE_FILE)) return CURSEG_HOT_DATA; - /* rw_hint_to_seg_type(inode->i_write_hint); */ + /* f2fs_rw_hint_to_seg_type(inode->i_write_hint); */ return CURSEG_WARM_DATA; } else { if (IS_DNODE(fio->page)) @@ -2722,7 +2759,7 @@ static int __get_segment_type(struct f2fs_io_info *fio) return type; } -void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, +void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, block_t old_blkaddr, block_t *new_blkaddr, struct f2fs_summary *sum, int type, struct f2fs_io_info *fio, bool add_list) @@ -2782,6 +2819,7 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page, INIT_LIST_HEAD(&fio->list); fio->in_list = true; + fio->retry = false; io = sbi->write_io[fio->type] + fio->temp; spin_lock(&io->io_lock); list_add_tail(&fio->list, &io->io_list); @@ -2804,7 +2842,7 @@ static void update_device_state(struct f2fs_io_info *fio) devidx = f2fs_target_device_index(sbi, fio->new_blkaddr); /* update device state for fsync */ - set_dirty_device(sbi, fio->ino, devidx, FLUSH_INO); + f2fs_set_dirty_device(sbi, fio->ino, devidx, FLUSH_INO); /* update device state for checkpoint */ if (!f2fs_test_bit(devidx, (char *)&sbi->dirty_device)) { @@ -2817,23 +2855,28 @@ static void update_device_state(struct f2fs_io_info *fio) static void do_write_page(struct f2fs_summary *sum, struct f2fs_io_info *fio) { int type = __get_segment_type(fio); - int err; + bool keep_order = (test_opt(fio->sbi, LFS) && type == CURSEG_COLD_DATA); + if (keep_order) + down_read(&fio->sbi->io_order_lock); reallocate: - allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr, + f2fs_allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr, &fio->new_blkaddr, sum, type, fio, true); /* writeout dirty page into bdev */ - err = f2fs_submit_page_write(fio); - if (err == -EAGAIN) { + f2fs_submit_page_write(fio); + if (fio->retry) { fio->old_blkaddr = fio->new_blkaddr; goto reallocate; - } else if (!err) { - update_device_state(fio); } + + update_device_state(fio); + + if (keep_order) + up_read(&fio->sbi->io_order_lock); } -void write_meta_page(struct f2fs_sb_info *sbi, struct page *page, +void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page, enum iostat_type io_type) { struct f2fs_io_info fio = { @@ -2859,7 +2902,7 @@ void write_meta_page(struct f2fs_sb_info *sbi, struct page *page, f2fs_update_iostat(sbi, io_type, F2FS_BLKSIZE); } -void write_node_page(unsigned int nid, struct f2fs_io_info *fio) +void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio) { struct f2fs_summary sum; @@ -2869,14 +2912,15 @@ void write_node_page(unsigned int nid, struct f2fs_io_info *fio) f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE); } -void write_data_page(struct dnode_of_data *dn, struct f2fs_io_info *fio) +void f2fs_outplace_write_data(struct dnode_of_data *dn, + struct f2fs_io_info *fio) { struct f2fs_sb_info *sbi = fio->sbi; struct f2fs_summary sum; struct node_info ni; f2fs_bug_on(sbi, dn->data_blkaddr == NULL_ADDR); - get_node_info(sbi, dn->nid, &ni); + f2fs_get_node_info(sbi, dn->nid, &ni); set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version); do_write_page(&sum, fio); f2fs_update_data_blkaddr(dn, fio->new_blkaddr); @@ -2884,7 +2928,7 @@ void write_data_page(struct dnode_of_data *dn, struct f2fs_io_info *fio) f2fs_update_iostat(sbi, fio->io_type, F2FS_BLKSIZE); } -int rewrite_data_page(struct f2fs_io_info *fio) +int f2fs_inplace_write_data(struct f2fs_io_info *fio) { int err; struct f2fs_sb_info *sbi = fio->sbi; @@ -2919,7 +2963,7 @@ static inline int __f2fs_get_curseg(struct f2fs_sb_info *sbi, return i; } -void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, +void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, block_t old_blkaddr, block_t new_blkaddr, bool recover_curseg, bool recover_newaddr) { @@ -3004,7 +3048,7 @@ void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn, set_summary(&sum, dn->nid, dn->ofs_in_node, version); - __f2fs_replace_block(sbi, &sum, old_addr, new_addr, + f2fs_do_replace_block(sbi, &sum, old_addr, new_addr, recover_curseg, recover_newaddr); f2fs_update_data_blkaddr(dn, new_addr); @@ -3029,7 +3073,7 @@ void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr) { struct page *cpage; - if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) + if (!is_valid_blkaddr(blkaddr)) return; cpage = find_lock_page(META_MAPPING(sbi), blkaddr); @@ -3050,7 +3094,7 @@ static void read_compacted_summaries(struct f2fs_sb_info *sbi) start = start_sum_block(sbi); - page = get_meta_page(sbi, start++); + page = f2fs_get_meta_page(sbi, start++); kaddr = (unsigned char *)page_address(page); /* Step 1: restore nat cache */ @@ -3090,7 +3134,7 @@ static void read_compacted_summaries(struct f2fs_sb_info *sbi) f2fs_put_page(page, 1); page = NULL; - page = get_meta_page(sbi, start++); + page = f2fs_get_meta_page(sbi, start++); kaddr = (unsigned char *)page_address(page); offset = 0; } @@ -3129,7 +3173,7 @@ static int read_normal_summaries(struct f2fs_sb_info *sbi, int type) blk_addr = GET_SUM_BLOCK(sbi, segno); } - new = get_meta_page(sbi, blk_addr); + new = f2fs_get_meta_page(sbi, blk_addr); sum = (struct f2fs_summary_block *)page_address(new); if (IS_NODESEG(type)) { @@ -3141,7 +3185,7 @@ static int read_normal_summaries(struct f2fs_sb_info *sbi, int type) ns->ofs_in_node = 0; } } else { - restore_node_summary(sbi, segno, sum); + f2fs_restore_node_summary(sbi, segno, sum); } } @@ -3173,10 +3217,10 @@ static int restore_curseg_summaries(struct f2fs_sb_info *sbi) int err; if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG)) { - int npages = npages_for_summary_flush(sbi, true); + int npages = f2fs_npages_for_summary_flush(sbi, true); if (npages >= 2) - ra_meta_pages(sbi, start_sum_block(sbi), npages, + f2fs_ra_meta_pages(sbi, start_sum_block(sbi), npages, META_CP, true); /* restore for compacted data summary */ @@ -3185,7 +3229,7 @@ static int restore_curseg_summaries(struct f2fs_sb_info *sbi) } if (__exist_node_summaries(sbi)) - ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type), + f2fs_ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type), NR_CURSEG_TYPE - type, META_CP, true); for (; type <= CURSEG_COLD_NODE; type++) { @@ -3211,8 +3255,9 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr) int written_size = 0; int i, j; - page = grab_meta_page(sbi, blkaddr++); + page = f2fs_grab_meta_page(sbi, blkaddr++); kaddr = (unsigned char *)page_address(page); + memset(kaddr, 0, PAGE_SIZE); /* Step 1: write nat cache */ seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA); @@ -3235,8 +3280,9 @@ static void write_compacted_summaries(struct f2fs_sb_info *sbi, block_t blkaddr) for (j = 0; j < blkoff; j++) { if (!page) { - page = grab_meta_page(sbi, blkaddr++); + page = f2fs_grab_meta_page(sbi, blkaddr++); kaddr = (unsigned char *)page_address(page); + memset(kaddr, 0, PAGE_SIZE); written_size = 0; } summary = (struct f2fs_summary *)(kaddr + written_size); @@ -3271,7 +3317,7 @@ static void write_normal_summaries(struct f2fs_sb_info *sbi, write_current_sum_page(sbi, i, blkaddr + (i - type)); } -void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk) +void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk) { if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG)) write_compacted_summaries(sbi, start_blk); @@ -3279,12 +3325,12 @@ void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk) write_normal_summaries(sbi, start_blk, CURSEG_HOT_DATA); } -void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk) +void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk) { write_normal_summaries(sbi, start_blk, CURSEG_HOT_NODE); } -int lookup_journal_in_cursum(struct f2fs_journal *journal, int type, +int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type, unsigned int val, int alloc) { int i; @@ -3309,7 +3355,7 @@ int lookup_journal_in_cursum(struct f2fs_journal *journal, int type, static struct page *get_current_sit_page(struct f2fs_sb_info *sbi, unsigned int segno) { - return get_meta_page(sbi, current_sit_addr(sbi, segno)); + return f2fs_get_meta_page(sbi, current_sit_addr(sbi, segno)); } static struct page *get_next_sit_page(struct f2fs_sb_info *sbi, @@ -3322,7 +3368,7 @@ static struct page *get_next_sit_page(struct f2fs_sb_info *sbi, src_off = current_sit_addr(sbi, start); dst_off = next_sit_addr(sbi, src_off); - page = grab_meta_page(sbi, dst_off); + page = f2fs_grab_meta_page(sbi, dst_off); seg_info_to_sit_page(sbi, page, start); set_page_dirty(page); @@ -3418,7 +3464,7 @@ static void remove_sits_in_journal(struct f2fs_sb_info *sbi) * CP calls this function, which flushes SIT entries including sit_journal, * and moves prefree segs to free segs. */ -void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc) +void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc) { struct sit_info *sit_i = SIT_I(sbi); unsigned long *bitmap = sit_i->dirty_sentries_bitmap; @@ -3477,6 +3523,11 @@ void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc) int offset, sit_offset; se = get_seg_entry(sbi, segno); +#ifdef CONFIG_F2FS_CHECK_FS + if (memcmp(se->cur_valid_map, se->cur_valid_map_mir, + SIT_VBLOCK_MAP_SIZE)) + f2fs_bug_on(sbi, 1); +#endif /* add discard candidates */ if (!(cpc->reason & CP_DISCARD)) { @@ -3485,17 +3536,21 @@ void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc) } if (to_journal) { - offset = lookup_journal_in_cursum(journal, + offset = f2fs_lookup_journal_in_cursum(journal, SIT_JOURNAL, segno, 1); f2fs_bug_on(sbi, offset < 0); segno_in_journal(journal, offset) = cpu_to_le32(segno); seg_info_to_raw_sit(se, &sit_in_journal(journal, offset)); + check_block_count(sbi, segno, + &sit_in_journal(journal, offset)); } else { sit_offset = SIT_ENTRY_OFFSET(sit_i, segno); seg_info_to_raw_sit(se, &raw_sit->entries[sit_offset]); + check_block_count(sbi, segno, + &raw_sit->entries[sit_offset]); } __clear_bit(segno, bitmap); @@ -3543,8 +3598,10 @@ static int build_sit_info(struct f2fs_sb_info *sbi) SM_I(sbi)->sit_info = sit_i; - sit_i->sentries = f2fs_kvzalloc(sbi, MAIN_SEGS(sbi) * - sizeof(struct seg_entry), GFP_KERNEL); + sit_i->sentries = + f2fs_kvzalloc(sbi, array_size(sizeof(struct seg_entry), + MAIN_SEGS(sbi)), + GFP_KERNEL); if (!sit_i->sentries) return -ENOMEM; @@ -3584,8 +3641,10 @@ static int build_sit_info(struct f2fs_sb_info *sbi) return -ENOMEM; if (sbi->segs_per_sec > 1) { - sit_i->sec_entries = f2fs_kvzalloc(sbi, MAIN_SECS(sbi) * - sizeof(struct sec_entry), GFP_KERNEL); + sit_i->sec_entries = + f2fs_kvzalloc(sbi, array_size(sizeof(struct sec_entry), + MAIN_SECS(sbi)), + GFP_KERNEL); if (!sit_i->sec_entries) return -ENOMEM; } @@ -3661,7 +3720,8 @@ static int build_curseg(struct f2fs_sb_info *sbi) struct curseg_info *array; int i; - array = f2fs_kzalloc(sbi, sizeof(*array) * NR_CURSEG_TYPE, GFP_KERNEL); + array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE, sizeof(*array)), + GFP_KERNEL); if (!array) return -ENOMEM; @@ -3694,9 +3754,10 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) unsigned int i, start, end; unsigned int readed, start_blk = 0; int err = 0; + block_t total_node_blocks = 0; do { - readed = ra_meta_pages(sbi, start_blk, BIO_MAX_PAGES, + readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_PAGES, META_SIT, true); start = start_blk * sit_i->sents_per_block; @@ -3716,6 +3777,8 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) if (err) return err; seg_info_from_raw_sit(se, &sit); + if (IS_NODESEG(se->type)) + total_node_blocks += se->valid_blocks; /* build discard map only one time */ if (f2fs_discard_en(sbi)) { @@ -3744,15 +3807,28 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) unsigned int old_valid_blocks; start = le32_to_cpu(segno_in_journal(journal, i)); + if (start >= MAIN_SEGS(sbi)) { + f2fs_msg(sbi->sb, KERN_ERR, + "Wrong journal entry on segno %u", + start); + set_sbi_flag(sbi, SBI_NEED_FSCK); + err = -EINVAL; + break; + } + se = &sit_i->sentries[start]; sit = sit_in_journal(journal, i); old_valid_blocks = se->valid_blocks; + if (IS_NODESEG(se->type)) + total_node_blocks -= old_valid_blocks; err = check_block_count(sbi, start, &sit); if (err) break; seg_info_from_raw_sit(se, &sit); + if (IS_NODESEG(se->type)) + total_node_blocks += se->valid_blocks; if (f2fs_discard_en(sbi)) { if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { @@ -3761,16 +3837,28 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) } else { memcpy(se->discard_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE); - sbi->discard_blks += old_valid_blocks - - se->valid_blocks; + sbi->discard_blks += old_valid_blocks; + sbi->discard_blks -= se->valid_blocks; } } - if (sbi->segs_per_sec > 1) + if (sbi->segs_per_sec > 1) { get_sec_entry(sbi, start)->valid_blocks += - se->valid_blocks - old_valid_blocks; + se->valid_blocks; + get_sec_entry(sbi, start)->valid_blocks -= + old_valid_blocks; + } } up_read(&curseg->journal_rwsem); + + if (!err && total_node_blocks != valid_node_count(sbi)) { + f2fs_msg(sbi->sb, KERN_ERR, + "SIT is corrupted node# %u vs %u", + total_node_blocks, valid_node_count(sbi)); + set_sbi_flag(sbi, SBI_NEED_FSCK); + err = -EINVAL; + } + return err; } @@ -3869,7 +3957,7 @@ static void init_min_max_mtime(struct f2fs_sb_info *sbi) down_write(&sit_i->sentry_lock); - sit_i->min_mtime = LLONG_MAX; + sit_i->min_mtime = ULLONG_MAX; for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) { unsigned int i; @@ -3883,11 +3971,11 @@ static void init_min_max_mtime(struct f2fs_sb_info *sbi) if (sit_i->min_mtime > mtime) sit_i->min_mtime = mtime; } - sit_i->max_mtime = get_mtime(sbi); + sit_i->max_mtime = get_mtime(sbi, false); up_write(&sit_i->sentry_lock); } -int build_segment_manager(struct f2fs_sb_info *sbi) +int f2fs_build_segment_manager(struct f2fs_sb_info *sbi) { struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi); struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi); @@ -3924,7 +4012,7 @@ int build_segment_manager(struct f2fs_sb_info *sbi) init_rwsem(&sm_info->curseg_lock); if (!f2fs_readonly(sbi->sb)) { - err = create_flush_cmd_control(sbi); + err = f2fs_create_flush_cmd_control(sbi); if (err) return err; } @@ -4049,13 +4137,13 @@ static void destroy_sit_info(struct f2fs_sb_info *sbi) kfree(sit_i); } -void destroy_segment_manager(struct f2fs_sb_info *sbi) +void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi) { struct f2fs_sm_info *sm_info = SM_I(sbi); if (!sm_info) return; - destroy_flush_cmd_control(sbi, true); + f2fs_destroy_flush_cmd_control(sbi, true); destroy_discard_cmd_control(sbi); destroy_dirty_segmap(sbi); destroy_curseg(sbi); @@ -4065,7 +4153,7 @@ void destroy_segment_manager(struct f2fs_sb_info *sbi) kfree(sm_info); } -int __init create_segment_manager_caches(void) +int __init f2fs_create_segment_manager_caches(void) { discard_entry_slab = f2fs_kmem_cache_create("discard_entry", sizeof(struct discard_entry)); @@ -4098,7 +4186,7 @@ fail: return -ENOMEM; } -void destroy_segment_manager_caches(void) +void f2fs_destroy_segment_manager_caches(void) { kmem_cache_destroy(sit_entry_set_slab); kmem_cache_destroy(discard_cmd_slab); diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 96a2d57ba8a4..f18fc82fbe99 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -85,7 +85,7 @@ (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & ((sbi)->blocks_per_seg - 1)) #define GET_SEGNO(sbi, blk_addr) \ - ((((blk_addr) == NULL_ADDR) || ((blk_addr) == NEW_ADDR)) ? \ + ((!is_valid_blkaddr(blk_addr)) ? \ NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \ GET_SEGNO_FROM_SEG0(sbi, blk_addr))) #define BLKS_PER_SEC(sbi) \ @@ -215,6 +215,8 @@ struct segment_allocation { #define IS_DUMMY_WRITTEN_PAGE(page) \ (page_private(page) == (unsigned long)DUMMY_WRITTEN_PAGE) +#define MAX_SKIP_ATOMIC_COUNT 16 + struct inmem_pages { struct list_head list; struct page *page; @@ -375,6 +377,7 @@ static inline void seg_info_to_sit_page(struct f2fs_sb_info *sbi, int i; raw_sit = (struct f2fs_sit_block *)page_address(page); + memset(raw_sit, 0, PAGE_SIZE); for (i = 0; i < end - start; i++) { rs = &raw_sit->entries[i]; se = get_seg_entry(sbi, start + i); @@ -742,11 +745,23 @@ static inline void set_to_next_sit(struct sit_info *sit_i, unsigned int start) #endif } -static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi) +static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi, + bool base_time) { struct sit_info *sit_i = SIT_I(sbi); - return sit_i->elapsed_time + CURRENT_TIME_SEC.tv_sec - - sit_i->mounted_time; + time64_t diff, now = ktime_get_real_seconds(); + + if (now >= sit_i->mounted_time) + return sit_i->elapsed_time + now - sit_i->mounted_time; + + /* system time is set to the past */ + if (!base_time) { + diff = sit_i->mounted_time - now; + if (sit_i->elapsed_time >= diff) + return sit_i->elapsed_time - diff; + return 0; + } + return sit_i->elapsed_time; } static inline void set_summary(struct f2fs_summary *sum, nid_t nid, @@ -770,15 +785,6 @@ static inline block_t sum_blk_addr(struct f2fs_sb_info *sbi, int base, int type) - (base + 1) + type; } -static inline bool no_fggc_candidate(struct f2fs_sb_info *sbi, - unsigned int secno) -{ - if (get_valid_blocks(sbi, GET_SEG_FROM_SEC(sbi, secno), true) > - sbi->fggc_threshold) - return true; - return false; -} - static inline bool sec_usage_check(struct f2fs_sb_info *sbi, unsigned int secno) { if (IS_CURSEC(sbi, secno) || (sbi->cur_victim_sec == secno)) diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c index 0b5664a1a6cc..36cfd816c160 100644 --- a/fs/f2fs/shrinker.c +++ b/fs/f2fs/shrinker.c @@ -109,11 +109,11 @@ unsigned long f2fs_shrink_scan(struct shrinker *shrink, /* shrink clean nat cache entries */ if (freed < nr) - freed += try_to_free_nats(sbi, nr - freed); + freed += f2fs_try_to_free_nats(sbi, nr - freed); /* shrink free nids cache entries */ if (freed < nr) - freed += try_to_free_nids(sbi, nr - freed); + freed += f2fs_try_to_free_nids(sbi, nr - freed); spin_lock(&f2fs_list_lock); p = p->next; diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index cb57ad3ca32d..08635dc2594f 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -830,15 +830,14 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb) /* Initialize f2fs-specific inode info */ atomic_set(&fi->dirty_pages, 0); - fi->i_current_depth = 1; init_rwsem(&fi->i_sem); INIT_LIST_HEAD(&fi->dirty_list); INIT_LIST_HEAD(&fi->gdirty_list); INIT_LIST_HEAD(&fi->inmem_ilist); INIT_LIST_HEAD(&fi->inmem_pages); mutex_init(&fi->inmem_lock); - init_rwsem(&fi->dio_rwsem[READ]); - init_rwsem(&fi->dio_rwsem[WRITE]); + init_rwsem(&fi->i_gc_rwsem[READ]); + init_rwsem(&fi->i_gc_rwsem[WRITE]); init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_xattr_sem); @@ -866,7 +865,7 @@ static int f2fs_drop_inode(struct inode *inode) /* some remained atomic pages should discarded */ if (f2fs_is_atomic_file(inode)) - drop_inmem_pages(inode); + f2fs_drop_inmem_pages(inode); /* should remain fi->extent_tree for writepage */ f2fs_destroy_extent_node(inode); @@ -1003,7 +1002,7 @@ static void f2fs_put_super(struct super_block *sb) struct cp_control cpc = { .reason = CP_UMOUNT, }; - write_checkpoint(sbi, &cpc); + f2fs_write_checkpoint(sbi, &cpc); } /* be sure to wait for any on-going discard commands */ @@ -1013,17 +1012,17 @@ static void f2fs_put_super(struct super_block *sb) struct cp_control cpc = { .reason = CP_UMOUNT | CP_TRIMMED, }; - write_checkpoint(sbi, &cpc); + f2fs_write_checkpoint(sbi, &cpc); } - /* write_checkpoint can update stat informaion */ + /* f2fs_write_checkpoint can update stat informaion */ f2fs_destroy_stats(sbi); /* * normally superblock is clean, so we need to release this. * In addition, EIO will skip do checkpoint, we need this as well. */ - release_ino_entry(sbi, true); + f2fs_release_ino_entry(sbi, true); f2fs_leave_shrinker(sbi); mutex_unlock(&sbi->umount_mutex); @@ -1035,8 +1034,8 @@ static void f2fs_put_super(struct super_block *sb) iput(sbi->meta_inode); /* destroy f2fs internal modules */ - destroy_node_manager(sbi); - destroy_segment_manager(sbi); + f2fs_destroy_node_manager(sbi); + f2fs_destroy_segment_manager(sbi); kfree(sbi->ckpt); @@ -1079,7 +1078,7 @@ int f2fs_sync_fs(struct super_block *sb, int sync) cpc.reason = __get_cp_reason(sbi); mutex_lock(&sbi->gc_mutex); - err = write_checkpoint(sbi, &cpc); + err = f2fs_write_checkpoint(sbi, &cpc); mutex_unlock(&sbi->gc_mutex); } f2fs_trace_ios(NULL, 1); @@ -1482,11 +1481,11 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) */ if ((*flags & MS_RDONLY) || !test_opt(sbi, BG_GC)) { if (sbi->gc_thread) { - stop_gc_thread(sbi); + f2fs_stop_gc_thread(sbi); need_restart_gc = true; } } else if (!sbi->gc_thread) { - err = start_gc_thread(sbi); + err = f2fs_start_gc_thread(sbi); if (err) goto restore_opts; need_stop_gc = true; @@ -1509,9 +1508,9 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) */ if ((*flags & MS_RDONLY) || !test_opt(sbi, FLUSH_MERGE)) { clear_opt(sbi, FLUSH_MERGE); - destroy_flush_cmd_control(sbi, false); + f2fs_destroy_flush_cmd_control(sbi, false); } else { - err = create_flush_cmd_control(sbi); + err = f2fs_create_flush_cmd_control(sbi); if (err) goto restore_gc; } @@ -1529,11 +1528,11 @@ skip: return 0; restore_gc: if (need_restart_gc) { - if (start_gc_thread(sbi)) + if (f2fs_start_gc_thread(sbi)) f2fs_msg(sbi->sb, KERN_WARNING, "background gc thread has stopped"); } else if (need_stop_gc) { - stop_gc_thread(sbi); + f2fs_stop_gc_thread(sbi); } restore_opts: #ifdef CONFIG_QUOTA @@ -1805,7 +1804,7 @@ static int f2fs_quota_on(struct super_block *sb, int type, int format_id, inode = d_inode(path->dentry); inode_lock(inode); - F2FS_I(inode)->i_flags |= FS_NOATIME_FL | FS_IMMUTABLE_FL; + F2FS_I(inode)->i_flags |= F2FS_NOATIME_FL | F2FS_IMMUTABLE_FL; inode_set_flags(inode, S_NOATIME | S_IMMUTABLE, S_NOATIME | S_IMMUTABLE); inode_unlock(inode); @@ -1829,7 +1828,7 @@ static int f2fs_quota_off(struct super_block *sb, int type) goto out_put; inode_lock(inode); - F2FS_I(inode)->i_flags &= ~(FS_NOATIME_FL | FS_IMMUTABLE_FL); + F2FS_I(inode)->i_flags &= ~(F2FS_NOATIME_FL | F2FS_IMMUTABLE_FL); inode_set_flags(inode, 0, S_NOATIME | S_IMMUTABLE); inode_unlock(inode); f2fs_mark_inode_dirty_sync(inode, false); @@ -1938,19 +1937,13 @@ static bool f2fs_dummy_context(struct inode *inode) return DUMMY_ENCRYPTION_ENABLED(F2FS_I_SB(inode)); } -static unsigned f2fs_max_namelen(struct inode *inode) -{ - return S_ISLNK(inode->i_mode) ? - inode->i_sb->s_blocksize : F2FS_NAME_LEN; -} - static const struct fscrypt_operations f2fs_cryptops = { .key_prefix = "f2fs:", .get_context = f2fs_get_context, .set_context = f2fs_set_context, .dummy_context = f2fs_dummy_context, .empty_dir = f2fs_empty_dir, - .max_namelen = f2fs_max_namelen, + .max_namelen = F2FS_NAME_LEN, }; #endif @@ -1960,7 +1953,7 @@ static struct inode *f2fs_nfs_get_inode(struct super_block *sb, struct f2fs_sb_info *sbi = F2FS_SB(sb); struct inode *inode; - if (check_nid_range(sbi, ino)) + if (f2fs_check_nid_range(sbi, ino)) return ERR_PTR(-ESTALE); /* @@ -2143,6 +2136,8 @@ static inline bool sanity_check_area_boundary(struct f2fs_sb_info *sbi, static int sanity_check_raw_super(struct f2fs_sb_info *sbi, struct buffer_head *bh) { + block_t segment_count, segs_per_sec, secs_per_zone; + block_t total_sections, blocks_per_seg; struct f2fs_super_block *raw_super = (struct f2fs_super_block *) (bh->b_data + F2FS_SUPER_OFFSET); struct super_block *sb = sbi->sb; @@ -2199,6 +2194,72 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, return 1; } + segment_count = le32_to_cpu(raw_super->segment_count); + segs_per_sec = le32_to_cpu(raw_super->segs_per_sec); + secs_per_zone = le32_to_cpu(raw_super->secs_per_zone); + total_sections = le32_to_cpu(raw_super->section_count); + + /* blocks_per_seg should be 512, given the above check */ + blocks_per_seg = 1 << le32_to_cpu(raw_super->log_blocks_per_seg); + + if (segment_count > F2FS_MAX_SEGMENT || + segment_count < F2FS_MIN_SEGMENTS) { + f2fs_msg(sb, KERN_INFO, + "Invalid segment count (%u)", + segment_count); + return 1; + } + + if (total_sections > segment_count || + total_sections < F2FS_MIN_SEGMENTS || + segs_per_sec > segment_count || !segs_per_sec) { + f2fs_msg(sb, KERN_INFO, + "Invalid segment/section count (%u, %u x %u)", + segment_count, total_sections, segs_per_sec); + return 1; + } + + if ((segment_count / segs_per_sec) < total_sections) { + f2fs_msg(sb, KERN_INFO, + "Small segment_count (%u < %u * %u)", + segment_count, segs_per_sec, total_sections); + return 1; + } + + if (segment_count > (le32_to_cpu(raw_super->block_count) >> 9)) { + f2fs_msg(sb, KERN_INFO, + "Wrong segment_count / block_count (%u > %u)", + segment_count, le32_to_cpu(raw_super->block_count)); + return 1; + } + + if (secs_per_zone > total_sections) { + f2fs_msg(sb, KERN_INFO, + "Wrong secs_per_zone (%u > %u)", + secs_per_zone, total_sections); + return 1; + } + if (le32_to_cpu(raw_super->extension_count) > F2FS_MAX_EXTENSION || + raw_super->hot_ext_count > F2FS_MAX_EXTENSION || + (le32_to_cpu(raw_super->extension_count) + + raw_super->hot_ext_count) > F2FS_MAX_EXTENSION) { + f2fs_msg(sb, KERN_INFO, + "Corrupted extension count (%u + %u > %u)", + le32_to_cpu(raw_super->extension_count), + raw_super->hot_ext_count, + F2FS_MAX_EXTENSION); + return 1; + } + + if (le32_to_cpu(raw_super->cp_payload) > + (blocks_per_seg - F2FS_CP_PACKS)) { + f2fs_msg(sb, KERN_INFO, + "Insane cp_payload (%u > %u)", + le32_to_cpu(raw_super->cp_payload), + blocks_per_seg - F2FS_CP_PACKS); + return 1; + } + /* check reserved ino info */ if (le32_to_cpu(raw_super->node_ino) != 1 || le32_to_cpu(raw_super->meta_ino) != 2 || @@ -2211,13 +2272,6 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, return 1; } - if (le32_to_cpu(raw_super->segment_count) > F2FS_MAX_SEGMENT) { - f2fs_msg(sb, KERN_INFO, - "Invalid segment count (%u)", - le32_to_cpu(raw_super->segment_count)); - return 1; - } - /* check CP/SIT/NAT/SSA/MAIN_AREA area boundary */ if (sanity_check_area_boundary(sbi, bh)) return 1; @@ -2225,7 +2279,7 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, return 0; } -int sanity_check_ckpt(struct f2fs_sb_info *sbi) +int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi) { unsigned int total, fsmeta; struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi); @@ -2306,13 +2360,15 @@ static void init_sb_info(struct f2fs_sb_info *sbi) for (i = 0; i < NR_COUNT_TYPE; i++) atomic_set(&sbi->nr_pages[i], 0); - atomic_set(&sbi->wb_sync_req, 0); + for (i = 0; i < META; i++) + atomic_set(&sbi->wb_sync_req[i], 0); INIT_LIST_HEAD(&sbi->s_list); mutex_init(&sbi->umount_mutex); for (i = 0; i < NR_PAGE_TYPE - 1; i++) for (j = HOT; j < NR_TEMP_TYPE; j++) mutex_init(&sbi->wio_mutex[i][j]); + init_rwsem(&sbi->io_order_lock); spin_lock_init(&sbi->cp_lock); sbi->dirty_device = 0; @@ -2367,8 +2423,10 @@ static int init_blkz_info(struct f2fs_sb_info *sbi, int devi) #define F2FS_REPORT_NR_ZONES 4096 - zones = f2fs_kzalloc(sbi, sizeof(struct blk_zone) * - F2FS_REPORT_NR_ZONES, GFP_KERNEL); + zones = f2fs_kzalloc(sbi, + array_size(F2FS_REPORT_NR_ZONES, + sizeof(struct blk_zone)), + GFP_KERNEL); if (!zones) return -ENOMEM; @@ -2512,8 +2570,10 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi) * Initialize multiple devices information, or single * zoned block device information. */ - sbi->devs = f2fs_kzalloc(sbi, sizeof(struct f2fs_dev_info) * - max_devices, GFP_KERNEL); + sbi->devs = f2fs_kzalloc(sbi, + array_size(max_devices, + sizeof(struct f2fs_dev_info)), + GFP_KERNEL); if (!sbi->devs) return -ENOMEM; @@ -2735,9 +2795,11 @@ try_onemore: int n = (i == META) ? 1: NR_TEMP_TYPE; int j; - sbi->write_io[i] = f2fs_kmalloc(sbi, - n * sizeof(struct f2fs_bio_info), - GFP_KERNEL); + sbi->write_io[i] = + f2fs_kmalloc(sbi, + array_size(n, + sizeof(struct f2fs_bio_info)), + GFP_KERNEL); if (!sbi->write_io[i]) { err = -ENOMEM; goto free_options; @@ -2777,7 +2839,7 @@ try_onemore: goto free_io_dummy; } - err = get_valid_checkpoint(sbi); + err = f2fs_get_valid_checkpoint(sbi); if (err) { f2fs_msg(sb, KERN_ERR, "Failed to get valid F2FS checkpoint"); goto free_meta_inode; @@ -2807,18 +2869,18 @@ try_onemore: spin_lock_init(&sbi->inode_lock[i]); } - init_extent_cache_info(sbi); + f2fs_init_extent_cache_info(sbi); - init_ino_entry_info(sbi); + f2fs_init_ino_entry_info(sbi); /* setup f2fs internal modules */ - err = build_segment_manager(sbi); + err = f2fs_build_segment_manager(sbi); if (err) { f2fs_msg(sb, KERN_ERR, "Failed to initialize F2FS segment manager"); goto free_sm; } - err = build_node_manager(sbi); + err = f2fs_build_node_manager(sbi); if (err) { f2fs_msg(sb, KERN_ERR, "Failed to initialize F2FS node manager"); @@ -2836,7 +2898,7 @@ try_onemore: sbi->kbytes_written = le64_to_cpu(seg_i->journal->info.kbytes_written); - build_gc_manager(sbi); + f2fs_build_gc_manager(sbi); /* get an inode for node space */ sbi->node_inode = f2fs_iget(sb, F2FS_NODE_INO(sbi)); @@ -2888,7 +2950,7 @@ try_onemore: } #endif /* if there are nt orphan nodes free them */ - err = recover_orphan_inodes(sbi); + err = f2fs_recover_orphan_inodes(sbi); if (err) goto free_meta; @@ -2910,7 +2972,7 @@ try_onemore: if (!retry) goto skip_recovery; - err = recover_fsync_data(sbi, false); + err = f2fs_recover_fsync_data(sbi, false); if (err < 0) { need_fsck = true; f2fs_msg(sb, KERN_ERR, @@ -2918,7 +2980,7 @@ try_onemore: goto free_meta; } } else { - err = recover_fsync_data(sbi, true); + err = f2fs_recover_fsync_data(sbi, true); if (!f2fs_readonly(sb) && err > 0) { err = -EINVAL; @@ -2928,7 +2990,7 @@ try_onemore: } } skip_recovery: - /* recover_fsync_data() cleared this already */ + /* f2fs_recover_fsync_data() cleared this already */ clear_sbi_flag(sbi, SBI_POR_DOING); /* @@ -2937,7 +2999,7 @@ skip_recovery: */ if (test_opt(sbi, BG_GC) && !f2fs_readonly(sb)) { /* After POR, we can run background GC thread.*/ - err = start_gc_thread(sbi); + err = f2fs_start_gc_thread(sbi); if (err) goto free_meta; } @@ -2968,10 +3030,10 @@ free_meta: #endif f2fs_sync_inode_meta(sbi); /* - * Some dirty meta pages can be produced by recover_orphan_inodes() + * Some dirty meta pages can be produced by f2fs_recover_orphan_inodes() * failed by EIO. Then, iput(node_inode) can trigger balance_fs_bg() - * followed by write_checkpoint() through f2fs_write_node_pages(), which - * falls into an infinite loop in sync_meta_pages(). + * followed by f2fs_write_checkpoint() through f2fs_write_node_pages(), which + * falls into an infinite loop in f2fs_sync_meta_pages(). */ truncate_inode_pages_final(META_MAPPING(sbi)); #ifdef CONFIG_QUOTA @@ -2984,13 +3046,13 @@ free_root_inode: free_stats: f2fs_destroy_stats(sbi); free_node_inode: - release_ino_entry(sbi, true); + f2fs_release_ino_entry(sbi, true); truncate_inode_pages_final(NODE_MAPPING(sbi)); iput(sbi->node_inode); free_nm: - destroy_node_manager(sbi); + f2fs_destroy_node_manager(sbi); free_sm: - destroy_segment_manager(sbi); + f2fs_destroy_segment_manager(sbi); free_devices: destroy_device_list(sbi); kfree(sbi->ckpt); @@ -3036,8 +3098,8 @@ static void kill_f2fs_super(struct super_block *sb) { if (sb->s_root) { set_sbi_flag(F2FS_SB(sb), SBI_IS_CLOSE); - stop_gc_thread(F2FS_SB(sb)); - stop_discard_thread(F2FS_SB(sb)); + f2fs_stop_gc_thread(F2FS_SB(sb)); + f2fs_stop_discard_thread(F2FS_SB(sb)); } kill_block_super(sb); } @@ -3075,21 +3137,27 @@ static int __init init_f2fs_fs(void) { int err; + if (PAGE_SIZE != F2FS_BLKSIZE) { + printk("F2FS not supported on PAGE_SIZE(%lu) != %d\n", + PAGE_SIZE, F2FS_BLKSIZE); + return -EINVAL; + } + f2fs_build_trace_ios(); err = init_inodecache(); if (err) goto fail; - err = create_node_manager_caches(); + err = f2fs_create_node_manager_caches(); if (err) goto free_inodecache; - err = create_segment_manager_caches(); + err = f2fs_create_segment_manager_caches(); if (err) goto free_node_manager_caches; - err = create_checkpoint_caches(); + err = f2fs_create_checkpoint_caches(); if (err) goto free_segment_manager_caches; - err = create_extent_cache(); + err = f2fs_create_extent_cache(); if (err) goto free_checkpoint_caches; err = f2fs_init_sysfs(); @@ -3118,13 +3186,13 @@ free_shrinker: free_sysfs: f2fs_exit_sysfs(); free_extent_cache: - destroy_extent_cache(); + f2fs_destroy_extent_cache(); free_checkpoint_caches: - destroy_checkpoint_caches(); + f2fs_destroy_checkpoint_caches(); free_segment_manager_caches: - destroy_segment_manager_caches(); + f2fs_destroy_segment_manager_caches(); free_node_manager_caches: - destroy_node_manager_caches(); + f2fs_destroy_node_manager_caches(); free_inodecache: destroy_inodecache(); fail: @@ -3138,10 +3206,10 @@ static void __exit exit_f2fs_fs(void) unregister_filesystem(&f2fs_fs_type); unregister_shrinker(&f2fs_shrinker_info); f2fs_exit_sysfs(); - destroy_extent_cache(); - destroy_checkpoint_caches(); - destroy_segment_manager_caches(); - destroy_node_manager_caches(); + f2fs_destroy_extent_cache(); + f2fs_destroy_checkpoint_caches(); + f2fs_destroy_segment_manager_caches(); + f2fs_destroy_node_manager_caches(); destroy_inodecache(); f2fs_destroy_trace_ios(); } diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index 2c53de9251be..60c827eadd82 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -147,13 +147,13 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a, int len = 0, i; len += snprintf(buf + len, PAGE_SIZE - len, - "cold file extenstion:\n"); + "cold file extension:\n"); for (i = 0; i < cold_count; i++) len += snprintf(buf + len, PAGE_SIZE - len, "%s\n", extlist[i]); len += snprintf(buf + len, PAGE_SIZE - len, - "hot file extenstion:\n"); + "hot file extension:\n"); for (i = cold_count; i < cold_count + hot_count; i++) len += snprintf(buf + len, PAGE_SIZE - len, "%s\n", extlist[i]); @@ -165,7 +165,7 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a, return snprintf(buf, PAGE_SIZE, "%u\n", *ui); } -static ssize_t f2fs_sbi_store(struct f2fs_attr *a, +static ssize_t __sbi_store(struct f2fs_attr *a, struct f2fs_sb_info *sbi, const char *buf, size_t count) { @@ -201,13 +201,13 @@ static ssize_t f2fs_sbi_store(struct f2fs_attr *a, down_write(&sbi->sb_lock); - ret = update_extension_list(sbi, name, hot, set); + ret = f2fs_update_extension_list(sbi, name, hot, set); if (ret) goto out; ret = f2fs_commit_super(sbi, false); if (ret) - update_extension_list(sbi, name, hot, !set); + f2fs_update_extension_list(sbi, name, hot, !set); out: up_write(&sbi->sb_lock); return ret ? ret : count; @@ -248,19 +248,53 @@ out: if (!strcmp(a->attr.name, "trim_sections")) return -EINVAL; + if (!strcmp(a->attr.name, "gc_urgent")) { + if (t >= 1) { + sbi->gc_mode = GC_URGENT; + if (sbi->gc_thread) { + wake_up_interruptible_all( + &sbi->gc_thread->gc_wait_queue_head); + wake_up_discard_thread(sbi, true); + } + } else { + sbi->gc_mode = GC_NORMAL; + } + return count; + } + if (!strcmp(a->attr.name, "gc_idle")) { + if (t == GC_IDLE_CB) + sbi->gc_mode = GC_IDLE_CB; + else if (t == GC_IDLE_GREEDY) + sbi->gc_mode = GC_IDLE_GREEDY; + else + sbi->gc_mode = GC_NORMAL; + return count; + } + *ui = t; if (!strcmp(a->attr.name, "iostat_enable") && *ui == 0) f2fs_reset_iostat(sbi); - if (!strcmp(a->attr.name, "gc_urgent") && t == 1 && sbi->gc_thread) { - sbi->gc_thread->gc_wake = 1; - wake_up_interruptible_all(&sbi->gc_thread->gc_wait_queue_head); - wake_up_discard_thread(sbi, true); - } - return count; } +static ssize_t f2fs_sbi_store(struct f2fs_attr *a, + struct f2fs_sb_info *sbi, + const char *buf, size_t count) +{ + ssize_t ret; + bool gc_entry = (!strcmp(a->attr.name, "gc_urgent") || + a->struct_type == GC_THREAD); + + if (gc_entry) + down_read(&sbi->sb->s_umount); + ret = __sbi_store(a, sbi, buf, count); + if (gc_entry) + up_read(&sbi->sb->s_umount); + + return ret; +} + static ssize_t f2fs_attr_show(struct kobject *kobj, struct attribute *attr, char *buf) { @@ -349,8 +383,8 @@ F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_urgent_sleep_time, F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_min_sleep_time, min_sleep_time); F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_max_sleep_time, max_sleep_time); F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_no_gc_sleep_time, no_gc_sleep_time); -F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_idle, gc_idle); -F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_urgent, gc_urgent); +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_idle, gc_mode); +F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_urgent, gc_mode); F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, reclaim_segments, rec_prefree_segments); F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_small_discards, max_discards); F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, discard_granularity, discard_granularity); diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c index 116be979b897..61a5d9284bc0 100644 --- a/fs/f2fs/xattr.c +++ b/fs/f2fs/xattr.c @@ -299,7 +299,7 @@ static int read_inline_xattr(struct inode *inode, struct page *ipage, if (ipage) { inline_addr = inline_xattr_addr(inode, ipage); } else { - page = get_node_page(sbi, inode->i_ino); + page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(page)) return PTR_ERR(page); @@ -320,7 +320,7 @@ static int read_xattr_block(struct inode *inode, void *txattr_addr) void *xattr_addr; /* The inode already has an extended attribute block. */ - xpage = get_node_page(sbi, xnid); + xpage = f2fs_get_node_page(sbi, xnid); if (IS_ERR(xpage)) return PTR_ERR(xpage); @@ -444,7 +444,7 @@ static inline int write_all_xattrs(struct inode *inode, __u32 hsize, int err = 0; if (hsize > inline_size && !F2FS_I(inode)->i_xattr_nid) - if (!alloc_nid(sbi, &new_nid)) + if (!f2fs_alloc_nid(sbi, &new_nid)) return -ENOSPC; /* write to inline xattr */ @@ -452,9 +452,9 @@ static inline int write_all_xattrs(struct inode *inode, __u32 hsize, if (ipage) { inline_addr = inline_xattr_addr(inode, ipage); } else { - in_page = get_node_page(sbi, inode->i_ino); + in_page = f2fs_get_node_page(sbi, inode->i_ino); if (IS_ERR(in_page)) { - alloc_nid_failed(sbi, new_nid); + f2fs_alloc_nid_failed(sbi, new_nid); return PTR_ERR(in_page); } inline_addr = inline_xattr_addr(inode, in_page); @@ -464,8 +464,8 @@ static inline int write_all_xattrs(struct inode *inode, __u32 hsize, NODE, true); /* no need to use xattr node block */ if (hsize <= inline_size) { - err = truncate_xattr_node(inode); - alloc_nid_failed(sbi, new_nid); + err = f2fs_truncate_xattr_node(inode); + f2fs_alloc_nid_failed(sbi, new_nid); if (err) { f2fs_put_page(in_page, 1); return err; @@ -478,10 +478,10 @@ static inline int write_all_xattrs(struct inode *inode, __u32 hsize, /* write to xattr node block */ if (F2FS_I(inode)->i_xattr_nid) { - xpage = get_node_page(sbi, F2FS_I(inode)->i_xattr_nid); + xpage = f2fs_get_node_page(sbi, F2FS_I(inode)->i_xattr_nid); if (IS_ERR(xpage)) { err = PTR_ERR(xpage); - alloc_nid_failed(sbi, new_nid); + f2fs_alloc_nid_failed(sbi, new_nid); goto in_page_out; } f2fs_bug_on(sbi, new_nid); @@ -489,13 +489,13 @@ static inline int write_all_xattrs(struct inode *inode, __u32 hsize, } else { struct dnode_of_data dn; set_new_dnode(&dn, inode, NULL, NULL, new_nid); - xpage = new_node_page(&dn, XATTR_NODE_OFFSET); + xpage = f2fs_new_node_page(&dn, XATTR_NODE_OFFSET); if (IS_ERR(xpage)) { err = PTR_ERR(xpage); - alloc_nid_failed(sbi, new_nid); + f2fs_alloc_nid_failed(sbi, new_nid); goto in_page_out; } - alloc_nid_done(sbi, new_nid); + f2fs_alloc_nid_done(sbi, new_nid); } xattr_addr = page_address(xpage); @@ -733,7 +733,7 @@ int f2fs_setxattr(struct inode *inode, int index, const char *name, if (err) return err; - /* this case is only from init_inode_metadata */ + /* this case is only from f2fs_init_inode_metadata */ if (ipage) return __f2fs_setxattr(inode, index, name, value, size, ipage, flags); diff --git a/fs/fat/inode.c b/fs/fat/inode.c index a6c21fba6e9f..1ac142ad7d1b 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -613,13 +613,21 @@ static void fat_set_state(struct super_block *sb, brelse(bh); } +static void fat_reset_iocharset(struct fat_mount_options *opts) +{ + if (opts->iocharset != fat_default_iocharset) { + /* Note: opts->iocharset can be NULL here */ + kfree(opts->iocharset); + opts->iocharset = fat_default_iocharset; + } +} + static void delayed_free(struct rcu_head *p) { struct msdos_sb_info *sbi = container_of(p, struct msdos_sb_info, rcu); unload_nls(sbi->nls_disk); unload_nls(sbi->nls_io); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); kfree(sbi); } @@ -1035,7 +1043,7 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat, opts->fs_fmask = opts->fs_dmask = current_umask(); opts->allow_utime = -1; opts->codepage = fat_default_codepage; - opts->iocharset = fat_default_iocharset; + fat_reset_iocharset(opts); if (is_vfat) { opts->shortname = VFAT_SFN_DISPLAY_WINNT|VFAT_SFN_CREATE_WIN95; opts->rodir = 0; @@ -1185,8 +1193,7 @@ static int parse_options(struct super_block *sb, char *options, int is_vfat, /* vfat specific */ case Opt_charset: - if (opts->iocharset != fat_default_iocharset) - kfree(opts->iocharset); + fat_reset_iocharset(opts); iocharset = match_strdup(&args[0]); if (!iocharset) return -ENOMEM; @@ -1777,8 +1784,7 @@ out_fail: iput(fat_inode); unload_nls(sbi->nls_io); unload_nls(sbi->nls_disk); - if (sbi->options.iocharset != fat_default_iocharset) - kfree(sbi->options.iocharset); + fat_reset_iocharset(&sbi->options); sb->s_fs_info = NULL; kfree(sbi); return error; diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c index 1caee0534587..582ef53f2104 100644 --- a/fs/gfs2/aops.c +++ b/fs/gfs2/aops.c @@ -249,22 +249,6 @@ static int gfs2_write_jdata_pagevec(struct address_space *mapping, for(i = 0; i < nr_pages; i++) { struct page *page = pvec->pages[i]; - /* - * At this point, the page may be truncated or - * invalidated (changing page->mapping to NULL), or - * even swizzled back from swapper_space to tmpfs file - * mapping. However, page->index will not change - * because we have a reference on the page. - */ - if (page->index > end) { - /* - * can't be range_cyclic (1st pass) because - * end == -1 in that case. - */ - ret = 1; - break; - } - *done_index = page->index; lock_page(page); @@ -382,8 +366,8 @@ retry: tag_pages_for_writeback(mapping, index, end); done_index = index; while (!done && (index <= end)) { - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag); if (nr_pages == 0) break; diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 57e3262ec57a..ee0da259a3d3 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -1538,6 +1538,8 @@ nfsd4_decode_getdeviceinfo(struct nfsd4_compoundargs *argp, gdev->gd_maxcount = be32_to_cpup(p++); num = be32_to_cpup(p++); if (num) { + if (num > 1000) + goto xdr_error; READ_BUF(4 * num); gdev->gd_notify_types = be32_to_cpup(p++); for (i = 1; i < num; i++) { diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c index 3a3821b00486..9deca59be7e5 100644 --- a/fs/nilfs2/btree.c +++ b/fs/nilfs2/btree.c @@ -2147,8 +2147,8 @@ static void nilfs_btree_lookup_dirty_buffers(struct nilfs_bmap *btree, pagevec_init(&pvec, 0); - while (pagevec_lookup_tag(&pvec, btcache, &index, PAGECACHE_TAG_DIRTY, - PAGEVEC_SIZE)) { + while (pagevec_lookup_tag(&pvec, btcache, &index, + PAGECACHE_TAG_DIRTY)) { for (i = 0; i < pagevec_count(&pvec); i++) { bh = head = page_buffers(pvec.pages[i]); do { diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 45d650addd56..447999563737 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -262,8 +262,7 @@ int nilfs_copy_dirty_pages(struct address_space *dmap, pagevec_init(&pvec, 0); repeat: - if (!pagevec_lookup_tag(&pvec, smap, &index, PAGECACHE_TAG_DIRTY, - PAGEVEC_SIZE)) + if (!pagevec_lookup_tag(&pvec, smap, &index, PAGECACHE_TAG_DIRTY)) return 0; for (i = 0; i < pagevec_count(&pvec); i++) { @@ -382,8 +381,8 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent) pagevec_init(&pvec, 0); - while (pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY, - PAGEVEC_SIZE)) { + while (pagevec_lookup_tag(&pvec, mapping, &index, + PAGECACHE_TAG_DIRTY)) { for (i = 0; i < pagevec_count(&pvec); i++) { struct page *page = pvec.pages[i]; diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 34c22fe4eca0..092c0496aef4 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -705,18 +705,14 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode, pagevec_init(&pvec, 0); repeat: if (unlikely(index > last) || - !pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY, - min_t(pgoff_t, last - index, - PAGEVEC_SIZE - 1) + 1)) + !pagevec_lookup_range_tag(&pvec, mapping, &index, last, + PAGECACHE_TAG_DIRTY)) return ndirties; for (i = 0; i < pagevec_count(&pvec); i++) { struct buffer_head *bh, *head; struct page *page = pvec.pages[i]; - if (unlikely(page->index > last)) - break; - lock_page(page); if (!page_has_buffers(page)) create_empty_buffers(page, i_blocksize(inode), 0); @@ -753,8 +749,8 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode, pagevec_init(&pvec, 0); - while (pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY, - PAGEVEC_SIZE)) { + while (pagevec_lookup_tag(&pvec, mapping, &index, + PAGECACHE_TAG_DIRTY)) { for (i = 0; i < pagevec_count(&pvec); i++) { bh = head = page_buffers(pvec.pages[i]); do { diff --git a/fs/ocfs2/cluster/nodemanager.c b/fs/ocfs2/cluster/nodemanager.c index 72afdca3cea7..3c45a9301a09 100644 --- a/fs/ocfs2/cluster/nodemanager.c +++ b/fs/ocfs2/cluster/nodemanager.c @@ -40,6 +40,9 @@ char *o2nm_fence_method_desc[O2NM_FENCE_METHODS] = { "panic", /* O2NM_FENCE_PANIC */ }; +static inline void o2nm_lock_subsystem(void); +static inline void o2nm_unlock_subsystem(void); + struct o2nm_node *o2nm_get_node_by_num(u8 node_num) { struct o2nm_node *node = NULL; @@ -181,7 +184,10 @@ static struct o2nm_cluster *to_o2nm_cluster_from_node(struct o2nm_node *node) { /* through the first node_set .parent * mycluster/nodes/mynode == o2nm_cluster->o2nm_node_group->o2nm_node */ - return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent); + if (node->nd_item.ci_parent) + return to_o2nm_cluster(node->nd_item.ci_parent->ci_parent); + else + return NULL; } enum { @@ -194,7 +200,7 @@ static ssize_t o2nm_node_num_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; int ret = 0; @@ -214,6 +220,13 @@ static ssize_t o2nm_node_num_store(struct config_item *item, const char *page, !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */ + o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + o2nm_unlock_subsystem(); + return -EINVAL; + } + write_lock(&cluster->cl_nodes_lock); if (cluster->cl_nodes[tmp]) ret = -EEXIST; @@ -226,6 +239,8 @@ static ssize_t o2nm_node_num_store(struct config_item *item, const char *page, set_bit(tmp, cluster->cl_nodes_bitmap); } write_unlock(&cluster->cl_nodes_lock); + o2nm_unlock_subsystem(); + if (ret) return ret; @@ -269,7 +284,7 @@ static ssize_t o2nm_node_ipv4_address_store(struct config_item *item, size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; int ret, i; struct rb_node **p, *parent; unsigned int octets[4]; @@ -286,6 +301,13 @@ static ssize_t o2nm_node_ipv4_address_store(struct config_item *item, be32_add_cpu(&ipv4_addr, octets[i] << (i * 8)); } + o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + o2nm_unlock_subsystem(); + return -EINVAL; + } + ret = 0; write_lock(&cluster->cl_nodes_lock); if (o2nm_node_ip_tree_lookup(cluster, ipv4_addr, &p, &parent)) @@ -298,6 +320,8 @@ static ssize_t o2nm_node_ipv4_address_store(struct config_item *item, rb_insert_color(&node->nd_ip_node, &cluster->cl_node_ip_tree); } write_unlock(&cluster->cl_nodes_lock); + o2nm_unlock_subsystem(); + if (ret) return ret; @@ -315,7 +339,7 @@ static ssize_t o2nm_node_local_store(struct config_item *item, const char *page, size_t count) { struct o2nm_node *node = to_o2nm_node(item); - struct o2nm_cluster *cluster = to_o2nm_cluster_from_node(node); + struct o2nm_cluster *cluster; unsigned long tmp; char *p = (char *)page; ssize_t ret; @@ -333,17 +357,26 @@ static ssize_t o2nm_node_local_store(struct config_item *item, const char *page, !test_bit(O2NM_NODE_ATTR_PORT, &node->nd_set_attributes)) return -EINVAL; /* XXX */ + o2nm_lock_subsystem(); + cluster = to_o2nm_cluster_from_node(node); + if (!cluster) { + ret = -EINVAL; + goto out; + } + /* the only failure case is trying to set a new local node * when a different one is already set */ if (tmp && tmp == cluster->cl_has_local && - cluster->cl_local_node != node->nd_num) - return -EBUSY; + cluster->cl_local_node != node->nd_num) { + ret = -EBUSY; + goto out; + } /* bring up the rx thread if we're setting the new local node. */ if (tmp && !cluster->cl_has_local) { ret = o2net_start_listening(node); if (ret) - return ret; + goto out; } if (!tmp && cluster->cl_has_local && @@ -358,7 +391,11 @@ static ssize_t o2nm_node_local_store(struct config_item *item, const char *page, cluster->cl_local_node = node->nd_num; } - return count; + ret = count; + +out: + o2nm_unlock_subsystem(); + return ret; } CONFIGFS_ATTR(o2nm_node_, num); @@ -750,6 +787,16 @@ static struct o2nm_cluster_group o2nm_cluster_group = { }, }; +static inline void o2nm_lock_subsystem(void) +{ + mutex_lock(&o2nm_cluster_group.cs_subsys.su_mutex); +} + +static inline void o2nm_unlock_subsystem(void) +{ + mutex_unlock(&o2nm_cluster_group.cs_subsys.su_mutex); +} + int o2nm_depend_item(struct config_item *item) { return configfs_depend_item(&o2nm_cluster_group.cs_subsys, item); diff --git a/fs/proc/array.c b/fs/proc/array.c index d5c6f5b38617..161441f52ebf 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -79,6 +79,7 @@ #include <linux/delayacct.h> #include <linux/seq_file.h> #include <linux/pid_namespace.h> +#include <linux/prctl.h> #include <linux/ptrace.h> #include <linux/tracehook.h> #include <linux/string_helpers.h> @@ -332,6 +333,31 @@ static inline void task_seccomp(struct seq_file *m, struct task_struct *p) #ifdef CONFIG_SECCOMP seq_printf(m, "Seccomp:\t%d\n", p->seccomp.mode); #endif + seq_printf(m, "\nSpeculation_Store_Bypass:\t"); + switch (arch_prctl_spec_ctrl_get(p, PR_SPEC_STORE_BYPASS)) { + case -EINVAL: + seq_printf(m, "unknown"); + break; + case PR_SPEC_NOT_AFFECTED: + seq_printf(m, "not vulnerable"); + break; + case PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE: + seq_printf(m, "thread force mitigated"); + break; + case PR_SPEC_PRCTL | PR_SPEC_DISABLE: + seq_printf(m, "thread mitigated"); + break; + case PR_SPEC_PRCTL | PR_SPEC_ENABLE: + seq_printf(m, "thread vulnerable"); + break; + case PR_SPEC_DISABLE: + seq_printf(m, "globally mitigated"); + break; + default: + seq_printf(m, "vulnerable"); + break; + } + seq_putc(m, '\n'); } static inline void task_context_switch_counts(struct seq_file *m, diff --git a/fs/sdcardfs/main.c b/fs/sdcardfs/main.c index 30e0c431a1ea..27ec726e7a46 100644 --- a/fs/sdcardfs/main.c +++ b/fs/sdcardfs/main.c @@ -295,6 +295,13 @@ static int sdcardfs_read_super(struct vfsmount *mnt, struct super_block *sb, atomic_inc(&lower_sb->s_active); sdcardfs_set_lower_super(sb, lower_sb); + sb->s_stack_depth = lower_sb->s_stack_depth + 1; + if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { + pr_err("sdcardfs: maximum fs stacking depth exceeded\n"); + err = -EINVAL; + goto out_sput; + } + /* inherit maxbytes from lower file system */ sb->s_maxbytes = lower_sb->s_maxbytes; diff --git a/fs/squashfs/cache.c b/fs/squashfs/cache.c index 6785d086ab38..7d78c3d28bbd 100644 --- a/fs/squashfs/cache.c +++ b/fs/squashfs/cache.c @@ -340,6 +340,9 @@ int squashfs_read_metadata(struct super_block *sb, void *buffer, TRACE("Entered squashfs_read_metadata [%llx:%x]\n", *block, *offset); + if (unlikely(length < 0)) + return -EIO; + while (length) { entry = squashfs_cache_get(sb, msblk->block_cache, *block, 0); if (entry->error) { diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c index 6f5ef8d7e55a..43d946617d2d 100644 --- a/fs/squashfs/file.c +++ b/fs/squashfs/file.c @@ -198,7 +198,11 @@ static long long read_indexes(struct super_block *sb, int n, } for (i = 0; i < blocks; i++) { - int size = le32_to_cpu(blist[i]); + int size = squashfs_block_size(blist[i]); + if (size < 0) { + err = size; + goto failure; + } block += SQUASHFS_COMPRESSED_SIZE_BLOCK(size); } n -= blocks; @@ -371,7 +375,7 @@ static int read_blocklist(struct inode *inode, int index, u64 *block) sizeof(size)); if (res < 0) return res; - return le32_to_cpu(size); + return squashfs_block_size(size); } /* Copy data into page cache */ diff --git a/fs/squashfs/fragment.c b/fs/squashfs/fragment.c index 0ed6edbc5c71..0681feab4a84 100644 --- a/fs/squashfs/fragment.c +++ b/fs/squashfs/fragment.c @@ -49,11 +49,16 @@ int squashfs_frag_lookup(struct super_block *sb, unsigned int fragment, u64 *fragment_block) { struct squashfs_sb_info *msblk = sb->s_fs_info; - int block = SQUASHFS_FRAGMENT_INDEX(fragment); - int offset = SQUASHFS_FRAGMENT_INDEX_OFFSET(fragment); - u64 start_block = le64_to_cpu(msblk->fragment_index[block]); + int block, offset, size; struct squashfs_fragment_entry fragment_entry; - int size; + u64 start_block; + + if (fragment >= msblk->fragments) + return -EIO; + block = SQUASHFS_FRAGMENT_INDEX(fragment); + offset = SQUASHFS_FRAGMENT_INDEX_OFFSET(fragment); + + start_block = le64_to_cpu(msblk->fragment_index[block]); size = squashfs_read_metadata(sb, &fragment_entry, &start_block, &offset, sizeof(fragment_entry)); @@ -61,9 +66,7 @@ int squashfs_frag_lookup(struct super_block *sb, unsigned int fragment, return size; *fragment_block = le64_to_cpu(fragment_entry.start_block); - size = le32_to_cpu(fragment_entry.size); - - return size; + return squashfs_block_size(fragment_entry.size); } diff --git a/fs/squashfs/squashfs_fs.h b/fs/squashfs/squashfs_fs.h index 506f4ba5b983..e66486366f02 100644 --- a/fs/squashfs/squashfs_fs.h +++ b/fs/squashfs/squashfs_fs.h @@ -129,6 +129,12 @@ #define SQUASHFS_COMPRESSED_BLOCK(B) (!((B) & SQUASHFS_COMPRESSED_BIT_BLOCK)) +static inline int squashfs_block_size(__le32 raw) +{ + u32 size = le32_to_cpu(raw); + return (size >> 25) ? -EIO : size; +} + /* * Inode number ops. Inodes consist of a compressed block number, and an * uncompressed offset within that block diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h index 8a6995de0277..3b767ce1e46d 100644 --- a/fs/squashfs/squashfs_fs_sb.h +++ b/fs/squashfs/squashfs_fs_sb.h @@ -75,6 +75,7 @@ struct squashfs_sb_info { unsigned short block_log; long long bytes_used; unsigned int inodes; + unsigned int fragments; int xattr_ids; }; #endif diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c index 61cd0b39ed0e..4ffbddb688aa 100644 --- a/fs/squashfs/super.c +++ b/fs/squashfs/super.c @@ -176,6 +176,7 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent) msblk->inode_table = le64_to_cpu(sblk->inode_table_start); msblk->directory_table = le64_to_cpu(sblk->directory_table_start); msblk->inodes = le32_to_cpu(sblk->inodes); + msblk->fragments = le32_to_cpu(sblk->fragments); flags = le16_to_cpu(sblk->flags); TRACE("Found valid superblock on %s\n", bdevname(sb->s_bdev, b)); @@ -186,7 +187,7 @@ static int squashfs_fill_super(struct super_block *sb, void *data, int silent) TRACE("Filesystem size %lld bytes\n", msblk->bytes_used); TRACE("Block size %d\n", msblk->block_size); TRACE("Number of inodes %d\n", msblk->inodes); - TRACE("Number of fragments %d\n", le32_to_cpu(sblk->fragments)); + TRACE("Number of fragments %d\n", msblk->fragments); TRACE("Number of ids %d\n", le16_to_cpu(sblk->no_ids)); TRACE("sblk->inode_table_start %llx\n", msblk->inode_table); TRACE("sblk->directory_table_start %llx\n", msblk->directory_table); @@ -273,7 +274,7 @@ allocate_id_index_table: sb->s_export_op = &squashfs_export_ops; handle_fragments: - fragments = le32_to_cpu(sblk->fragments); + fragments = msblk->fragments; if (fragments == 0) goto check_directory_table; diff --git a/include/crypto/internal/skcipher.h b/include/crypto/internal/skcipher.h index 2cf7a61ece59..ce6619c339fe 100644 --- a/include/crypto/internal/skcipher.h +++ b/include/crypto/internal/skcipher.h @@ -19,12 +19,46 @@ struct rtattr; +struct skcipher_instance { + void (*free)(struct skcipher_instance *inst); + union { + struct { + char head[offsetof(struct skcipher_alg, base)]; + struct crypto_instance base; + } s; + struct skcipher_alg alg; + }; +}; + struct crypto_skcipher_spawn { struct crypto_spawn base; }; extern const struct crypto_type crypto_givcipher_type; +static inline struct crypto_instance *skcipher_crypto_instance( + struct skcipher_instance *inst) +{ + return &inst->s.base; +} + +static inline struct skcipher_instance *skcipher_alg_instance( + struct crypto_skcipher *skcipher) +{ + return container_of(crypto_skcipher_alg(skcipher), + struct skcipher_instance, alg); +} + +static inline void *skcipher_instance_ctx(struct skcipher_instance *inst) +{ + return crypto_instance_ctx(skcipher_crypto_instance(inst)); +} + +static inline void skcipher_request_complete(struct skcipher_request *req, int err) +{ + req->base.complete(&req->base, err); +} + static inline void crypto_set_skcipher_spawn( struct crypto_skcipher_spawn *spawn, struct crypto_instance *inst) { @@ -33,6 +67,8 @@ static inline void crypto_set_skcipher_spawn( int crypto_grab_skcipher(struct crypto_skcipher_spawn *spawn, const char *name, u32 type, u32 mask); +int crypto_grab_skcipher2(struct crypto_skcipher_spawn *spawn, + const char *name, u32 type, u32 mask); struct crypto_alg *crypto_lookup_skcipher(const char *name, u32 type, u32 mask); @@ -47,6 +83,12 @@ static inline struct crypto_alg *crypto_skcipher_spawn_alg( return spawn->base.alg; } +static inline struct skcipher_alg *crypto_spawn_skcipher_alg( + struct crypto_skcipher_spawn *spawn) +{ + return container_of(spawn->base.alg, struct skcipher_alg, base); +} + static inline struct crypto_ablkcipher *crypto_spawn_skcipher( struct crypto_skcipher_spawn *spawn) { @@ -55,6 +97,25 @@ static inline struct crypto_ablkcipher *crypto_spawn_skcipher( crypto_skcipher_mask(0))); } +static inline struct crypto_skcipher *crypto_spawn_skcipher2( + struct crypto_skcipher_spawn *spawn) +{ + return crypto_spawn_tfm2(&spawn->base); +} + +static inline void crypto_skcipher_set_reqsize( + struct crypto_skcipher *skcipher, unsigned int reqsize) +{ + skcipher->reqsize = reqsize; +} + +int crypto_register_skcipher(struct skcipher_alg *alg); +void crypto_unregister_skcipher(struct skcipher_alg *alg); +int crypto_register_skciphers(struct skcipher_alg *algs, int count); +void crypto_unregister_skciphers(struct skcipher_alg *algs, int count); +int skcipher_register_instance(struct crypto_template *tmpl, + struct skcipher_instance *inst); + int skcipher_null_givencrypt(struct skcipher_givcrypt_request *req); int skcipher_null_givdecrypt(struct skcipher_givcrypt_request *req); const char *crypto_default_geniv(const struct crypto_alg *alg); @@ -122,5 +183,31 @@ static inline u32 skcipher_request_flags(struct skcipher_request *req) return req->base.flags; } +static inline unsigned int crypto_skcipher_alg_min_keysize( + struct skcipher_alg *alg) +{ + if ((alg->base.cra_flags & CRYPTO_ALG_TYPE_MASK) == + CRYPTO_ALG_TYPE_BLKCIPHER) + return alg->base.cra_blkcipher.min_keysize; + + if (alg->base.cra_ablkcipher.encrypt) + return alg->base.cra_ablkcipher.min_keysize; + + return alg->min_keysize; +} + +static inline unsigned int crypto_skcipher_alg_max_keysize( + struct skcipher_alg *alg) +{ + if ((alg->base.cra_flags & CRYPTO_ALG_TYPE_MASK) == + CRYPTO_ALG_TYPE_BLKCIPHER) + return alg->base.cra_blkcipher.max_keysize; + + if (alg->base.cra_ablkcipher.encrypt) + return alg->base.cra_ablkcipher.max_keysize; + + return alg->max_keysize; +} + #endif /* _CRYPTO_INTERNAL_SKCIPHER_H */ diff --git a/include/crypto/skcipher.h b/include/crypto/skcipher.h index fd8742a40ff3..5c90d3edf975 100644 --- a/include/crypto/skcipher.h +++ b/include/crypto/skcipher.h @@ -60,12 +60,80 @@ struct crypto_skcipher { unsigned int ivsize; unsigned int reqsize; - - bool has_setkey; + unsigned int keysize; struct crypto_tfm base; }; +/** + * struct skcipher_alg - symmetric key cipher definition + * @min_keysize: Minimum key size supported by the transformation. This is the + * smallest key length supported by this transformation algorithm. + * This must be set to one of the pre-defined values as this is + * not hardware specific. Possible values for this field can be + * found via git grep "_MIN_KEY_SIZE" include/crypto/ + * @max_keysize: Maximum key size supported by the transformation. This is the + * largest key length supported by this transformation algorithm. + * This must be set to one of the pre-defined values as this is + * not hardware specific. Possible values for this field can be + * found via git grep "_MAX_KEY_SIZE" include/crypto/ + * @setkey: Set key for the transformation. This function is used to either + * program a supplied key into the hardware or store the key in the + * transformation context for programming it later. Note that this + * function does modify the transformation context. This function can + * be called multiple times during the existence of the transformation + * object, so one must make sure the key is properly reprogrammed into + * the hardware. This function is also responsible for checking the key + * length for validity. In case a software fallback was put in place in + * the @cra_init call, this function might need to use the fallback if + * the algorithm doesn't support all of the key sizes. + * @encrypt: Encrypt a scatterlist of blocks. This function is used to encrypt + * the supplied scatterlist containing the blocks of data. The crypto + * API consumer is responsible for aligning the entries of the + * scatterlist properly and making sure the chunks are correctly + * sized. In case a software fallback was put in place in the + * @cra_init call, this function might need to use the fallback if + * the algorithm doesn't support all of the key sizes. In case the + * key was stored in transformation context, the key might need to be + * re-programmed into the hardware in this function. This function + * shall not modify the transformation context, as this function may + * be called in parallel with the same transformation object. + * @decrypt: Decrypt a single block. This is a reverse counterpart to @encrypt + * and the conditions are exactly the same. + * @init: Initialize the cryptographic transformation object. This function + * is used to initialize the cryptographic transformation object. + * This function is called only once at the instantiation time, right + * after the transformation context was allocated. In case the + * cryptographic hardware has some special requirements which need to + * be handled by software, this function shall check for the precise + * requirement of the transformation and put any software fallbacks + * in place. + * @exit: Deinitialize the cryptographic transformation object. This is a + * counterpart to @init, used to remove various changes set in + * @init. + * @ivsize: IV size applicable for transformation. The consumer must provide an + * IV of exactly that size to perform the encrypt or decrypt operation. + * @chunksize: Equal to the block size except for stream ciphers such as + * CTR where it is set to the underlying block size. + * + * All fields except @ivsize are mandatory and must be filled. + */ +struct skcipher_alg { + int (*setkey)(struct crypto_skcipher *tfm, const u8 *key, + unsigned int keylen); + int (*encrypt)(struct skcipher_request *req); + int (*decrypt)(struct skcipher_request *req); + int (*init)(struct crypto_skcipher *tfm); + void (*exit)(struct crypto_skcipher *tfm); + + unsigned int min_keysize; + unsigned int max_keysize; + unsigned int ivsize; + unsigned int chunksize; + + struct crypto_alg base; +}; + #define SKCIPHER_REQUEST_ON_STACK(name, tfm) \ char __##name##_desc[sizeof(struct skcipher_request) + \ crypto_skcipher_reqsize(tfm)] CRYPTO_MINALIGN_ATTR; \ @@ -233,6 +301,43 @@ static inline int crypto_has_skcipher(const char *alg_name, u32 type, } /** + * crypto_has_skcipher2() - Search for the availability of an skcipher. + * @alg_name: is the cra_name / name or cra_driver_name / driver name of the + * skcipher + * @type: specifies the type of the skcipher + * @mask: specifies the mask for the skcipher + * + * Return: true when the skcipher is known to the kernel crypto API; false + * otherwise + */ +int crypto_has_skcipher2(const char *alg_name, u32 type, u32 mask); + +static inline const char *crypto_skcipher_driver_name( + struct crypto_skcipher *tfm) +{ + return crypto_tfm_alg_name(crypto_skcipher_tfm(tfm)); +} + +static inline struct skcipher_alg *crypto_skcipher_alg( + struct crypto_skcipher *tfm) +{ + return container_of(crypto_skcipher_tfm(tfm)->__crt_alg, + struct skcipher_alg, base); +} + +static inline unsigned int crypto_skcipher_alg_ivsize(struct skcipher_alg *alg) +{ + if ((alg->base.cra_flags & CRYPTO_ALG_TYPE_MASK) == + CRYPTO_ALG_TYPE_BLKCIPHER) + return alg->base.cra_blkcipher.ivsize; + + if (alg->base.cra_ablkcipher.encrypt) + return alg->base.cra_ablkcipher.ivsize; + + return alg->ivsize; +} + +/** * crypto_skcipher_ivsize() - obtain IV size * @tfm: cipher handle * @@ -246,6 +351,36 @@ static inline unsigned int crypto_skcipher_ivsize(struct crypto_skcipher *tfm) return tfm->ivsize; } +static inline unsigned int crypto_skcipher_alg_chunksize( + struct skcipher_alg *alg) +{ + if ((alg->base.cra_flags & CRYPTO_ALG_TYPE_MASK) == + CRYPTO_ALG_TYPE_BLKCIPHER) + return alg->base.cra_blocksize; + + if (alg->base.cra_ablkcipher.encrypt) + return alg->base.cra_blocksize; + + return alg->chunksize; +} + +/** + * crypto_skcipher_chunksize() - obtain chunk size + * @tfm: cipher handle + * + * The block size is set to one for ciphers such as CTR. However, + * you still need to provide incremental updates in multiples of + * the underlying block size as the IV does not have sub-block + * granularity. This is known in this API as the chunk size. + * + * Return: chunk size in bytes + */ +static inline unsigned int crypto_skcipher_chunksize( + struct crypto_skcipher *tfm) +{ + return crypto_skcipher_alg_chunksize(crypto_skcipher_alg(tfm)); +} + /** * crypto_skcipher_blocksize() - obtain block size of cipher * @tfm: cipher handle @@ -309,7 +444,13 @@ static inline int crypto_skcipher_setkey(struct crypto_skcipher *tfm, static inline bool crypto_skcipher_has_setkey(struct crypto_skcipher *tfm) { - return tfm->has_setkey; + return tfm->keysize; +} + +static inline unsigned int crypto_skcipher_default_keysize( + struct crypto_skcipher *tfm) +{ + return tfm->keysize; } /** diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h index bb9d0deca07c..0fb4975fae91 100644 --- a/include/drm/drm_dp_helper.h +++ b/include/drm/drm_dp_helper.h @@ -342,6 +342,7 @@ # define DP_PSR_FRAME_CAPTURE (1 << 3) # define DP_PSR_SELECTIVE_UPDATE (1 << 4) # define DP_PSR_IRQ_HPD_WITH_CRC_ERRORS (1 << 5) +# define DP_PSR_ENABLE_PSR2 (1 << 6) /* eDP 1.4a */ #define DP_ADAPTER_CTRL 0x1a0 # define DP_ADAPTER_CTRL_FORCE_LOAD_SENSE (1 << 0) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 557dae96ce74..143d40e8a1ea 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -65,25 +65,40 @@ #endif /* + * Feature detection for gnu_inline (gnu89 extern inline semantics). Either + * __GNUC_STDC_INLINE__ is defined (not using gnu89 extern inline semantics, + * and we opt in to the gnu89 semantics), or __GNUC_STDC_INLINE__ is not + * defined so the gnu89 semantics are the default. + */ +#ifdef __GNUC_STDC_INLINE__ +# define __gnu_inline __attribute__((gnu_inline)) +#else +# define __gnu_inline +#endif + +/* * Force always-inline if the user requests it so via the .config, * or if gcc is too old. * GCC does not warn about unused static inline functions for * -Wunused-function. This turns out to avoid the need for complex #ifdef * directives. Suppress the warning in clang as well by using "unused" * function attribute, which is redundant but not harmful for gcc. + * Prefer gnu_inline, so that extern inline functions do not emit an + * externally visible function. This makes extern inline behave as per gnu89 + * semantics rather than c99. This prevents multiple symbol definition errors + * of extern inline functions at link time. + * A lot of inline functions can cause havoc with function tracing. */ #if !defined(CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING) || \ !defined(CONFIG_OPTIMIZE_INLINING) || (__GNUC__ < 4) -#define inline inline __attribute__((always_inline,unused)) notrace -#define __inline__ __inline__ __attribute__((always_inline,unused)) notrace -#define __inline __inline __attribute__((always_inline,unused)) notrace +#define inline \ + inline __attribute__((always_inline, unused)) notrace __gnu_inline #else -/* A lot of inline functions can cause havoc with function tracing */ -#define inline inline __attribute__((unused)) notrace -#define __inline__ __inline__ __attribute__((unused)) notrace -#define __inline __inline __attribute__((unused)) notrace +#define inline inline __attribute__((unused)) notrace __gnu_inline #endif +#define __inline__ inline +#define __inline inline #define __always_inline inline __attribute__((always_inline)) #define noinline __attribute__((noinline)) diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 6d4c3d55d29c..76efa14a6959 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -59,6 +59,8 @@ extern ssize_t cpu_show_spectre_v1(struct device *dev, struct device_attribute *attr, char *buf); extern ssize_t cpu_show_spectre_v2(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_spec_store_bypass(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, diff --git a/include/linux/cpufreq_times.h b/include/linux/cpufreq_times.h index 3fb38750c853..356a3fad03c9 100644 --- a/include/linux/cpufreq_times.h +++ b/include/linux/cpufreq_times.h @@ -22,6 +22,7 @@ #ifdef CONFIG_CPU_FREQ_TIMES void cpufreq_task_times_init(struct task_struct *p); +void cpufreq_task_times_alloc(struct task_struct *p); void cpufreq_task_times_exit(struct task_struct *p); int proc_time_in_state_show(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *p); @@ -31,6 +32,11 @@ void cpufreq_times_record_transition(struct cpufreq_freqs *freq); void cpufreq_task_times_remove_uids(uid_t uid_start, uid_t uid_end); int single_uid_time_in_state_open(struct inode *inode, struct file *file); #else +static inline void cpufreq_task_times_init(struct task_struct *p) {} +static inline void cpufreq_task_times_alloc(struct task_struct *p) {} +static inline void cpufreq_task_times_exit(struct task_struct *p) {} +static inline void cpufreq_acct_update_power(struct task_struct *p, + u64 cputime) {} static inline void cpufreq_times_create_policy(struct cpufreq_policy *policy) {} static inline void cpufreq_times_record_transition( struct cpufreq_freqs *freq) {} diff --git a/include/linux/crypto.h b/include/linux/crypto.h index b7c1e1a7ebac..d7c8b37b2e95 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -48,6 +48,7 @@ #define CRYPTO_ALG_TYPE_AEAD 0x00000003 #define CRYPTO_ALG_TYPE_BLKCIPHER 0x00000004 #define CRYPTO_ALG_TYPE_ABLKCIPHER 0x00000005 +#define CRYPTO_ALG_TYPE_SKCIPHER 0x00000005 #define CRYPTO_ALG_TYPE_GIVCIPHER 0x00000006 #define CRYPTO_ALG_TYPE_DIGEST 0x00000008 #define CRYPTO_ALG_TYPE_HASH 0x00000008 diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index f3422440c45f..eecc7240ddfc 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -17,6 +17,7 @@ #define __DMA_IOMMU_H #ifdef __KERNEL__ +#include <linux/types.h> #include <asm/errno.h> #ifdef CONFIG_IOMMU_DMA diff --git a/include/linux/fscrypt_notsupp.h b/include/linux/fscrypt_notsupp.h index 44bd4fbd3ec5..e2729c6d9829 100644 --- a/include/linux/fscrypt_notsupp.h +++ b/include/linux/fscrypt_notsupp.h @@ -67,16 +67,6 @@ static inline void fscrypt_restore_control_page(struct page *page) return; } -static inline void fscrypt_set_d_op(struct dentry *dentry) -{ - return; -} - -static inline void fscrypt_set_encrypted_dentry(struct dentry *dentry) -{ - return; -} - /* policy.c */ static inline int fscrypt_ioctl_set_policy(struct file *filp, const void __user *arg) diff --git a/include/linux/fscrypt_supp.h b/include/linux/fscrypt_supp.h index 9d1857302b73..46b62d82b6d6 100644 --- a/include/linux/fscrypt_supp.h +++ b/include/linux/fscrypt_supp.h @@ -28,7 +28,7 @@ struct fscrypt_operations { int (*set_context)(struct inode *, const void *, size_t, void *); bool (*dummy_context)(struct inode *); bool (*empty_dir)(struct inode *); - unsigned (*max_namelen)(struct inode *); + unsigned int max_namelen; }; struct fscrypt_ctx { @@ -74,20 +74,6 @@ static inline struct page *fscrypt_control_page(struct page *page) extern void fscrypt_restore_control_page(struct page *); -extern const struct dentry_operations fscrypt_d_ops; - -static inline void fscrypt_set_d_op(struct dentry *dentry) -{ - d_set_d_op(dentry, &fscrypt_d_ops); -} - -static inline void fscrypt_set_encrypted_dentry(struct dentry *dentry) -{ - spin_lock(&dentry->d_lock); - dentry->d_flags |= DCACHE_ENCRYPTED_WITH_KEY; - spin_unlock(&dentry->d_lock); -} - /* policy.c */ extern int fscrypt_ioctl_set_policy(struct file *, const void __user *); extern int fscrypt_ioctl_get_policy(struct file *, void __user *); diff --git a/include/linux/fscrypto.h b/include/linux/fscrypto.h new file mode 100644 index 000000000000..e6e53a36104b --- /dev/null +++ b/include/linux/fscrypto.h @@ -0,0 +1,411 @@ +/* + * General per-file encryption definition + * + * Copyright (C) 2015, Google, Inc. + * + * Written by Michael Halcrow, 2015. + * Modified by Jaegeuk Kim, 2015. + */ + +#ifndef _LINUX_FSCRYPTO_H +#define _LINUX_FSCRYPTO_H + +#include <linux/key.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/bio.h> +#include <linux/dcache.h> +#include <crypto/skcipher.h> +#include <uapi/linux/fs.h> + +#define FS_KEY_DERIVATION_NONCE_SIZE 16 +#define FS_ENCRYPTION_CONTEXT_FORMAT_V1 1 + +#define FS_POLICY_FLAGS_PAD_4 0x00 +#define FS_POLICY_FLAGS_PAD_8 0x01 +#define FS_POLICY_FLAGS_PAD_16 0x02 +#define FS_POLICY_FLAGS_PAD_32 0x03 +#define FS_POLICY_FLAGS_PAD_MASK 0x03 +#define FS_POLICY_FLAGS_VALID 0x03 + +/* Encryption algorithms */ +#define FS_ENCRYPTION_MODE_INVALID 0 +#define FS_ENCRYPTION_MODE_AES_256_XTS 1 +#define FS_ENCRYPTION_MODE_AES_256_GCM 2 +#define FS_ENCRYPTION_MODE_AES_256_CBC 3 +#define FS_ENCRYPTION_MODE_AES_256_CTS 4 + +/** + * Encryption context for inode + * + * Protector format: + * 1 byte: Protector format (1 = this version) + * 1 byte: File contents encryption mode + * 1 byte: File names encryption mode + * 1 byte: Flags + * 8 bytes: Master Key descriptor + * 16 bytes: Encryption Key derivation nonce + */ +struct fscrypt_context { + u8 format; + u8 contents_encryption_mode; + u8 filenames_encryption_mode; + u8 flags; + u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE]; + u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE]; +} __packed; + +/* Encryption parameters */ +#define FS_XTS_TWEAK_SIZE 16 +#define FS_AES_128_ECB_KEY_SIZE 16 +#define FS_AES_256_GCM_KEY_SIZE 32 +#define FS_AES_256_CBC_KEY_SIZE 32 +#define FS_AES_256_CTS_KEY_SIZE 32 +#define FS_AES_256_XTS_KEY_SIZE 64 +#define FS_MAX_KEY_SIZE 64 + +#define FS_KEY_DESC_PREFIX "fscrypt:" +#define FS_KEY_DESC_PREFIX_SIZE 8 + +/* This is passed in from userspace into the kernel keyring */ +struct fscrypt_key { + u32 mode; + u8 raw[FS_MAX_KEY_SIZE]; + u32 size; +} __packed; + +struct fscrypt_info { + u8 ci_data_mode; + u8 ci_filename_mode; + u8 ci_flags; + struct crypto_skcipher *ci_ctfm; + struct key *ci_keyring_key; + u8 ci_master_key[FS_KEY_DESCRIPTOR_SIZE]; +}; + +#define FS_CTX_REQUIRES_FREE_ENCRYPT_FL 0x00000001 +#define FS_WRITE_PATH_FL 0x00000002 + +struct fscrypt_ctx { + union { + struct { + struct page *bounce_page; /* Ciphertext page */ + struct page *control_page; /* Original page */ + } w; + struct { + struct bio *bio; + struct work_struct work; + } r; + struct list_head free_list; /* Free list */ + }; + u8 flags; /* Flags */ + u8 mode; /* Encryption mode for tfm */ +}; + +struct fscrypt_completion_result { + struct completion completion; + int res; +}; + +#define DECLARE_FS_COMPLETION_RESULT(ecr) \ + struct fscrypt_completion_result ecr = { \ + COMPLETION_INITIALIZER((ecr).completion), 0 } + +#define FS_FNAME_NUM_SCATTER_ENTRIES 4 +#define FS_CRYPTO_BLOCK_SIZE 16 +#define FS_FNAME_CRYPTO_DIGEST_SIZE 32 + +/** + * For encrypted symlinks, the ciphertext length is stored at the beginning + * of the string in little-endian format. + */ +struct fscrypt_symlink_data { + __le16 len; + char encrypted_path[1]; +} __packed; + +/** + * This function is used to calculate the disk space required to + * store a filename of length l in encrypted symlink format. + */ +static inline u32 fscrypt_symlink_data_len(u32 l) +{ + if (l < FS_CRYPTO_BLOCK_SIZE) + l = FS_CRYPTO_BLOCK_SIZE; + return (l + sizeof(struct fscrypt_symlink_data) - 1); +} + +struct fscrypt_str { + unsigned char *name; + u32 len; +}; + +struct fscrypt_name { + const struct qstr *usr_fname; + struct fscrypt_str disk_name; + u32 hash; + u32 minor_hash; + struct fscrypt_str crypto_buf; +}; + +#define FSTR_INIT(n, l) { .name = n, .len = l } +#define FSTR_TO_QSTR(f) QSTR_INIT((f)->name, (f)->len) +#define fname_name(p) ((p)->disk_name.name) +#define fname_len(p) ((p)->disk_name.len) + +/* + * crypto opertions for filesystems + */ +struct fscrypt_operations { + int (*get_context)(struct inode *, void *, size_t); + int (*key_prefix)(struct inode *, u8 **); + int (*prepare_context)(struct inode *); + int (*set_context)(struct inode *, const void *, size_t, void *); + int (*dummy_context)(struct inode *); + bool (*is_encrypted)(struct inode *); + bool (*empty_dir)(struct inode *); + unsigned (*max_namelen)(struct inode *); +}; + +static inline bool fscrypt_dummy_context_enabled(struct inode *inode) +{ + if (inode->i_sb->s_cop->dummy_context && + inode->i_sb->s_cop->dummy_context(inode)) + return true; + return false; +} + +static inline bool fscrypt_valid_contents_enc_mode(u32 mode) +{ + return (mode == FS_ENCRYPTION_MODE_AES_256_XTS); +} + +static inline bool fscrypt_valid_filenames_enc_mode(u32 mode) +{ + return (mode == FS_ENCRYPTION_MODE_AES_256_CTS); +} + +static inline bool fscrypt_is_dot_dotdot(const struct qstr *str) +{ + if (str->len == 1 && str->name[0] == '.') + return true; + + if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.') + return true; + + return false; +} + +static inline struct page *fscrypt_control_page(struct page *page) +{ +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + return ((struct fscrypt_ctx *)page_private(page))->w.control_page; +#else + WARN_ON_ONCE(1); + return ERR_PTR(-EINVAL); +#endif +} + +static inline int fscrypt_has_encryption_key(struct inode *inode) +{ +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + return (inode->i_crypt_info != NULL); +#else + return 0; +#endif +} + +static inline void fscrypt_set_encrypted_dentry(struct dentry *dentry) +{ +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + spin_lock(&dentry->d_lock); + dentry->d_flags |= DCACHE_ENCRYPTED_WITH_KEY; + spin_unlock(&dentry->d_lock); +#endif +} + +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) +extern const struct dentry_operations fscrypt_d_ops; +#endif + +static inline void fscrypt_set_d_op(struct dentry *dentry) +{ +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) + d_set_d_op(dentry, &fscrypt_d_ops); +#endif +} + +#if IS_ENABLED(CONFIG_FS_ENCRYPTION) +/* crypto.c */ +extern struct kmem_cache *fscrypt_info_cachep; +int fscrypt_initialize(void); + +extern struct fscrypt_ctx *fscrypt_get_ctx(struct inode *, gfp_t); +extern void fscrypt_release_ctx(struct fscrypt_ctx *); +extern struct page *fscrypt_encrypt_page(struct inode *, struct page *, gfp_t); +extern int fscrypt_decrypt_page(struct page *); +extern void fscrypt_decrypt_bio_pages(struct fscrypt_ctx *, struct bio *); +extern void fscrypt_pullback_bio_page(struct page **, bool); +extern void fscrypt_restore_control_page(struct page *); +extern int fscrypt_zeroout_range(struct inode *, pgoff_t, sector_t, + unsigned int); +/* policy.c */ +extern int fscrypt_ioctl_set_policy(struct file *, const void __user *); +extern int fscrypt_ioctl_get_policy(struct file *, void __user *); +extern int fscrypt_has_permitted_context(struct inode *, struct inode *); +extern int fscrypt_inherit_context(struct inode *, struct inode *, + void *, bool); +/* keyinfo.c */ +extern int get_crypt_info(struct inode *); +extern int fscrypt_get_encryption_info(struct inode *); +extern void fscrypt_put_encryption_info(struct inode *, struct fscrypt_info *); + +/* fname.c */ +extern int fscrypt_setup_filename(struct inode *, const struct qstr *, + int lookup, struct fscrypt_name *); +extern void fscrypt_free_filename(struct fscrypt_name *); +extern u32 fscrypt_fname_encrypted_size(struct inode *, u32); +extern int fscrypt_fname_alloc_buffer(struct inode *, u32, + struct fscrypt_str *); +extern void fscrypt_fname_free_buffer(struct fscrypt_str *); +extern int fscrypt_fname_disk_to_usr(struct inode *, u32, u32, + const struct fscrypt_str *, struct fscrypt_str *); +extern int fscrypt_fname_usr_to_disk(struct inode *, const struct qstr *, + struct fscrypt_str *); +#endif + +/* crypto.c */ +static inline struct fscrypt_ctx *fscrypt_notsupp_get_ctx(struct inode *i, + gfp_t f) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline void fscrypt_notsupp_release_ctx(struct fscrypt_ctx *c) +{ + return; +} + +static inline struct page *fscrypt_notsupp_encrypt_page(struct inode *i, + struct page *p, gfp_t f) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline int fscrypt_notsupp_decrypt_page(struct page *p) +{ + return -EOPNOTSUPP; +} + +static inline void fscrypt_notsupp_decrypt_bio_pages(struct fscrypt_ctx *c, + struct bio *b) +{ + return; +} + +static inline void fscrypt_notsupp_pullback_bio_page(struct page **p, bool b) +{ + return; +} + +static inline void fscrypt_notsupp_restore_control_page(struct page *p) +{ + return; +} + +static inline int fscrypt_notsupp_zeroout_range(struct inode *i, pgoff_t p, + sector_t s, unsigned int f) +{ + return -EOPNOTSUPP; +} + +/* policy.c */ +static inline int fscrypt_notsupp_ioctl_set_policy(struct file *f, + const void __user *arg) +{ + return -EOPNOTSUPP; +} + +static inline int fscrypt_notsupp_ioctl_get_policy(struct file *f, + void __user *arg) +{ + return -EOPNOTSUPP; +} + +static inline int fscrypt_notsupp_has_permitted_context(struct inode *p, + struct inode *i) +{ + return 0; +} + +static inline int fscrypt_notsupp_inherit_context(struct inode *p, + struct inode *i, void *v, bool b) +{ + return -EOPNOTSUPP; +} + +/* keyinfo.c */ +static inline int fscrypt_notsupp_get_encryption_info(struct inode *i) +{ + return -EOPNOTSUPP; +} + +static inline void fscrypt_notsupp_put_encryption_info(struct inode *i, + struct fscrypt_info *f) +{ + return; +} + + /* fname.c */ +static inline int fscrypt_notsupp_setup_filename(struct inode *dir, + const struct qstr *iname, + int lookup, struct fscrypt_name *fname) +{ + if (dir->i_sb->s_cop->is_encrypted(dir)) + return -EOPNOTSUPP; + + memset(fname, 0, sizeof(struct fscrypt_name)); + fname->usr_fname = iname; + fname->disk_name.name = (unsigned char *)iname->name; + fname->disk_name.len = iname->len; + return 0; +} + +static inline void fscrypt_notsupp_free_filename(struct fscrypt_name *fname) +{ + return; +} + +static inline u32 fscrypt_notsupp_fname_encrypted_size(struct inode *i, u32 s) +{ + /* never happens */ + WARN_ON(1); + return 0; +} + +static inline int fscrypt_notsupp_fname_alloc_buffer(struct inode *inode, + u32 ilen, struct fscrypt_str *crypto_str) +{ + return -EOPNOTSUPP; +} + +static inline void fscrypt_notsupp_fname_free_buffer(struct fscrypt_str *c) +{ + return; +} + +static inline int fscrypt_notsupp_fname_disk_to_usr(struct inode *inode, + u32 hash, u32 minor_hash, + const struct fscrypt_str *iname, + struct fscrypt_str *oname) +{ + return -EOPNOTSUPP; +} + +static inline int fscrypt_notsupp_fname_usr_to_disk(struct inode *inode, + const struct qstr *iname, + struct fscrypt_str *oname) +{ + return -EOPNOTSUPP; +} +#endif /* _LINUX_FSCRYPTO_H */ diff --git a/include/linux/mmc/sdio_ids.h b/include/linux/mmc/sdio_ids.h index 83430f2ea757..e0325706b76d 100644 --- a/include/linux/mmc/sdio_ids.h +++ b/include/linux/mmc/sdio_ids.h @@ -33,6 +33,7 @@ #define SDIO_DEVICE_ID_BROADCOM_43341 0xa94d #define SDIO_DEVICE_ID_BROADCOM_4335_4339 0x4335 #define SDIO_DEVICE_ID_BROADCOM_43362 0xa962 +#define SDIO_DEVICE_ID_BROADCOM_43364 0xa9a4 #define SDIO_DEVICE_ID_BROADCOM_43430 0xa9a6 #define SDIO_DEVICE_ID_BROADCOM_4345 0x4345 #define SDIO_DEVICE_ID_BROADCOM_4354 0x4354 diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h index 1d6a935c1ac5..8793f5a7b820 100644 --- a/include/linux/netfilter/ipset/ip_set_timeout.h +++ b/include/linux/netfilter/ipset/ip_set_timeout.h @@ -65,8 +65,14 @@ ip_set_timeout_set(unsigned long *timeout, u32 value) static inline u32 ip_set_timeout_get(unsigned long *timeout) { - return *timeout == IPSET_ELEM_PERMANENT ? 0 : - jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC; + u32 t; + + if (*timeout == IPSET_ELEM_PERMANENT) + return 0; + + t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC; + /* Zero value in userspace means no timeout */ + return t == 0 ? 1 : t; } #endif /* __KERNEL__ */ diff --git a/include/linux/nospec.h b/include/linux/nospec.h index e791ebc65c9c..0c5ef54fd416 100644 --- a/include/linux/nospec.h +++ b/include/linux/nospec.h @@ -7,6 +7,8 @@ #define _LINUX_NOSPEC_H #include <asm/barrier.h> +struct task_struct; + /** * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise * @index: array element index @@ -55,4 +57,12 @@ static inline unsigned long array_index_mask_nospec(unsigned long index, \ (typeof(_i)) (_i & _mask); \ }) + +/* Speculation control prctl */ +int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which); +int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which, + unsigned long ctrl); +/* Speculation control for seccomp enforced mitigation */ +void arch_seccomp_spec_mitigate(struct task_struct *task); + #endif /* _LINUX_NOSPEC_H */ diff --git a/include/linux/overflow.h b/include/linux/overflow.h new file mode 100644 index 000000000000..8712ff70995f --- /dev/null +++ b/include/linux/overflow.h @@ -0,0 +1,278 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +#ifndef __LINUX_OVERFLOW_H +#define __LINUX_OVERFLOW_H + +#include <linux/compiler.h> + +/* + * In the fallback code below, we need to compute the minimum and + * maximum values representable in a given type. These macros may also + * be useful elsewhere, so we provide them outside the + * COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW block. + * + * It would seem more obvious to do something like + * + * #define type_min(T) (T)(is_signed_type(T) ? (T)1 << (8*sizeof(T)-1) : 0) + * #define type_max(T) (T)(is_signed_type(T) ? ((T)1 << (8*sizeof(T)-1)) - 1 : ~(T)0) + * + * Unfortunately, the middle expressions, strictly speaking, have + * undefined behaviour, and at least some versions of gcc warn about + * the type_max expression (but not if -fsanitize=undefined is in + * effect; in that case, the warning is deferred to runtime...). + * + * The slightly excessive casting in type_min is to make sure the + * macros also produce sensible values for the exotic type _Bool. [The + * overflow checkers only almost work for _Bool, but that's + * a-feature-not-a-bug, since people shouldn't be doing arithmetic on + * _Bools. Besides, the gcc builtins don't allow _Bool* as third + * argument.] + * + * Idea stolen from + * https://mail-index.netbsd.org/tech-misc/2007/02/05/0000.html - + * credit to Christian Biere. + */ +#define is_signed_type(type) (((type)(-1)) < (type)1) +#define __type_half_max(type) ((type)1 << (8*sizeof(type) - 1 - is_signed_type(type))) +#define type_max(T) ((T)((__type_half_max(T) - 1) + __type_half_max(T))) +#define type_min(T) ((T)((T)-type_max(T)-(T)1)) + + +#ifdef COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW +/* + * For simplicity and code hygiene, the fallback code below insists on + * a, b and *d having the same type (similar to the min() and max() + * macros), whereas gcc's type-generic overflow checkers accept + * different types. Hence we don't just make check_add_overflow an + * alias for __builtin_add_overflow, but add type checks similar to + * below. + */ +#define check_add_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + __builtin_add_overflow(__a, __b, __d); \ +}) + +#define check_sub_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + __builtin_sub_overflow(__a, __b, __d); \ +}) + +#define check_mul_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + __builtin_mul_overflow(__a, __b, __d); \ +}) + +#else + + +/* Checking for unsigned overflow is relatively easy without causing UB. */ +#define __unsigned_add_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + *__d = __a + __b; \ + *__d < __a; \ +}) +#define __unsigned_sub_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + *__d = __a - __b; \ + __a < __b; \ +}) +/* + * If one of a or b is a compile-time constant, this avoids a division. + */ +#define __unsigned_mul_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + *__d = __a * __b; \ + __builtin_constant_p(__b) ? \ + __b > 0 && __a > type_max(typeof(__a)) / __b : \ + __a > 0 && __b > type_max(typeof(__b)) / __a; \ +}) + +/* + * For signed types, detecting overflow is much harder, especially if + * we want to avoid UB. But the interface of these macros is such that + * we must provide a result in *d, and in fact we must produce the + * result promised by gcc's builtins, which is simply the possibly + * wrapped-around value. Fortunately, we can just formally do the + * operations in the widest relevant unsigned type (u64) and then + * truncate the result - gcc is smart enough to generate the same code + * with and without the (u64) casts. + */ + +/* + * Adding two signed integers can overflow only if they have the same + * sign, and overflow has happened iff the result has the opposite + * sign. + */ +#define __signed_add_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + *__d = (u64)__a + (u64)__b; \ + (((~(__a ^ __b)) & (*__d ^ __a)) \ + & type_min(typeof(__a))) != 0; \ +}) + +/* + * Subtraction is similar, except that overflow can now happen only + * when the signs are opposite. In this case, overflow has happened if + * the result has the opposite sign of a. + */ +#define __signed_sub_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + *__d = (u64)__a - (u64)__b; \ + ((((__a ^ __b)) & (*__d ^ __a)) \ + & type_min(typeof(__a))) != 0; \ +}) + +/* + * Signed multiplication is rather hard. gcc always follows C99, so + * division is truncated towards 0. This means that we can write the + * overflow check like this: + * + * (a > 0 && (b > MAX/a || b < MIN/a)) || + * (a < -1 && (b > MIN/a || b < MAX/a) || + * (a == -1 && b == MIN) + * + * The redundant casts of -1 are to silence an annoying -Wtype-limits + * (included in -Wextra) warning: When the type is u8 or u16, the + * __b_c_e in check_mul_overflow obviously selects + * __unsigned_mul_overflow, but unfortunately gcc still parses this + * code and warns about the limited range of __b. + */ + +#define __signed_mul_overflow(a, b, d) ({ \ + typeof(a) __a = (a); \ + typeof(b) __b = (b); \ + typeof(d) __d = (d); \ + typeof(a) __tmax = type_max(typeof(a)); \ + typeof(a) __tmin = type_min(typeof(a)); \ + (void) (&__a == &__b); \ + (void) (&__a == __d); \ + *__d = (u64)__a * (u64)__b; \ + (__b > 0 && (__a > __tmax/__b || __a < __tmin/__b)) || \ + (__b < (typeof(__b))-1 && (__a > __tmin/__b || __a < __tmax/__b)) || \ + (__b == (typeof(__b))-1 && __a == __tmin); \ +}) + + +#define check_add_overflow(a, b, d) \ + __builtin_choose_expr(is_signed_type(typeof(a)), \ + __signed_add_overflow(a, b, d), \ + __unsigned_add_overflow(a, b, d)) + +#define check_sub_overflow(a, b, d) \ + __builtin_choose_expr(is_signed_type(typeof(a)), \ + __signed_sub_overflow(a, b, d), \ + __unsigned_sub_overflow(a, b, d)) + +#define check_mul_overflow(a, b, d) \ + __builtin_choose_expr(is_signed_type(typeof(a)), \ + __signed_mul_overflow(a, b, d), \ + __unsigned_mul_overflow(a, b, d)) + + +#endif /* COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW */ + +/** + * array_size() - Calculate size of 2-dimensional array. + * + * @a: dimension one + * @b: dimension two + * + * Calculates size of 2-dimensional array: @a * @b. + * + * Returns: number of bytes needed to represent the array or SIZE_MAX on + * overflow. + */ +static inline __must_check size_t array_size(size_t a, size_t b) +{ + size_t bytes; + + if (check_mul_overflow(a, b, &bytes)) + return SIZE_MAX; + + return bytes; +} + +/** + * array3_size() - Calculate size of 3-dimensional array. + * + * @a: dimension one + * @b: dimension two + * @c: dimension three + * + * Calculates size of 3-dimensional array: @a * @b * @c. + * + * Returns: number of bytes needed to represent the array or SIZE_MAX on + * overflow. + */ +static inline __must_check size_t array3_size(size_t a, size_t b, size_t c) +{ + size_t bytes; + + if (check_mul_overflow(a, b, &bytes)) + return SIZE_MAX; + if (check_mul_overflow(bytes, c, &bytes)) + return SIZE_MAX; + + return bytes; +} + +static inline __must_check size_t __ab_c_size(size_t n, size_t size, size_t c) +{ + size_t bytes; + + if (check_mul_overflow(n, size, &bytes)) + return SIZE_MAX; + if (check_add_overflow(bytes, c, &bytes)) + return SIZE_MAX; + + return bytes; +} + +/** + * struct_size() - Calculate size of structure with trailing array. + * @p: Pointer to the structure. + * @member: Name of the array member. + * @n: Number of elements in the array. + * + * Calculates size of memory needed for structure @p followed by an + * array of @n @member elements. + * + * Return: number of bytes needed or SIZE_MAX on overflow. + */ +#define struct_size(p, member, n) \ + __ab_c_size(n, \ + sizeof(*(p)->member) + __must_be_array((p)->member),\ + sizeof(*(p))) + +#endif /* __LINUX_OVERFLOW_H */ diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 771774e13f10..443a063e6af8 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -359,8 +359,16 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start, unsigned int nr_pages, struct page **pages); unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start, unsigned int nr_pages, struct page **pages); -unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index, - int tag, unsigned int nr_pages, struct page **pages); +unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, + pgoff_t end, int tag, unsigned int nr_pages, + struct page **pages); +static inline unsigned find_get_pages_tag(struct address_space *mapping, + pgoff_t *index, int tag, unsigned int nr_pages, + struct page **pages) +{ + return find_get_pages_range_tag(mapping, index, (pgoff_t)-1, tag, + nr_pages, pages); +} struct page *grab_cache_page_write_begin(struct address_space *mapping, pgoff_t index, unsigned flags); diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h index b45d391b4540..cead4419f933 100644 --- a/include/linux/pagevec.h +++ b/include/linux/pagevec.h @@ -29,9 +29,17 @@ unsigned pagevec_lookup_entries(struct pagevec *pvec, void pagevec_remove_exceptionals(struct pagevec *pvec); unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping, pgoff_t start, unsigned nr_pages); -unsigned pagevec_lookup_tag(struct pagevec *pvec, - struct address_space *mapping, pgoff_t *index, int tag, - unsigned nr_pages); +unsigned pagevec_lookup_range_tag(struct pagevec *pvec, + struct address_space *mapping, pgoff_t *index, pgoff_t end, + int tag); +unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec, + struct address_space *mapping, pgoff_t *index, pgoff_t end, + int tag, unsigned max_pages); +static inline unsigned pagevec_lookup_tag(struct pagevec *pvec, + struct address_space *mapping, pgoff_t *index, int tag) +{ + return pagevec_lookup_range_tag(pvec, mapping, index, (pgoff_t)-1, tag); +} static inline void pagevec_init(struct pagevec *pvec, int cold) { diff --git a/include/linux/sched.h b/include/linux/sched.h index 81f422018aa1..5fe8c3dfd60f 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2457,6 +2457,8 @@ static inline void memalloc_noio_restore(unsigned int flags) #define PFA_NO_NEW_PRIVS 0 /* May not gain new privileges. */ #define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */ #define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */ +#define PFA_SPEC_SSB_DISABLE 4 /* Speculative Store Bypass disabled */ +#define PFA_SPEC_SSB_FORCE_DISABLE 5 /* Speculative Store Bypass force disabled*/ #define TASK_PFA_TEST(name, func) \ @@ -2480,6 +2482,13 @@ TASK_PFA_TEST(SPREAD_SLAB, spread_slab) TASK_PFA_SET(SPREAD_SLAB, spread_slab) TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab) +TASK_PFA_TEST(SPEC_SSB_DISABLE, spec_ssb_disable) +TASK_PFA_SET(SPEC_SSB_DISABLE, spec_ssb_disable) +TASK_PFA_CLEAR(SPEC_SSB_DISABLE, spec_ssb_disable) + +TASK_PFA_TEST(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable) +TASK_PFA_SET(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable) + /* * task->jobctl flags */ diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 2296e6b2f690..5a53d34bba26 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -3,7 +3,8 @@ #include <uapi/linux/seccomp.h> -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC) +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \ + SECCOMP_FILTER_FLAG_SPEC_ALLOW) #ifdef CONFIG_SECCOMP diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a6da214d0584..c28bd8be290a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -514,6 +514,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp *t1, * @hash: the packet hash * @queue_mapping: Queue mapping for multiqueue devices * @xmit_more: More SKBs are pending for this queue + * @pfmemalloc: skbuff was allocated from PFMEMALLOC reserves * @ndisc_nodetype: router type (from link layer) * @ooo_okay: allow the mapping of a socket to a queue to be changed * @l4_hash: indicate hash is a canonical 4-tuple hash over transport @@ -594,8 +595,8 @@ struct sk_buff { fclone:2, peeked:1, head_frag:1, - xmit_more:1; - /* one bit hole */ + xmit_more:1, + pfmemalloc:1; kmemcheck_bitfield_end(flags1); /* fields enclosed in headers_start/headers_end are copied @@ -615,19 +616,18 @@ struct sk_buff { __u8 __pkt_type_offset[0]; __u8 pkt_type:3; - __u8 pfmemalloc:1; __u8 ignore_df:1; __u8 nfctinfo:3; - __u8 nf_trace:1; + __u8 ip_summed:2; __u8 ooo_okay:1; __u8 l4_hash:1; __u8 sw_hash:1; __u8 wifi_acked_valid:1; __u8 wifi_acked:1; - __u8 no_fcs:1; + /* Indicates the inner headers are valid in the skbuff. */ __u8 encapsulation:1; __u8 encap_hdr_csum:1; @@ -635,11 +635,11 @@ struct sk_buff { __u8 csum_complete_sw:1; __u8 csum_level:2; __u8 csum_bad:1; - #ifdef CONFIG_IPV6_NDISC_NODETYPE __u8 ndisc_nodetype:2; #endif __u8 ipvs_property:1; + __u8 inner_protocol_type:1; __u8 remcsum_offload:1; /* 3 or 5 bit hole */ diff --git a/include/net/cnss2.h b/include/net/cnss2.h index 53f73d436752..3a8b503c1938 100644 --- a/include/net/cnss2.h +++ b/include/net/cnss2.h @@ -115,6 +115,11 @@ struct cnss_shadow_reg_v2_cfg { u32 addr; }; +struct cnss_rri_over_ddr_cfg { + u32 base_addr_low; + u32 base_addr_high; +}; + struct cnss_wlan_enable_cfg { u32 num_ce_tgt_cfg; struct cnss_ce_tgt_pipe_cfg *ce_tgt_cfg; @@ -124,6 +129,8 @@ struct cnss_wlan_enable_cfg { struct cnss_shadow_reg_cfg *shadow_reg_cfg; u32 num_shadow_reg_v2_cfg; struct cnss_shadow_reg_v2_cfg *shadow_reg_v2_cfg; + bool rri_over_ddr_cfg_valid; + struct cnss_rri_over_ddr_cfg rri_over_ddr_cfg; }; enum cnss_driver_mode { diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 84f0d0602433..0e01d570fa22 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -762,7 +762,7 @@ static inline __be32 ip6_make_flowlabel(struct net *net, struct sk_buff *skb, * to minimize possbility that any useful information to an * attacker is leaked. Only lower 20 bits are relevant. */ - rol32(hash, 16); + hash = rol32(hash, 16); flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK; diff --git a/include/net/tcp.h b/include/net/tcp.h index e4f8cde33789..2c1237a31564 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -391,6 +391,8 @@ extern int tcp_use_userconfig_sysctl_handler(struct ctl_table *, int, extern int tcp_proc_delayed_ack_control(struct ctl_table *, int, void __user *, size_t *, loff_t *); +void tcp_enter_quickack_mode(struct sock *sk, unsigned int max_quickacks); + static inline void tcp_dec_quickack_mode(struct sock *sk, const unsigned int pkts) { @@ -574,6 +576,7 @@ void tcp_send_fin(struct sock *sk); void tcp_send_active_reset(struct sock *sk, gfp_t priority); int tcp_send_synack(struct sock *); void tcp_push_one(struct sock *, unsigned int mss_now); +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt); void tcp_send_ack(struct sock *sk); void tcp_send_delayed_ack(struct sock *sk); void tcp_send_loss_probe(struct sock *sk); diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h index 44202ff897fd..f759e0918037 100644 --- a/include/soc/tegra/mc.h +++ b/include/soc/tegra/mc.h @@ -99,6 +99,8 @@ struct tegra_mc_soc { u8 client_id_mask; const struct tegra_smmu_soc *smmu; + + u32 intmask; }; struct tegra_mc { diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h index 30ae8c3c1c85..30dbd1c34335 100644 --- a/include/uapi/drm/msm_drm.h +++ b/include/uapi/drm/msm_drm.h @@ -466,6 +466,7 @@ struct drm_msm_submitqueue_query { #define DRM_MSM_COUNTER_PUT 0x44 #define DRM_MSM_COUNTER_READ 0x45 #define DRM_MSM_GEM_SYNC 0x46 +#define DRM_MSM_RMFB2 0x47 /** * Currently DRM framework supports only VSYNC event. @@ -509,6 +510,8 @@ struct drm_msm_submitqueue_query { #define DRM_IOCTL_MSM_SUBMITQUEUE_QUERY \ DRM_IOWR(DRM_COMMAND_BASE + DRM_MSM_SUBMITQUEUE_QUERY, \ struct drm_msm_submitqueue_query) +#define DRM_IOCTL_MSM_RMFB2 DRM_IOW((DRM_COMMAND_BASE + \ + DRM_MSM_RMFB2), unsigned int) #if defined(__cplusplus) } diff --git a/include/uapi/linux/msm_ipa.h b/include/uapi/linux/msm_ipa.h index 33090dd4489f..be3c68aaf23c 100644 --- a/include/uapi/linux/msm_ipa.h +++ b/include/uapi/linux/msm_ipa.h @@ -473,10 +473,15 @@ enum ipa_per_client_stats_event { IPA_PER_CLIENT_STATS_CONNECT_EVENT = IPA_VLAN_L2TP_EVENT_MAX, IPA_PER_CLIENT_STATS_DISCONNECT_EVENT, IPA_PER_CLIENT_STATS_EVENT_MAX, - IPA_EVENT_MAX_NUM = IPA_PER_CLIENT_STATS_EVENT_MAX, }; -#define IPA_EVENT_MAX_NUM ((int)IPA_PER_CLIENT_STATS_EVENT_MAX) +enum ipa_wlan_fw_ssr_event { + WLAN_FWR_SSR_BEFORE_SHUTDOWN = IPA_PER_CLIENT_STATS_EVENT_MAX, + IPA_WLAN_FW_SSR_EVENT_MAX, +#define IPA_WLAN_FW_SSR_EVENT_MAX IPA_WLAN_FW_SSR_EVENT_MAX +}; + +#define IPA_EVENT_MAX_NUM ((int)IPA_WLAN_FW_SSR_EVENT_MAX) #define IPA_EVENT_MAX ((int)IPA_EVENT_MAX_NUM) /** diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index c1af9b3c27c4..f0e9f0460ee7 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -206,4 +206,16 @@ struct prctl_mm_map { #define PR_SET_VMA 0x53564d41 # define PR_SET_VMA_ANON_NAME 0 +/* Per task speculation control */ +#define PR_GET_SPECULATION_CTRL 52 +#define PR_SET_SPECULATION_CTRL 53 +/* Speculation control variants */ +# define PR_SPEC_STORE_BYPASS 0 +/* Return and control values for PR_SET/GET_SPECULATION_CTRL */ +# define PR_SPEC_NOT_AFFECTED 0 +# define PR_SPEC_PRCTL (1UL << 0) +# define PR_SPEC_ENABLE (1UL << 1) +# define PR_SPEC_DISABLE (1UL << 2) +# define PR_SPEC_FORCE_DISABLE (1UL << 3) + #endif /* _LINUX_PRCTL_H */ diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 0f238a43ff1e..e4acb615792b 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -15,7 +15,9 @@ #define SECCOMP_SET_MODE_FILTER 1 /* Valid flags for SECCOMP_SET_MODE_FILTER */ -#define SECCOMP_FILTER_FLAG_TSYNC 1 +#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0) +/* In v4.14+ SECCOMP_FILTER_FLAG_LOG is (1UL << 1) */ +#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2) /* * All BPF programs must return a 32-bit value. diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c index b8ff9e193753..b57f929f1b46 100644 --- a/kernel/auditfilter.c +++ b/kernel/auditfilter.c @@ -406,7 +406,7 @@ static int audit_field_valid(struct audit_entry *entry, struct audit_field *f) return -EINVAL; break; case AUDIT_EXE: - if (f->op != Audit_equal) + if (f->op != Audit_not_equal && f->op != Audit_equal) return -EINVAL; if (entry->rule.listnr != AUDIT_FILTER_EXIT) return -EINVAL; diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 34a57d57bcb1..b7aac4c5c9a1 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -470,6 +470,8 @@ static int audit_filter_rules(struct task_struct *tsk, break; case AUDIT_EXE: result = audit_exe_compare(tsk, rule->exe); + if (f->op == Audit_not_equal) + result = !result; break; case AUDIT_UID: result = audit_uid_comparator(cred->uid, f->op, f->uid); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 79e3c21a35d0..35dfa9e9d69e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2101,7 +2101,7 @@ static int replace_map_fd_with_map_ptr(struct verifier_env *env) /* hold the map. If the program is rejected by verifier, * the map will be released by release_maps() or it * will be used by the valid program until it's unloaded - * and all maps are released in free_bpf_prog_info() + * and all maps are released in free_used_maps() */ map = bpf_map_inc(map, false); if (IS_ERR(map)) { @@ -2487,7 +2487,7 @@ free_log_buf: vfree(log_buf); if (!env->prog->aux->used_maps) /* if we didn't copy map pointers into bpf_prog_info, release - * them now. Otherwise free_bpf_prog_info() will release them. + * them now. Otherwise free_used_maps() will release them. */ release_maps(env); *prog = env->prog; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 3fdb7545852e..cd3d81961cc2 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -4092,7 +4092,11 @@ int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from) */ do { css_task_iter_start(&from->self, &it); - task = css_task_iter_next(&it); + + do { + task = css_task_iter_next(&it); + } while (task && (task->flags & PF_EXITING)); + if (task) get_task_struct(task); css_task_iter_end(&it); diff --git a/kernel/exit.c b/kernel/exit.c index f75f7cef0760..fc82e495b729 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -54,7 +54,6 @@ #include <linux/writeback.h> #include <linux/shm.h> #include <linux/kcov.h> -#include <linux/cpufreq_times.h> #include "sched/tune.h" @@ -174,9 +173,6 @@ void release_task(struct task_struct *p) { struct task_struct *leader; int zap_leader; -#ifdef CONFIG_CPU_FREQ_TIMES - cpufreq_task_times_exit(p); -#endif repeat: /* don't need to get the RCU readlock here - the process is dead and * can't be modifying its own credentials. But shut RCU-lockdep up */ diff --git a/kernel/fork.c b/kernel/fork.c index 4251e3806640..caa23ca489bb 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -78,6 +78,7 @@ #include <linux/compiler.h> #include <linux/sysctl.h> #include <linux/kcov.h> +#include <linux/cpufreq_times.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -228,6 +229,7 @@ static void account_kernel_stack(unsigned long *stack, int account) void free_task(struct task_struct *tsk) { + cpufreq_task_times_exit(tsk); account_kernel_stack(tsk->stack, -1); arch_release_thread_stack(tsk->stack); free_thread_stack(tsk->stack); @@ -1366,6 +1368,8 @@ static struct task_struct *copy_process(unsigned long clone_flags, if (!p) goto fork_out; + cpufreq_task_times_init(p); + ftrace_graph_init_task(p); rt_mutex_init_task(p); @@ -1798,6 +1802,8 @@ long _do_fork(unsigned long clone_flags, struct completion vfork; struct pid *pid; + cpufreq_task_times_alloc(p); + trace_sched_process_fork(current, p); pid = get_task_pid(p, PIDTYPE_PID); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index c1ecb07de762..6a5671bdb792 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2366,10 +2366,6 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) memset(&p->se.statistics, 0, sizeof(p->se.statistics)); #endif -#ifdef CONFIG_CPU_FREQ_TIMES - cpufreq_task_times_init(p); -#endif - RB_CLEAR_NODE(&p->dl.rb_node); init_dl_task_timer(&p->dl); __dl_clear_params(p); diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index bd4ef2bb551e..e6ec68c15aa3 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -162,10 +162,8 @@ void account_user_time(struct task_struct *p, cputime_t cputime, /* Account for user time used */ acct_account_cputime(p); -#ifdef CONFIG_CPU_FREQ_TIMES /* Account power usage for user time */ cpufreq_acct_update_power(p, cputime); -#endif } /* @@ -216,10 +214,9 @@ void __account_system_time(struct task_struct *p, cputime_t cputime, /* Account for system time used */ acct_account_cputime(p); -#ifdef CONFIG_CPU_FREQ_TIMES + /* Account power usage for system time */ cpufreq_acct_update_power(p, cputime); -#endif } /* diff --git a/kernel/sched/hmp.c b/kernel/sched/hmp.c index d9f0669ff683..ddcf7cfb7248 100644 --- a/kernel/sched/hmp.c +++ b/kernel/sched/hmp.c @@ -2081,14 +2081,11 @@ static u32 top_task_load(struct rq *rq) } } -static int load_to_index(u32 load) +static u32 load_to_index(u32 load) { - if (load < sched_load_granule) - return 0; - else if (load >= sched_ravg_window) - return NUM_LOAD_INDICES - 1; - else - return load / sched_load_granule; + u32 index = load / sched_load_granule; + + return min(index, (u32)(NUM_LOAD_INDICES - 1)); } static void update_top_tasks(struct task_struct *p, struct rq *rq, diff --git a/kernel/seccomp.c b/kernel/seccomp.c index efd384f3f852..9a9203b15cde 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -16,6 +16,8 @@ #include <linux/atomic.h> #include <linux/audit.h> #include <linux/compat.h> +#include <linux/nospec.h> +#include <linux/prctl.h> #include <linux/sched.h> #include <linux/seccomp.h> #include <linux/slab.h> @@ -214,8 +216,11 @@ static inline bool seccomp_may_assign_mode(unsigned long seccomp_mode) return true; } +void __weak arch_seccomp_spec_mitigate(struct task_struct *task) { } + static inline void seccomp_assign_mode(struct task_struct *task, - unsigned long seccomp_mode) + unsigned long seccomp_mode, + unsigned long flags) { assert_spin_locked(&task->sighand->siglock); @@ -225,6 +230,9 @@ static inline void seccomp_assign_mode(struct task_struct *task, * filter) is set. */ smp_mb__before_atomic(); + /* Assume default seccomp processes want spec flaw mitigation. */ + if ((flags & SECCOMP_FILTER_FLAG_SPEC_ALLOW) == 0) + arch_seccomp_spec_mitigate(task); set_tsk_thread_flag(task, TIF_SECCOMP); } @@ -292,7 +300,7 @@ static inline pid_t seccomp_can_sync_threads(void) * without dropping the locks. * */ -static inline void seccomp_sync_threads(void) +static inline void seccomp_sync_threads(unsigned long flags) { struct task_struct *thread, *caller; @@ -333,7 +341,8 @@ static inline void seccomp_sync_threads(void) * allow one thread to transition the other. */ if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) - seccomp_assign_mode(thread, SECCOMP_MODE_FILTER); + seccomp_assign_mode(thread, SECCOMP_MODE_FILTER, + flags); } } @@ -452,7 +461,7 @@ static long seccomp_attach_filter(unsigned int flags, /* Now that the new filter is in place, synchronize to all threads. */ if (flags & SECCOMP_FILTER_FLAG_TSYNC) - seccomp_sync_threads(); + seccomp_sync_threads(flags); return 0; } @@ -747,7 +756,7 @@ static long seccomp_set_mode_strict(void) #ifdef TIF_NOTSC disable_TSC(); #endif - seccomp_assign_mode(current, seccomp_mode); + seccomp_assign_mode(current, seccomp_mode, 0); ret = 0; out: @@ -805,7 +814,7 @@ static long seccomp_set_mode_filter(unsigned int flags, /* Do not free the successfully attached filter. */ prepared = NULL; - seccomp_assign_mode(current, seccomp_mode); + seccomp_assign_mode(current, seccomp_mode, flags); out: spin_unlock_irq(¤t->sighand->siglock); if (flags & SECCOMP_FILTER_FLAG_TSYNC) diff --git a/kernel/sys.c b/kernel/sys.c index cf40663a54c2..0df4753d4969 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -2224,6 +2224,17 @@ static int prctl_set_vma(unsigned long opt, unsigned long start, } #endif +int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which) +{ + return -EINVAL; +} + +int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which, + unsigned long ctrl) +{ + return -EINVAL; +} + SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, unsigned long, arg4, unsigned long, arg5) { @@ -2445,6 +2456,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, case PR_SET_VMA: error = prctl_set_vma(arg2, arg3, arg4, arg5); break; + case PR_GET_SPECULATION_CTRL: + if (arg3 || arg4 || arg5) + return -EINVAL; + error = arch_prctl_spec_ctrl_get(me, arg2); + break; + case PR_SET_SPECULATION_CTRL: + if (arg4 || arg5) + return -EINVAL; + error = arch_prctl_spec_ctrl_set(me, arg2, arg3); + break; default: error = -EINVAL; break; diff --git a/kernel/trace/trace_events_trigger.c b/kernel/trace/trace_events_trigger.c index 2172dd61577e..b8a894adab2c 100644 --- a/kernel/trace/trace_events_trigger.c +++ b/kernel/trace/trace_events_trigger.c @@ -663,6 +663,8 @@ event_trigger_callback(struct event_command *cmd_ops, goto out_free; out_reg: + /* Up the trigger_data count to make sure reg doesn't free it on failure */ + event_trigger_init(trigger_ops, trigger_data); ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); /* * The above returns on success the # of functions enabled, @@ -670,11 +672,13 @@ event_trigger_callback(struct event_command *cmd_ops, * Consider no functions a failure too. */ if (!ret) { + cmd_ops->unreg(glob, trigger_ops, trigger_data, file); ret = -ENOENT; - goto out_free; - } else if (ret < 0) - goto out_free; - ret = 0; + } else if (ret > 0) + ret = 0; + + /* Down the counter of trigger_data or free it if not used anymore */ + event_trigger_free(trigger_ops, trigger_data); out: return ret; @@ -1227,6 +1231,9 @@ event_enable_trigger_func(struct event_command *cmd_ops, goto out; } + /* Up the trigger_data count to make sure nothing frees it on failure */ + event_trigger_init(trigger_ops, trigger_data); + if (trigger) { number = strsep(&trigger, ":"); @@ -1277,6 +1284,7 @@ event_enable_trigger_func(struct event_command *cmd_ops, goto out_disable; /* Just return zero, not the number of enabled functions */ ret = 0; + event_trigger_free(trigger_ops, trigger_data); out: return ret; @@ -1287,7 +1295,7 @@ event_enable_trigger_func(struct event_command *cmd_ops, out_free: if (cmd_ops->set_filter) cmd_ops->set_filter(NULL, trigger_data, NULL); - kfree(trigger_data); + event_trigger_free(trigger_ops, trigger_data); kfree(enable_data); goto out; } diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c index f2682799c215..f0ee722be520 100644 --- a/kernel/trace/trace_kprobe.c +++ b/kernel/trace/trace_kprobe.c @@ -349,11 +349,10 @@ static struct trace_kprobe *find_trace_kprobe(const char *event, static int enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file) { + struct event_file_link *link = NULL; int ret = 0; if (file) { - struct event_file_link *link; - link = kmalloc(sizeof(*link), GFP_KERNEL); if (!link) { ret = -ENOMEM; @@ -373,6 +372,18 @@ enable_trace_kprobe(struct trace_kprobe *tk, struct trace_event_file *file) else ret = enable_kprobe(&tk->rp.kp); } + + if (ret) { + if (file) { + /* Notice the if is true on not WARN() */ + if (!WARN_ON_ONCE(!link)) + list_del_rcu(&link->list); + kfree(link); + tk->tp.flags &= ~TP_FLAG_TRACE; + } else { + tk->tp.flags &= ~TP_FLAG_PROFILE; + } + } out: return ret; } diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 51282f579760..37ea94b636a3 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -670,8 +670,16 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_stop); static size_t rounded_hashtable_size(const struct rhashtable_params *params) { - return max(roundup_pow_of_two(params->nelem_hint * 4 / 3), - (unsigned long)params->min_size); + size_t retsize; + + if (params->nelem_hint) + retsize = max(roundup_pow_of_two(params->nelem_hint * 4 / 3), + (unsigned long)params->min_size); + else + retsize = max(HASH_DEFAULT_SIZE, + (unsigned long)params->min_size); + + return retsize; } static u32 rhashtable_jhash2(const void *key, u32 length, u32 seed) @@ -728,8 +736,6 @@ int rhashtable_init(struct rhashtable *ht, struct bucket_table *tbl; size_t size; - size = HASH_DEFAULT_SIZE; - if ((!params->key_len && !params->obj_hashfn) || (params->obj_hashfn && !params->obj_cmpfn)) return -EINVAL; @@ -756,8 +762,7 @@ int rhashtable_init(struct rhashtable *ht, ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE); - if (params->nelem_hint) - size = rounded_hashtable_size(&ht->p); + size = rounded_hashtable_size(&ht->p); /* The maximum (not average) chain length grows with the * size of the hash table, at a rate of (log N)/(log log N). diff --git a/mm/filemap.c b/mm/filemap.c index f3d6d89cfd61..6aa6f0cc67e1 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -396,19 +396,17 @@ static int __filemap_fdatawait_range(struct address_space *mapping, goto out; pagevec_init(&pvec, 0); - while ((index <= end) && - (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, - PAGECACHE_TAG_WRITEBACK, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1)) != 0) { + while (index <= end) { unsigned i; + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, + end, PAGECACHE_TAG_WRITEBACK); + if (!nr_pages) + break; + for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; - /* until radix tree lookup accepts end_index */ - if (page->index > end) - continue; - wait_on_page_writeback(page); if (TestClearPageError(page)) ret = -EIO; @@ -1426,9 +1424,10 @@ repeat: EXPORT_SYMBOL(find_get_pages_contig); /** - * find_get_pages_tag - find and return pages that match @tag + * find_get_pages_range_tag - find and return pages in given range matching @tag * @mapping: the address_space to search * @index: the starting page index + * @end: The final page index (inclusive) * @tag: the tag index * @nr_pages: the maximum number of pages * @pages: where the resulting pages are placed @@ -1436,8 +1435,9 @@ EXPORT_SYMBOL(find_get_pages_contig); * Like find_get_pages, except we only return pages which are tagged with * @tag. We update @index to index the next page for the traversal. */ -unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index, - int tag, unsigned int nr_pages, struct page **pages) +unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, + pgoff_t end, int tag, unsigned int nr_pages, + struct page **pages) { struct radix_tree_iter iter; void **slot; @@ -1451,6 +1451,9 @@ restart: radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter, *index, tag) { struct page *page; + + if (iter.index > end) + break; repeat: page = radix_tree_deref_slot(slot); if (unlikely(!page)) @@ -1489,18 +1492,28 @@ repeat: } pages[ret] = page; - if (++ret == nr_pages) - break; + if (++ret == nr_pages) { + *index = pages[ret - 1]->index + 1; + goto out; + } } + /* + * We come here when we got at @end. We take care to not overflow the + * index @index as it confuses some of the callers. This breaks the + * iteration when there is page at index -1 but that is already broken + * anyway. + */ + if (end == (pgoff_t)-1) + *index = (pgoff_t)-1; + else + *index = end + 1; +out: rcu_read_unlock(); - if (ret) - *index = pages[ret - 1]->index + 1; - return ret; } -EXPORT_SYMBOL(find_get_pages_tag); +EXPORT_SYMBOL(find_get_pages_range_tag); /* * CD/DVDs are error prone. When a medium error occurs, the driver may fail diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 55a9facb8e8d..9a8e688724b1 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -996,7 +996,7 @@ static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg) int nid, zid; int i; - while ((memcg = parent_mem_cgroup(memcg))) { + for (; memcg; memcg = parent_mem_cgroup(memcg)) { for_each_node(nid) { for (zid = 0; zid < MAX_NR_ZONES; zid++) { mz = &memcg->nodeinfo[nid]->zoneinfo[zid]; diff --git a/mm/page-writeback.c b/mm/page-writeback.c index a98dae1bdcff..3ce079e4ed65 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2204,30 +2204,14 @@ retry: while (!done && (index <= end)) { int i; - nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag, - min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1); + nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, + tag); if (nr_pages == 0) break; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; - /* - * At this point, the page may be truncated or - * invalidated (changing page->mapping to NULL), or - * even swizzled back from swapper_space to tmpfs file - * mapping. However, page->index will not change - * because we have a reference on the page. - */ - if (page->index > end) { - /* - * can't be range_cyclic (1st pass) because - * end == -1 in that case. - */ - done = 1; - break; - } - done_index = page->index; lock_page(page); diff --git a/mm/slub.c b/mm/slub.c index fd3a044aaa4a..675e6efd5c8f 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -702,7 +702,7 @@ void object_err(struct kmem_cache *s, struct page *page, slab_panic(reason); } -static void slab_err(struct kmem_cache *s, struct page *page, +static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page, const char *fmt, ...) { va_list args; diff --git a/mm/swap.c b/mm/swap.c index 39395fb549c0..8e6bcb688779 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -1130,15 +1130,25 @@ unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping, } EXPORT_SYMBOL(pagevec_lookup); -unsigned pagevec_lookup_tag(struct pagevec *pvec, struct address_space *mapping, - pgoff_t *index, int tag, unsigned nr_pages) +unsigned pagevec_lookup_range_tag(struct pagevec *pvec, + struct address_space *mapping, pgoff_t *index, pgoff_t end, + int tag) { - pvec->nr = find_get_pages_tag(mapping, index, tag, - nr_pages, pvec->pages); + pvec->nr = find_get_pages_range_tag(mapping, index, end, tag, + PAGEVEC_SIZE, pvec->pages); return pagevec_count(pvec); } -EXPORT_SYMBOL(pagevec_lookup_tag); +EXPORT_SYMBOL(pagevec_lookup_range_tag); +unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec, + struct address_space *mapping, pgoff_t *index, pgoff_t end, + int tag, unsigned max_pages) +{ + pvec->nr = find_get_pages_range_tag(mapping, index, end, tag, + min_t(unsigned int, max_pages, PAGEVEC_SIZE), pvec->pages); + return pagevec_count(pvec); +} +EXPORT_SYMBOL(pagevec_lookup_range_nr_tag); /* * Perform any setup for the swap system */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 32e83c8bd087..c3c28beaae11 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1535,7 +1535,7 @@ static void __vunmap(const void *addr, int deallocate_pages) addr)) return; - area = remove_vm_area(addr); + area = find_vmap_area((unsigned long)addr)->vm; if (unlikely(!area)) { WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n", addr); @@ -1545,6 +1545,7 @@ static void __vunmap(const void *addr, int deallocate_pages) debug_check_no_locks_freed(addr, get_vm_area_size(area)); debug_check_no_obj_freed(addr, get_vm_area_size(area)); + remove_vm_area(addr); if (deallocate_pages) { int i; diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index 665fd87cc105..8b8a43fda6ca 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -404,6 +404,12 @@ ebt_check_watcher(struct ebt_entry_watcher *w, struct xt_tgchk_param *par, watcher = xt_request_find_target(NFPROTO_BRIDGE, w->u.name, 0); if (IS_ERR(watcher)) return PTR_ERR(watcher); + + if (watcher->family != NFPROTO_BRIDGE) { + module_put(watcher->me); + return -ENOENT; + } + w->u.watcher = watcher; par->target = watcher; @@ -724,6 +730,13 @@ ebt_check_entry(struct ebt_entry *e, struct net *net, goto cleanup_watchers; } + /* Reject UNSPEC, xtables verdicts/return values are incompatible */ + if (target->family != NFPROTO_BRIDGE) { + module_put(target->me); + ret = -ENOENT; + goto cleanup_watchers; + } + t->u.target = target; if (t->u.target == &ebt_standard_target) { if (gap < sizeof(struct ebt_standard_target)) { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 2017ffa5197a..96c9c0f0905a 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -2087,9 +2087,12 @@ int rtnl_configure_link(struct net_device *dev, const struct ifinfomsg *ifm) return err; } - dev->rtnl_link_state = RTNL_LINK_INITIALIZED; - - __dev_notify_flags(dev, old_flags, ~0U); + if (dev->rtnl_link_state == RTNL_LINK_INITIALIZED) { + __dev_notify_flags(dev, old_flags, 0U); + } else { + dev->rtnl_link_state = RTNL_LINK_INITIALIZED; + __dev_notify_flags(dev, old_flags, ~0U); + } return 0; } EXPORT_SYMBOL(rtnl_configure_link); diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 03928b406d4c..cfbf857b68d9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -854,6 +854,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb) n->cloned = 1; n->nohdr = 0; n->peeked = 0; + C(pfmemalloc); n->destructor = NULL; C(tail); C(end); diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c index 119c04317d48..03fcf3ee1534 100644 --- a/net/dccp/ccids/ccid3.c +++ b/net/dccp/ccids/ccid3.c @@ -599,7 +599,7 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk, { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); struct dccp_sock *dp = dccp_sk(sk); - ktime_t now = ktime_get_real(); + ktime_t now = ktime_get(); s64 delta = 0; switch (fbtype) { @@ -624,15 +624,14 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk, case CCID3_FBACK_PERIODIC: delta = ktime_us_delta(now, hc->rx_tstamp_last_feedback); if (delta <= 0) - DCCP_BUG("delta (%ld) <= 0", (long)delta); - else - hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); + delta = 1; + hc->rx_x_recv = scaled_div32(hc->rx_bytes_recv, delta); break; default: return; } - ccid3_pr_debug("Interval %ldusec, X_recv=%u, 1/p=%u\n", (long)delta, + ccid3_pr_debug("Interval %lldusec, X_recv=%u, 1/p=%u\n", delta, hc->rx_x_recv, hc->rx_pinv); hc->rx_tstamp_last_feedback = now; @@ -679,7 +678,8 @@ static int ccid3_hc_rx_insert_options(struct sock *sk, struct sk_buff *skb) static u32 ccid3_first_li(struct sock *sk) { struct ccid3_hc_rx_sock *hc = ccid3_hc_rx_sk(sk); - u32 x_recv, p, delta; + u32 x_recv, p; + s64 delta; u64 fval; if (hc->rx_rtt == 0) { @@ -687,7 +687,9 @@ static u32 ccid3_first_li(struct sock *sk) hc->rx_rtt = DCCP_FALLBACK_RTT; } - delta = ktime_to_us(net_timedelta(hc->rx_tstamp_last_feedback)); + delta = ktime_us_delta(ktime_get(), hc->rx_tstamp_last_feedback); + if (delta <= 0) + delta = 1; x_recv = scaled_div32(hc->rx_bytes_recv, delta); if (x_recv == 0) { /* would also trigger divide-by-zero */ DCCP_WARN("X_recv==0\n"); diff --git a/net/dns_resolver/dns_key.c b/net/dns_resolver/dns_key.c index e26df2764e83..1689c7bdf1c9 100644 --- a/net/dns_resolver/dns_key.c +++ b/net/dns_resolver/dns_key.c @@ -87,35 +87,39 @@ dns_resolver_preparse(struct key_preparsed_payload *prep) opt++; kdebug("options: '%s'", opt); do { + int opt_len, opt_nlen; const char *eq; - int opt_len, opt_nlen, opt_vlen, tmp; + char optval[128]; next_opt = memchr(opt, '#', end - opt) ?: end; opt_len = next_opt - opt; - if (opt_len <= 0 || opt_len > 128) { + if (opt_len <= 0 || opt_len > sizeof(optval)) { pr_warn_ratelimited("Invalid option length (%d) for dns_resolver key\n", opt_len); return -EINVAL; } - eq = memchr(opt, '=', opt_len) ?: end; - opt_nlen = eq - opt; - eq++; - opt_vlen = next_opt - eq; /* will be -1 if no value */ + eq = memchr(opt, '=', opt_len); + if (eq) { + opt_nlen = eq - opt; + eq++; + memcpy(optval, eq, next_opt - eq); + optval[next_opt - eq] = '\0'; + } else { + opt_nlen = opt_len; + optval[0] = '\0'; + } - tmp = opt_vlen >= 0 ? opt_vlen : 0; - kdebug("option '%*.*s' val '%*.*s'", - opt_nlen, opt_nlen, opt, tmp, tmp, eq); + kdebug("option '%*.*s' val '%s'", + opt_nlen, opt_nlen, opt, optval); /* see if it's an error number representing a DNS error * that's to be recorded as the result in this key */ if (opt_nlen == sizeof(DNS_ERRORNO_OPTION) - 1 && memcmp(opt, DNS_ERRORNO_OPTION, opt_nlen) == 0) { kdebug("dns error number option"); - if (opt_vlen <= 0) - goto bad_option_value; - ret = kstrtoul(eq, 10, &derrno); + ret = kstrtoul(optval, 10, &derrno); if (ret < 0) goto bad_option_value; diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 554c2a961ad5..48b28a7ecc7a 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -1099,6 +1099,9 @@ int dsa_slave_suspend(struct net_device *slave_dev) { struct dsa_slave_priv *p = netdev_priv(slave_dev); + if (!netif_running(slave_dev)) + return 0; + netif_device_detach(slave_dev); if (p->phy) { @@ -1116,6 +1119,9 @@ int dsa_slave_resume(struct net_device *slave_dev) { struct dsa_slave_priv *p = netdev_priv(slave_dev); + if (!netif_running(slave_dev)) + return 0; + netif_device_attach(slave_dev); if (p->phy) { diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 7dc9f0680bf6..249a89491353 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -289,18 +289,19 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb) return ip_hdr(skb)->daddr; in_dev = __in_dev_get_rcu(dev); - BUG_ON(!in_dev); net = dev_net(dev); scope = RT_SCOPE_UNIVERSE; if (!ipv4_is_zeronet(ip_hdr(skb)->saddr)) { + bool vmark = in_dev && IN_DEV_SRC_VMARK(in_dev); struct flowi4 fl4 = { .flowi4_iif = LOOPBACK_IFINDEX, + .flowi4_oif = l3mdev_master_ifindex_rcu(dev), .daddr = ip_hdr(skb)->saddr, .flowi4_tos = RT_TOS(ip_hdr(skb)->tos), .flowi4_scope = scope, - .flowi4_mark = IN_DEV_SRC_VMARK(in_dev) ? skb->mark : 0, + .flowi4_mark = vmark ? skb->mark : 0, }; if (!fib_lookup(net, &fl4, &res, 0)) return FIB_RES_PREFSRC(net, res); diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index b34fa1bb278f..b2001b20e029 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -364,11 +364,6 @@ static struct inet_frag_queue *inet_frag_alloc(struct netns_frags *nf, { struct inet_frag_queue *q; - if (frag_mem_limit(nf) > nf->high_thresh) { - inet_frag_schedule_worker(f); - return NULL; - } - q = kmem_cache_zalloc(f->frags_cachep, GFP_ATOMIC); if (!q) return NULL; @@ -405,6 +400,11 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, struct inet_frag_queue *q; int depth = 0; + if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) { + inet_frag_schedule_worker(f); + return NULL; + } + if (frag_mem_limit(nf) > nf->low_thresh) inet_frag_schedule_worker(f); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index c1d7dc433976..ac2966f02d07 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -480,6 +480,8 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from) to->dev = from->dev; to->mark = from->mark; + skb_copy_hash(to, from); + /* Copy the flags to each fragment. */ IPCB(to)->flags = IPCB(from)->flags; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index ce9a7fbb7c5f..88426a6a7a85 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -135,15 +135,18 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb) { struct sockaddr_in sin; const struct iphdr *iph = ip_hdr(skb); - __be16 *ports = (__be16 *)skb_transport_header(skb); + __be16 *ports; + int end; - if (skb_transport_offset(skb) + 4 > skb->len) + end = skb_transport_offset(skb) + 4; + if (end > 0 && !pskb_may_pull(skb, end)) return; /* All current transport protocols have the port numbers in the * first four bytes of the transport header and this function is * written with this assumption in mind. */ + ports = (__be16 *)skb_transport_header(skb); sin.sin_family = AF_INET; sin.sin_addr.s_addr = iph->daddr; diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c index 9d6b9c4c5f82..60f564db25a3 100644 --- a/net/ipv4/ipconfig.c +++ b/net/ipv4/ipconfig.c @@ -790,6 +790,11 @@ static void __init ic_bootp_init_ext(u8 *e) */ static inline void __init ic_bootp_init(void) { + /* Re-initialise all name servers to NONE, in case any were set via the + * "ip=" or "nfsaddrs=" kernel command line parameters: any IP addresses + * specified there will already have been decoded but are no longer + * needed + */ ic_nameservers_predef(); dev_add_pack(&bootp_packet_type); @@ -1423,6 +1428,13 @@ static int __init ip_auto_config(void) int err; unsigned int i; + /* Initialise all name servers to NONE (but only if the "ip=" or + * "nfsaddrs=" kernel command line parameters weren't decoded, otherwise + * we'll overwrite the IP addresses specified there) + */ + if (ic_set_manually == 0) + ic_nameservers_predef(); + #ifdef CONFIG_PROC_FS proc_create("pnp", S_IRUGO, init_net.proc_net, &pnp_seq_fops); #endif /* CONFIG_PROC_FS */ @@ -1640,6 +1652,7 @@ static int __init ip_auto_config_setup(char *addrs) return 1; } + /* Initialise all name servers to NONE */ ic_nameservers_predef(); /* Parse string for static IP assignment. */ diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 8233e27679f2..5d563615718d 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -145,8 +145,9 @@ static int ipv4_ping_group_range(struct ctl_table *table, int write, if (write && ret == 0) { low = make_kgid(user_ns, urange[0]); high = make_kgid(user_ns, urange[1]); - if (!gid_valid(low) || !gid_valid(high) || - (urange[1] < urange[0]) || gid_lt(high, low)) { + if (!gid_valid(low) || !gid_valid(high)) + return -EINVAL; + if (urange[1] < urange[0] || gid_lt(high, low)) { low = make_kgid(&init_user_ns, 1); high = make_kgid(&init_user_ns, 0); } @@ -232,8 +233,9 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, { struct ctl_table tbl = { .maxlen = (TCP_FASTOPEN_KEY_LENGTH * 2 + 10) }; struct tcp_fastopen_context *ctxt; - int ret; u32 user_key[4]; /* 16 bytes, matching TCP_FASTOPEN_KEY_LENGTH */ + __le32 key[4]; + int ret, i; tbl.data = kmalloc(tbl.maxlen, GFP_KERNEL); if (!tbl.data) @@ -242,11 +244,14 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, rcu_read_lock(); ctxt = rcu_dereference(tcp_fastopen_ctx); if (ctxt) - memcpy(user_key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); + memcpy(key, ctxt->key, TCP_FASTOPEN_KEY_LENGTH); else - memset(user_key, 0, sizeof(user_key)); + memset(key, 0, sizeof(key)); rcu_read_unlock(); + for (i = 0; i < ARRAY_SIZE(key); i++) + user_key[i] = le32_to_cpu(key[i]); + snprintf(tbl.data, tbl.maxlen, "%08x-%08x-%08x-%08x", user_key[0], user_key[1], user_key[2], user_key[3]); ret = proc_dostring(&tbl, write, buffer, lenp, ppos); @@ -262,12 +267,16 @@ static int proc_tcp_fastopen_key(struct ctl_table *ctl, int write, * first invocation of tcp_fastopen_cookie_gen */ tcp_fastopen_init_key_once(false); - tcp_fastopen_reset_cipher(user_key, TCP_FASTOPEN_KEY_LENGTH); + + for (i = 0; i < ARRAY_SIZE(user_key); i++) + key[i] = cpu_to_le32(user_key[i]); + + tcp_fastopen_reset_cipher(key, TCP_FASTOPEN_KEY_LENGTH); } bad_key: pr_debug("proc FO key set 0x%x-%x-%x-%x <- 0x%s: %u\n", - user_key[0], user_key[1], user_key[2], user_key[3], + user_key[0], user_key[1], user_key[2], user_key[3], (char *)tbl.data, ret); kfree(tbl.data); return ret; diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 55d7da1d2ce9..6300edf90e60 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -131,23 +131,14 @@ static void dctcp_ce_state_0_to_1(struct sock *sk) struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); - /* State has changed from CE=0 to CE=1 and delayed - * ACK has not sent yet. - */ - if (!ca->ce_state && ca->delayed_ack_reserved) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=0. */ - tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; + if (!ca->ce_state) { + /* State has changed from CE=0 to CE=1, force an immediate + * ACK to reflect the new CE state. If an ACK was delayed, + * send that first to reflect the prior CE state. + */ + if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); + tcp_enter_quickack_mode(sk, 1); } ca->prior_rcv_nxt = tp->rcv_nxt; @@ -161,23 +152,14 @@ static void dctcp_ce_state_1_to_0(struct sock *sk) struct dctcp *ca = inet_csk_ca(sk); struct tcp_sock *tp = tcp_sk(sk); - /* State has changed from CE=1 to CE=0 and delayed - * ACK has not sent yet. - */ - if (ca->ce_state && ca->delayed_ack_reserved) { - u32 tmp_rcv_nxt; - - /* Save current rcv_nxt. */ - tmp_rcv_nxt = tp->rcv_nxt; - - /* Generate previous ack with CE=1. */ - tp->ecn_flags |= TCP_ECN_DEMAND_CWR; - tp->rcv_nxt = ca->prior_rcv_nxt; - - tcp_send_ack(sk); - - /* Recover current rcv_nxt. */ - tp->rcv_nxt = tmp_rcv_nxt; + if (ca->ce_state) { + /* State has changed from CE=1 to CE=0, force an immediate + * ACK to reflect the new CE state. If an ACK was delayed, + * send that first to reflect the prior CE state. + */ + if (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_TIMER) + __tcp_send_ack(sk, ca->prior_rcv_nxt); + tcp_enter_quickack_mode(sk, 1); } ca->prior_rcv_nxt = tp->rcv_nxt; diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c02fa0b4c7e0..e0714c3d8ee4 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -177,24 +177,27 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) } } -static void tcp_incr_quickack(struct sock *sk) +static void tcp_incr_quickack(struct sock *sk, unsigned int max_quickacks) { struct inet_connection_sock *icsk = inet_csk(sk); unsigned int quickacks = tcp_sk(sk)->rcv_wnd / (2 * icsk->icsk_ack.rcv_mss); if (quickacks == 0) quickacks = 2; + quickacks = min(quickacks, max_quickacks); if (quickacks > icsk->icsk_ack.quick) - icsk->icsk_ack.quick = min(quickacks, TCP_MAX_QUICKACKS); + icsk->icsk_ack.quick = quickacks; } -static void tcp_enter_quickack_mode(struct sock *sk) +void tcp_enter_quickack_mode(struct sock *sk, unsigned int max_quickacks) { struct inet_connection_sock *icsk = inet_csk(sk); - tcp_incr_quickack(sk); + + tcp_incr_quickack(sk, max_quickacks); icsk->icsk_ack.pingpong = 0; icsk->icsk_ack.ato = TCP_ATO_MIN; } +EXPORT_SYMBOL(tcp_enter_quickack_mode); /* Send ACKs quickly, if "quick" count is not exhausted * and the session is not interactive. @@ -226,8 +229,10 @@ static void tcp_ecn_withdraw_cwr(struct tcp_sock *tp) tp->ecn_flags &= ~TCP_ECN_DEMAND_CWR; } -static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) +static void __tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) { + struct tcp_sock *tp = tcp_sk(sk); + switch (TCP_SKB_CB(skb)->ip_dsfield & INET_ECN_MASK) { case INET_ECN_NOT_ECT: /* Funny extension: if ECT is not set on a segment, @@ -235,31 +240,31 @@ static void __tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) * it is probably a retransmit. */ if (tp->ecn_flags & TCP_ECN_SEEN) - tcp_enter_quickack_mode((struct sock *)tp); + tcp_enter_quickack_mode(sk, 2); break; case INET_ECN_CE: - if (tcp_ca_needs_ecn((struct sock *)tp)) - tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_IS_CE); + if (tcp_ca_needs_ecn(sk)) + tcp_ca_event(sk, CA_EVENT_ECN_IS_CE); if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { /* Better not delay acks, sender can have a very low cwnd */ - tcp_enter_quickack_mode((struct sock *)tp); + tcp_enter_quickack_mode(sk, 2); tp->ecn_flags |= TCP_ECN_DEMAND_CWR; } tp->ecn_flags |= TCP_ECN_SEEN; break; default: - if (tcp_ca_needs_ecn((struct sock *)tp)) - tcp_ca_event((struct sock *)tp, CA_EVENT_ECN_NO_CE); + if (tcp_ca_needs_ecn(sk)) + tcp_ca_event(sk, CA_EVENT_ECN_NO_CE); tp->ecn_flags |= TCP_ECN_SEEN; break; } } -static void tcp_ecn_check_ce(struct tcp_sock *tp, const struct sk_buff *skb) +static void tcp_ecn_check_ce(struct sock *sk, const struct sk_buff *skb) { - if (tp->ecn_flags & TCP_ECN_OK) - __tcp_ecn_check_ce(tp, skb); + if (tcp_sk(sk)->ecn_flags & TCP_ECN_OK) + __tcp_ecn_check_ce(sk, skb); } static void tcp_ecn_rcv_synack(struct tcp_sock *tp, const struct tcphdr *th) @@ -651,7 +656,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) /* The _first_ data packet received, initialize * delayed ACK engine. */ - tcp_incr_quickack(sk); + tcp_incr_quickack(sk, TCP_MAX_QUICKACKS); icsk->icsk_ack.ato = TCP_ATO_MIN; } else { int m = now - icsk->icsk_ack.lrcvtime; @@ -667,13 +672,13 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb) /* Too long gap. Apparently sender failed to * restart window, so that we send ACKs quickly. */ - tcp_incr_quickack(sk); + tcp_incr_quickack(sk, TCP_MAX_QUICKACKS); sk_mem_reclaim(sk); } } icsk->icsk_ack.lrcvtime = now; - tcp_ecn_check_ce(tp, skb); + tcp_ecn_check_ce(sk, skb); if (skb->len >= 128) tcp_grow_window(sk, skb); @@ -3219,6 +3224,15 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets, if (tcp_is_reno(tp)) { tcp_remove_reno_sacks(sk, pkts_acked); + + /* If any of the cumulatively ACKed segments was + * retransmitted, non-SACK case cannot confirm that + * progress was due to original transmission due to + * lack of TCPCB_SACKED_ACKED bits even if some of + * the packets may have been never retransmitted. + */ + if (flag & FLAG_RETRANS_DATA_ACKED) + flag &= ~FLAG_ORIG_SACK_ACKED; } else { int delta; @@ -4127,7 +4141,7 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb) if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq && before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST); - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS); if (tcp_is_sack(tp) && sysctl_tcp_dsack) { u32 end_seq = TCP_SKB_CB(skb)->end_seq; @@ -4355,7 +4369,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) struct sk_buff *skb1; u32 seq, end_seq; - tcp_ecn_check_ce(tp, skb); + tcp_ecn_check_ce(sk, skb); if (unlikely(tcp_try_rmem_schedule(sk, skb, skb->truesize))) { NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPOFODROP); @@ -4629,7 +4643,7 @@ queue_and_out: tcp_dsack_set(sk, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq); out_of_window: - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS); inet_csk_schedule_ack(sk); drop: __kfree_skb(skb); @@ -4640,8 +4654,6 @@ drop: if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt + tcp_receive_window(tp))) goto out_of_window; - tcp_enter_quickack_mode(sk); - if (before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) { /* Partial packet, seq < rcv_next < end_seq */ SOCK_DEBUG(sk, "partial packet: rcv_next %X seq %X - %X\n", @@ -4780,6 +4792,7 @@ restart: static void tcp_collapse_ofo_queue(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); + u32 range_truesize, sum_tiny = 0; struct sk_buff *skb = skb_peek(&tp->out_of_order_queue); struct sk_buff *head; u32 start, end; @@ -4789,6 +4802,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk) start = TCP_SKB_CB(skb)->seq; end = TCP_SKB_CB(skb)->end_seq; + range_truesize = skb->truesize; head = skb; for (;;) { @@ -4803,14 +4817,24 @@ static void tcp_collapse_ofo_queue(struct sock *sk) if (!skb || after(TCP_SKB_CB(skb)->seq, end) || before(TCP_SKB_CB(skb)->end_seq, start)) { - tcp_collapse(sk, &tp->out_of_order_queue, - head, skb, start, end); + /* Do not attempt collapsing tiny skbs */ + if (range_truesize != head->truesize || + end - start >= SKB_WITH_OVERHEAD(SK_MEM_QUANTUM)) { + tcp_collapse(sk, &tp->out_of_order_queue, + head, skb, start, end); + } else { + sum_tiny += range_truesize; + if (sum_tiny > sk->sk_rcvbuf >> 3) + return; + } + head = skb; if (!skb) break; /* Start new segment */ start = TCP_SKB_CB(skb)->seq; end = TCP_SKB_CB(skb)->end_seq; + range_truesize = skb->truesize; } else { if (before(TCP_SKB_CB(skb)->seq, start)) start = TCP_SKB_CB(skb)->seq; @@ -4866,6 +4890,9 @@ static int tcp_prune_queue(struct sock *sk) else if (tcp_under_memory_pressure(sk)) tp->rcv_ssthresh = min(tp->rcv_ssthresh, 4U * tp->advmss); + if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf) + return 0; + tcp_collapse_ofo_queue(sk); if (!skb_queue_empty(&sk->sk_receive_queue)) tcp_collapse(sk, &sk->sk_receive_queue, @@ -5653,7 +5680,7 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb, * to stand against the temptation 8) --ANK */ inet_csk_schedule_ack(sk); - tcp_enter_quickack_mode(sk); + tcp_enter_quickack_mode(sk, TCP_MAX_QUICKACKS); inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK, TCP_DELACK_MAX, TCP_RTO_MAX); diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index e490c9a29034..b7c333c7de8b 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -177,8 +177,13 @@ static void tcp_event_data_sent(struct tcp_sock *tp, } /* Account for an ACK we sent. */ -static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts) +static inline void tcp_event_ack_sent(struct sock *sk, unsigned int pkts, + u32 rcv_nxt) { + struct tcp_sock *tp = tcp_sk(sk); + + if (unlikely(rcv_nxt != tp->rcv_nxt)) + return; /* Special ACK sent by DCTCP to reflect ECN */ tcp_dec_quickack_mode(sk, pkts); inet_csk_clear_xmit_timer(sk, ICSK_TIME_DACK); } @@ -901,8 +906,8 @@ out: * We are working here with either a clone of the original * SKB, or a fresh unique copy made by the retransmit engine. */ -static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, - gfp_t gfp_mask) +static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, + int clone_it, gfp_t gfp_mask, u32 rcv_nxt) { const struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet; @@ -962,7 +967,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, th->source = inet->inet_sport; th->dest = inet->inet_dport; th->seq = htonl(tcb->seq); - th->ack_seq = htonl(tp->rcv_nxt); + th->ack_seq = htonl(rcv_nxt); *(((__be16 *)th) + 6) = htons(((tcp_header_size >> 2) << 12) | tcb->tcp_flags); @@ -1005,7 +1010,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, icsk->icsk_af_ops->send_check(sk, skb); if (likely(tcb->tcp_flags & TCPHDR_ACK)) - tcp_event_ack_sent(sk, tcp_skb_pcount(skb)); + tcp_event_ack_sent(sk, tcp_skb_pcount(skb), rcv_nxt); if (skb->len != tcp_header_size) tcp_event_data_sent(tp, sk); @@ -1036,6 +1041,13 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, return net_xmit_eval(err); } +static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it, + gfp_t gfp_mask) +{ + return __tcp_transmit_skb(sk, skb, clone_it, gfp_mask, + tcp_sk(sk)->rcv_nxt); +} + /* This routine just queues the buffer for sending. * * NOTE: probe0 timer is not checked, do not forget tcp_push_pending_frames, @@ -3354,7 +3366,7 @@ void tcp_send_delayed_ack(struct sock *sk) } /* This routine sends an ack and also updates the window. */ -void tcp_send_ack(struct sock *sk) +void __tcp_send_ack(struct sock *sk, u32 rcv_nxt) { struct sk_buff *buff; @@ -3391,9 +3403,14 @@ void tcp_send_ack(struct sock *sk) /* Send it off, this clears delayed acks for us. */ skb_mstamp_get(&buff->skb_mstamp); - tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC)); + __tcp_transmit_skb(sk, buff, 0, sk_gfp_atomic(sk, GFP_ATOMIC), rcv_nxt); +} +EXPORT_SYMBOL_GPL(__tcp_send_ack); + +void tcp_send_ack(struct sock *sk) +{ + __tcp_send_ack(sk, tcp_sk(sk)->rcv_nxt); } -EXPORT_SYMBOL_GPL(tcp_send_ack); /* This routine sends a packet with an out of date sequence * number. It assumes the other end will try to ack it. diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index d7c1ee7cf0e2..7a62fd9173d0 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -663,13 +663,16 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg, } if (np->rxopt.bits.rxorigdstaddr) { struct sockaddr_in6 sin6; - __be16 *ports = (__be16 *) skb_transport_header(skb); + __be16 *ports; + int end; - if (skb_transport_offset(skb) + 4 <= skb->len) { + end = skb_transport_offset(skb) + 4; + if (end <= 0 || pskb_may_pull(skb, end)) { /* All current transport protocols have the port numbers in the * first four bytes of the transport header and this function is * written with this assumption in mind. */ + ports = (__be16 *)skb_transport_header(skb); sin6.sin6_family = AF_INET6; sin6.sin6_addr = ipv6_hdr(skb)->daddr; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 74786783834b..0feede45bd28 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -559,6 +559,8 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from) to->dev = from->dev; to->mark = from->mark; + skb_copy_hash(to, from); + #ifdef CONFIG_NET_SCHED to->tc_index = from->tc_index; #endif diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index dec4e7bda5f3..11282ffca567 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -692,7 +692,6 @@ static int ipip6_rcv(struct sk_buff *skb) if (iptunnel_pull_header(skb, 0, htons(ETH_P_IPV6))) goto out; - iph = ip_hdr(skb); err = IP_ECN_decapsulate(iph, skb); if (unlikely(err)) { diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 0fb27debd4fa..cff78b0d53ee 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -62,6 +62,7 @@ #include <asm/cacheflush.h> #include <linux/hash.h> #include <linux/genetlink.h> +#include <linux/nospec.h> #include <net/net_namespace.h> #include <net/sock.h> @@ -639,6 +640,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol, if (protocol < 0 || protocol >= MAX_LINKS) return -EPROTONOSUPPORT; + protocol = array_index_nospec(protocol, MAX_LINKS); netlink_lock_table(); #ifdef CONFIG_MODULES diff --git a/net/nfc/llcp_commands.c b/net/nfc/llcp_commands.c index d25212b135ea..04f060488686 100644 --- a/net/nfc/llcp_commands.c +++ b/net/nfc/llcp_commands.c @@ -754,11 +754,14 @@ int nfc_llcp_send_ui_frame(struct nfc_llcp_sock *sock, u8 ssap, u8 dsap, pr_debug("Fragment %zd bytes remaining %zd", frag_len, remaining_len); - pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, MSG_DONTWAIT, + pdu = nfc_alloc_send_skb(sock->dev, &sock->sk, 0, frag_len + LLCP_HEADER_SIZE, &err); if (pdu == NULL) { - pr_err("Could not allocate PDU\n"); - continue; + pr_err("Could not allocate PDU (error=%d)\n", err); + len -= remaining_len; + if (len == 0) + len = err; + break; } pdu = llcp_add_header(pdu, dsap, ssap, LLCP_PDU_UI); diff --git a/net/rds/loop.c b/net/rds/loop.c index 6b12b68541ae..05cab8c5a379 100644 --- a/net/rds/loop.c +++ b/net/rds/loop.c @@ -191,4 +191,5 @@ struct rds_transport rds_loop_transport = { .inc_copy_to_user = rds_message_inc_copy_to_user, .inc_free = rds_loop_inc_free, .t_name = "loopback", + .t_type = RDS_TRANS_LOOP, }; diff --git a/net/rds/rds.h b/net/rds/rds.h index 4588860f4c3b..254f1345cf7e 100644 --- a/net/rds/rds.h +++ b/net/rds/rds.h @@ -401,6 +401,11 @@ struct rds_notifier { int n_status; }; +/* Available as part of RDS core, so doesn't need to participate + * in get_preferred transport etc + */ +#define RDS_TRANS_LOOP 3 + /** * struct rds_transport - transport specific behavioural hooks * diff --git a/net/rds/recv.c b/net/rds/recv.c index 0514af3ab378..6275de19689c 100644 --- a/net/rds/recv.c +++ b/net/rds/recv.c @@ -76,6 +76,11 @@ static void rds_recv_rcvbuf_delta(struct rds_sock *rs, struct sock *sk, return; rs->rs_rcv_bytes += delta; + + /* loop transport doesn't send/recv congestion updates */ + if (rs->rs_transport->t_type == RDS_TRANS_LOOP) + return; + now_congested = rs->rs_rcv_bytes > rds_sk_rcvbuf(rs); rdsdebug("rs %p (%pI4:%u) recv bytes %d buf %d " diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c index 3fee70d9814f..562edd50fa94 100644 --- a/net/sched/sch_blackhole.c +++ b/net/sched/sch_blackhole.c @@ -20,7 +20,7 @@ static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch) { qdisc_drop(skb, sch); - return NET_XMIT_SUCCESS; + return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS; } static struct sk_buff *blackhole_dequeue(struct Qdisc *sch) diff --git a/net/socket.c b/net/socket.c index 00a275ee227a..9f29a9ce6a92 100644 --- a/net/socket.c +++ b/net/socket.c @@ -91,6 +91,7 @@ #include <linux/xattr.h> #include <linux/seemp_api.h> #include <linux/seemp_instrumentation.h> +#include <linux/nospec.h> #include <asm/uaccess.h> #include <asm/unistd.h> @@ -2384,6 +2385,7 @@ SYSCALL_DEFINE2(socketcall, int, call, unsigned long __user *, args) if (call < 1 || call > SYS_SENDMMSG) return -EINVAL; + call = array_index_nospec(call, SYS_SENDMMSG + 1); len = nargs[call]; if (len > sizeof(a)) diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 8a1bb64f1dcd..0924291ef059 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -7,6 +7,7 @@ quote := " squote := ' empty := space := $(empty) $(empty) +pound := \# ### # Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o @@ -250,11 +251,11 @@ endif # Replace >$< with >$$< to preserve $ when reloading the .cmd file # (needed for make) -# Replace >#< with >\#< to avoid starting a comment in the .cmd file +# Replace >#< with >$(pound)< to avoid starting a comment in the .cmd file # (needed for make) # Replace >'< with >'\''< to be able to enclose the whole string in '...' # (needed for the shell) -make-cmd = $(call escsq,$(subst \#,\\\#,$(subst $$,$$$$,$(cmd_$(1))))) +make-cmd = $(call escsq,$(subst $(pound),$$(pound),$(subst $$,$$$$,$(cmd_$(1))))) # Find any prerequisites that is newer than target or that does not exist. # PHONY targets skipped in both cases. diff --git a/scripts/tags.sh b/scripts/tags.sh index 45e246595d10..2a61db329adf 100755 --- a/scripts/tags.sh +++ b/scripts/tags.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # Generate tags or cscope files # Usage tags.sh <mode> # @@ -135,11 +135,6 @@ all_kconfigs() find_other_sources 'Kconfig*' } -all_defconfigs() -{ - find_sources $ALLSOURCE_ARCHS "defconfig" -} - docscope() { (echo \-k; echo \-q; all_target_sources) > cscope.files @@ -151,8 +146,111 @@ dogtags() all_target_sources | gtags -i -f - } +# Basic regular expressions with an optional /kind-spec/ for ctags and +# the following limitations: +# - No regex modifiers +# - Use \{0,1\} instead of \?, because etags expects an unescaped ? +# - \s is not working with etags, use a space or [ \t] +# - \w works, but does not match underscores in etags +# - etags regular expressions have to match at the start of a line; +# a ^[^#] is prepended by setup_regex unless an anchor is already present +regex_asm=( + '/^\(ENTRY\|_GLOBAL\)(\([[:alnum:]_\\]*\)).*/\2/' +) +regex_c=( + '/^SYSCALL_DEFINE[0-9](\([[:alnum:]_]*\).*/sys_\1/' + '/^COMPAT_SYSCALL_DEFINE[0-9](\([[:alnum:]_]*\).*/compat_sys_\1/' + '/^TRACE_EVENT(\([[:alnum:]_]*\).*/trace_\1/' + '/^TRACE_EVENT(\([[:alnum:]_]*\).*/trace_\1_rcuidle/' + '/^DEFINE_EVENT([^,)]*, *\([[:alnum:]_]*\).*/trace_\1/' + '/^DEFINE_EVENT([^,)]*, *\([[:alnum:]_]*\).*/trace_\1_rcuidle/' + '/^DEFINE_INSN_CACHE_OPS(\([[:alnum:]_]*\).*/get_\1_slot/' + '/^DEFINE_INSN_CACHE_OPS(\([[:alnum:]_]*\).*/free_\1_slot/' + '/^PAGEFLAG(\([[:alnum:]_]*\).*/Page\1/' + '/^PAGEFLAG(\([[:alnum:]_]*\).*/SetPage\1/' + '/^PAGEFLAG(\([[:alnum:]_]*\).*/ClearPage\1/' + '/^TESTSETFLAG(\([[:alnum:]_]*\).*/TestSetPage\1/' + '/^TESTPAGEFLAG(\([[:alnum:]_]*\).*/Page\1/' + '/^SETPAGEFLAG(\([[:alnum:]_]*\).*/SetPage\1/' + '/\<__SETPAGEFLAG(\([[:alnum:]_]*\).*/__SetPage\1/' + '/\<TESTCLEARFLAG(\([[:alnum:]_]*\).*/TestClearPage\1/' + '/\<__TESTCLEARFLAG(\([[:alnum:]_]*\).*/TestClearPage\1/' + '/\<CLEARPAGEFLAG(\([[:alnum:]_]*\).*/ClearPage\1/' + '/\<__CLEARPAGEFLAG(\([[:alnum:]_]*\).*/__ClearPage\1/' + '/^__PAGEFLAG(\([[:alnum:]_]*\).*/__SetPage\1/' + '/^__PAGEFLAG(\([[:alnum:]_]*\).*/__ClearPage\1/' + '/^PAGEFLAG_FALSE(\([[:alnum:]_]*\).*/Page\1/' + '/\<TESTSCFLAG(\([[:alnum:]_]*\).*/TestSetPage\1/' + '/\<TESTSCFLAG(\([[:alnum:]_]*\).*/TestClearPage\1/' + '/\<SETPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/SetPage\1/' + '/\<CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/ClearPage\1/' + '/\<__CLEARPAGEFLAG_NOOP(\([[:alnum:]_]*\).*/__ClearPage\1/' + '/\<TESTCLEARFLAG_FALSE(\([[:alnum:]_]*\).*/TestClearPage\1/' + '/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/Page\1/' + '/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__SetPage\1/' + '/^PAGE_MAPCOUNT_OPS(\([[:alnum:]_]*\).*/__ClearPage\1/' + '/^TASK_PFA_TEST([^,]*, *\([[:alnum:]_]*\))/task_\1/' + '/^TASK_PFA_SET([^,]*, *\([[:alnum:]_]*\))/task_set_\1/' + '/^TASK_PFA_CLEAR([^,]*, *\([[:alnum:]_]*\))/task_clear_\1/' + '/^DEF_MMIO_\(IN\|OUT\)_[XD](\([[:alnum:]_]*\),[^)]*)/\2/' + '/^DEBUGGER_BOILERPLATE(\([[:alnum:]_]*\))/\1/' + '/^DEF_PCI_AC_\(\|NO\)RET(\([[:alnum:]_]*\).*/\2/' + '/^PCI_OP_READ(\(\w*\).*[1-4])/pci_bus_read_config_\1/' + '/^PCI_OP_WRITE(\(\w*\).*[1-4])/pci_bus_write_config_\1/' + '/\<DEFINE_\(MUTEX\|SEMAPHORE\|SPINLOCK\)(\([[:alnum:]_]*\)/\2/v/' + '/\<DEFINE_\(RAW_SPINLOCK\|RWLOCK\|SEQLOCK\)(\([[:alnum:]_]*\)/\2/v/' + '/\<DECLARE_\(RWSEM\|COMPLETION\)(\([[:alnum:]_]\+\)/\2/v/' + '/\<DECLARE_BITMAP(\([[:alnum:]_]*\)/\1/v/' + '/\(^\|\s\)\(\|L\|H\)LIST_HEAD(\([[:alnum:]_]*\)/\3/v/' + '/\(^\|\s\)RADIX_TREE(\([[:alnum:]_]*\)/\2/v/' + '/\<DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/' + '/\<DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/' + '/\<DECLARE_WAIT_QUEUE_HEAD(\([[:alnum:]_]*\)/\1/v/' + '/\<DECLARE_\(TASKLET\|WORK\|DELAYED_WORK\)(\([[:alnum:]_]*\)/\2/v/' + '/\(^\s\)OFFSET(\([[:alnum:]_]*\)/\2/v/' + '/\(^\s\)DEFINE(\([[:alnum:]_]*\)/\2/v/' + '/\<DEFINE_HASHTABLE(\([[:alnum:]_]*\)/\1/v/' +) +regex_kconfig=( + '/^[[:blank:]]*\(menu\|\)config[[:blank:]]\+\([[:alnum:]_]\+\)/\2/' + '/^[[:blank:]]*\(menu\|\)config[[:blank:]]\+\([[:alnum:]_]\+\)/CONFIG_\2/' +) +setup_regex() +{ + local mode=$1 lang tmp=() r + shift + + regex=() + for lang; do + case "$lang" in + asm) tmp=("${regex_asm[@]}") ;; + c) tmp=("${regex_c[@]}") ;; + kconfig) tmp=("${regex_kconfig[@]}") ;; + esac + for r in "${tmp[@]}"; do + if test "$mode" = "exuberant"; then + regex[${#regex[@]}]="--regex-$lang=${r}b" + else + # Remove ctags /kind-spec/ + case "$r" in + /*/*/?/) + r=${r%?/} + esac + # Prepend ^[^#] unless already anchored + case "$r" in + /^*) ;; + *) + r="/^[^#]*${r#/}" + esac + regex[${#regex[@]}]="--regex=$r" + fi + done + done +} + exuberant() { + setup_regex exuberant asm c all_target_sources | xargs $1 -a \ -I __initdata,__exitdata,__initconst, \ -I __initdata_memblock \ @@ -166,118 +264,22 @@ exuberant() -I EXPORT_SYMBOL,EXPORT_SYMBOL_GPL,ACPI_EXPORT_SYMBOL \ -I DEFINE_TRACE,EXPORT_TRACEPOINT_SYMBOL,EXPORT_TRACEPOINT_SYMBOL_GPL \ -I static,const \ - --extra=+f --c-kinds=+px \ - --regex-asm='/^(ENTRY|_GLOBAL)\(([^)]*)\).*/\2/' \ - --regex-c='/^SYSCALL_DEFINE[[:digit:]]?\(([^,)]*).*/sys_\1/' \ - --regex-c='/^COMPAT_SYSCALL_DEFINE[[:digit:]]?\(([^,)]*).*/compat_sys_\1/' \ - --regex-c++='/^TRACE_EVENT\(([^,)]*).*/trace_\1/' \ - --regex-c++='/^TRACE_EVENT\(([^,)]*).*/trace_\1_rcuidle/' \ - --regex-c++='/^DEFINE_EVENT\([^,)]*, *([^,)]*).*/trace_\1/' \ - --regex-c++='/^DEFINE_EVENT\([^,)]*, *([^,)]*).*/trace_\1_rcuidle/' \ - --regex-c++='/PAGEFLAG\(([^,)]*).*/Page\1/' \ - --regex-c++='/PAGEFLAG\(([^,)]*).*/SetPage\1/' \ - --regex-c++='/PAGEFLAG\(([^,)]*).*/ClearPage\1/' \ - --regex-c++='/TESTSETFLAG\(([^,)]*).*/TestSetPage\1/' \ - --regex-c++='/TESTPAGEFLAG\(([^,)]*).*/Page\1/' \ - --regex-c++='/SETPAGEFLAG\(([^,)]*).*/SetPage\1/' \ - --regex-c++='/__SETPAGEFLAG\(([^,)]*).*/__SetPage\1/' \ - --regex-c++='/TESTCLEARFLAG\(([^,)]*).*/TestClearPage\1/' \ - --regex-c++='/__TESTCLEARFLAG\(([^,)]*).*/TestClearPage\1/' \ - --regex-c++='/CLEARPAGEFLAG\(([^,)]*).*/ClearPage\1/' \ - --regex-c++='/__CLEARPAGEFLAG\(([^,)]*).*/__ClearPage\1/' \ - --regex-c++='/__PAGEFLAG\(([^,)]*).*/__SetPage\1/' \ - --regex-c++='/__PAGEFLAG\(([^,)]*).*/__ClearPage\1/' \ - --regex-c++='/PAGEFLAG_FALSE\(([^,)]*).*/Page\1/' \ - --regex-c++='/TESTSCFLAG\(([^,)]*).*/TestSetPage\1/' \ - --regex-c++='/TESTSCFLAG\(([^,)]*).*/TestClearPage\1/' \ - --regex-c++='/SETPAGEFLAG_NOOP\(([^,)]*).*/SetPage\1/' \ - --regex-c++='/CLEARPAGEFLAG_NOOP\(([^,)]*).*/ClearPage\1/' \ - --regex-c++='/__CLEARPAGEFLAG_NOOP\(([^,)]*).*/__ClearPage\1/' \ - --regex-c++='/TESTCLEARFLAG_FALSE\(([^,)]*).*/TestClearPage\1/' \ - --regex-c++='/__TESTCLEARFLAG_FALSE\(([^,)]*).*/__TestClearPage\1/' \ - --regex-c++='/_PE\(([^,)]*).*/PEVENT_ERRNO__\1/' \ - --regex-c++='/TASK_PFA_TEST\([^,]*,\s*([^)]*)\)/task_\1/' \ - --regex-c++='/TASK_PFA_SET\([^,]*,\s*([^)]*)\)/task_set_\1/' \ - --regex-c++='/TASK_PFA_CLEAR\([^,]*,\s*([^)]*)\)/task_clear_\1/'\ - --regex-c++='/DEF_MMIO_(IN|OUT)_(X|D)\(([^,]*),\s*[^)]*\)/\3/' \ - --regex-c++='/DEBUGGER_BOILERPLATE\(([^,]*)\)/\1/' \ - --regex-c='/PCI_OP_READ\((\w*).*[1-4]\)/pci_bus_read_config_\1/' \ - --regex-c='/PCI_OP_WRITE\((\w*).*[1-4]\)/pci_bus_write_config_\1/' \ - --regex-c='/DEFINE_(MUTEX|SEMAPHORE|SPINLOCK)\((\w*)/\2/v/' \ - --regex-c='/DEFINE_(RAW_SPINLOCK|RWLOCK|SEQLOCK)\((\w*)/\2/v/' \ - --regex-c='/DECLARE_(RWSEM|COMPLETION)\((\w*)/\2/v/' \ - --regex-c='/DECLARE_BITMAP\((\w*)/\1/v/' \ - --regex-c='/(^|\s)(|L|H)LIST_HEAD\((\w*)/\3/v/' \ - --regex-c='/(^|\s)RADIX_TREE\((\w*)/\2/v/' \ - --regex-c='/DEFINE_PER_CPU\(([^,]*,\s*)(\w*).*\)/\2/v/' \ - --regex-c='/DEFINE_PER_CPU_SHARED_ALIGNED\(([^,]*,\s*)(\w*).*\)/\2/v/' \ - --regex-c='/DECLARE_WAIT_QUEUE_HEAD\((\w*)/\1/v/' \ - --regex-c='/DECLARE_(TASKLET|WORK|DELAYED_WORK)\((\w*)/\2/v/' \ - --regex-c='/DEFINE_PCI_DEVICE_TABLE\((\w*)/\1/v/' \ - --regex-c='/(^\s)OFFSET\((\w*)/\2/v/' \ - --regex-c='/(^\s)DEFINE\((\w*)/\2/v/' \ - --regex-c='/DEFINE_HASHTABLE\((\w*)/\1/v/' + --extra=+fq --c-kinds=+px --fields=+iaS --langmap=c:+.h \ + "${regex[@]}" + setup_regex exuberant kconfig all_kconfigs | xargs $1 -a \ - --langdef=kconfig --language-force=kconfig \ - --regex-kconfig='/^[[:blank:]]*(menu|)config[[:blank:]]+([[:alnum:]_]+)/\2/' + --langdef=kconfig --language-force=kconfig "${regex[@]}" - all_kconfigs | xargs $1 -a \ - --langdef=kconfig --language-force=kconfig \ - --regex-kconfig='/^[[:blank:]]*(menu|)config[[:blank:]]+([[:alnum:]_]+)/CONFIG_\2/' - - all_defconfigs | xargs -r $1 -a \ - --langdef=dotconfig --language-force=dotconfig \ - --regex-dotconfig='/^#?[[:blank:]]*(CONFIG_[[:alnum:]_]+)/\1/' } emacs() { - all_target_sources | xargs $1 -a \ - --regex='/^\(ENTRY\|_GLOBAL\)(\([^)]*\)).*/\2/' \ - --regex='/^SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/sys_\1/' \ - --regex='/^COMPAT_SYSCALL_DEFINE[0-9]?(\([^,)]*\).*/compat_sys_\1/' \ - --regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1/' \ - --regex='/^TRACE_EVENT(\([^,)]*\).*/trace_\1_rcuidle/' \ - --regex='/^DEFINE_EVENT([^,)]*, *\([^,)]*\).*/trace_\1/' \ - --regex='/^DEFINE_EVENT([^,)]*, *\([^,)]*\).*/trace_\1_rcuidle/' \ - --regex='/PAGEFLAG(\([^,)]*\).*/Page\1/' \ - --regex='/PAGEFLAG(\([^,)]*\).*/SetPage\1/' \ - --regex='/PAGEFLAG(\([^,)]*\).*/ClearPage\1/' \ - --regex='/TESTSETFLAG(\([^,)]*\).*/TestSetPage\1/' \ - --regex='/TESTPAGEFLAG(\([^,)]*\).*/Page\1/' \ - --regex='/SETPAGEFLAG(\([^,)]*\).*/SetPage\1/' \ - --regex='/__SETPAGEFLAG(\([^,)]*\).*/__SetPage\1/' \ - --regex='/TESTCLEARFLAG(\([^,)]*\).*/TestClearPage\1/' \ - --regex='/__TESTCLEARFLAG(\([^,)]*\).*/TestClearPage\1/' \ - --regex='/CLEARPAGEFLAG(\([^,)]*\).*/ClearPage\1/' \ - --regex='/__CLEARPAGEFLAG(\([^,)]*\).*/__ClearPage\1/' \ - --regex='/__PAGEFLAG(\([^,)]*\).*/__SetPage\1/' \ - --regex='/__PAGEFLAG(\([^,)]*\).*/__ClearPage\1/' \ - --regex='/PAGEFLAG_FALSE(\([^,)]*\).*/Page\1/' \ - --regex='/TESTSCFLAG(\([^,)]*\).*/TestSetPage\1/' \ - --regex='/TESTSCFLAG(\([^,)]*\).*/TestClearPage\1/' \ - --regex='/SETPAGEFLAG_NOOP(\([^,)]*\).*/SetPage\1/' \ - --regex='/CLEARPAGEFLAG_NOOP(\([^,)]*\).*/ClearPage\1/' \ - --regex='/__CLEARPAGEFLAG_NOOP(\([^,)]*\).*/__ClearPage\1/' \ - --regex='/TESTCLEARFLAG_FALSE(\([^,)]*\).*/TestClearPage\1/' \ - --regex='/__TESTCLEARFLAG_FALSE(\([^,)]*\).*/__TestClearPage\1/' \ - --regex='/TASK_PFA_TEST\([^,]*,\s*([^)]*)\)/task_\1/' \ - --regex='/TASK_PFA_SET\([^,]*,\s*([^)]*)\)/task_set_\1/' \ - --regex='/TASK_PFA_CLEAR\([^,]*,\s*([^)]*)\)/task_clear_\1/' \ - --regex='/_PE(\([^,)]*\).*/PEVENT_ERRNO__\1/' \ - --regex='/PCI_OP_READ(\([a-z]*[a-z]\).*[1-4])/pci_bus_read_config_\1/' \ - --regex='/PCI_OP_WRITE(\([a-z]*[a-z]\).*[1-4])/pci_bus_write_config_\1/'\ - --regex='/[^#]*DEFINE_HASHTABLE(\([^,)]*\)/\1/' - - all_kconfigs | xargs $1 -a \ - --regex='/^[ \t]*\(\(menu\)*config\)[ \t]+\([a-zA-Z0-9_]+\)/\3/' - - all_kconfigs | xargs $1 -a \ - --regex='/^[ \t]*\(\(menu\)*config\)[ \t]+\([a-zA-Z0-9_]+\)/CONFIG_\3/' + setup_regex emacs asm c + all_target_sources | xargs $1 -a "${regex[@]}" - all_defconfigs | xargs -r $1 -a \ - --regex='/^#?[ \t]?\(CONFIG_[a-zA-Z0-9_]+\)/\1/' + setup_regex emacs kconfig + all_kconfigs | xargs $1 -a "${regex[@]}" } xtags() diff --git a/sound/pci/emu10k1/emupcm.c b/sound/pci/emu10k1/emupcm.c index 14a305bd8a98..72e442d86bb1 100644 --- a/sound/pci/emu10k1/emupcm.c +++ b/sound/pci/emu10k1/emupcm.c @@ -1850,7 +1850,9 @@ int snd_emu10k1_pcm_efx(struct snd_emu10k1 *emu, int device) if (!kctl) return -ENOMEM; kctl->id.device = device; - snd_ctl_add(emu->card, kctl); + err = snd_ctl_add(emu->card, kctl); + if (err < 0) + return err; snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV, snd_dma_pci_data(emu->pci), 64*1024, 64*1024); diff --git a/sound/pci/emu10k1/memory.c b/sound/pci/emu10k1/memory.c index 4f1f69be1865..8c778fa33031 100644 --- a/sound/pci/emu10k1/memory.c +++ b/sound/pci/emu10k1/memory.c @@ -237,13 +237,13 @@ __found_pages: static int is_valid_page(struct snd_emu10k1 *emu, dma_addr_t addr) { if (addr & ~emu->dma_mask) { - dev_err(emu->card->dev, + dev_err_ratelimited(emu->card->dev, "max memory size is 0x%lx (addr = 0x%lx)!!\n", emu->dma_mask, (unsigned long)addr); return 0; } if (addr & (EMUPAGESIZE-1)) { - dev_err(emu->card->dev, "page is not aligned\n"); + dev_err_ratelimited(emu->card->dev, "page is not aligned\n"); return 0; } return 1; @@ -334,7 +334,7 @@ snd_emu10k1_alloc_pages(struct snd_emu10k1 *emu, struct snd_pcm_substream *subst else addr = snd_pcm_sgbuf_get_addr(substream, ofs); if (! is_valid_page(emu, addr)) { - dev_err(emu->card->dev, + dev_err_ratelimited(emu->card->dev, "emu: failure page = %d\n", idx); mutex_unlock(&hdr->block_mutex); return NULL; diff --git a/sound/pci/fm801.c b/sound/pci/fm801.c index 1fdd92b6f18f..d6e89a6d0bb9 100644 --- a/sound/pci/fm801.c +++ b/sound/pci/fm801.c @@ -1050,11 +1050,19 @@ static int snd_fm801_mixer(struct fm801 *chip) if ((err = snd_ac97_mixer(chip->ac97_bus, &ac97, &chip->ac97_sec)) < 0) return err; } - for (i = 0; i < FM801_CONTROLS; i++) - snd_ctl_add(chip->card, snd_ctl_new1(&snd_fm801_controls[i], chip)); + for (i = 0; i < FM801_CONTROLS; i++) { + err = snd_ctl_add(chip->card, + snd_ctl_new1(&snd_fm801_controls[i], chip)); + if (err < 0) + return err; + } if (chip->multichannel) { - for (i = 0; i < FM801_CONTROLS_MULTI; i++) - snd_ctl_add(chip->card, snd_ctl_new1(&snd_fm801_controls_multi[i], chip)); + for (i = 0; i < FM801_CONTROLS_MULTI; i++) { + err = snd_ctl_add(chip->card, + snd_ctl_new1(&snd_fm801_controls_multi[i], chip)); + if (err < 0) + return err; + } } return 0; } diff --git a/sound/pci/hda/patch_ca0132.c b/sound/pci/hda/patch_ca0132.c index 29e1ce2263bc..c55c0131be0a 100644 --- a/sound/pci/hda/patch_ca0132.c +++ b/sound/pci/hda/patch_ca0132.c @@ -38,6 +38,10 @@ /* Enable this to see controls for tuning purpose. */ /*#define ENABLE_TUNING_CONTROLS*/ +#ifdef ENABLE_TUNING_CONTROLS +#include <sound/tlv.h> +#endif + #define FLOAT_ZERO 0x00000000 #define FLOAT_ONE 0x3f800000 #define FLOAT_TWO 0x40000000 @@ -3067,8 +3071,8 @@ static int equalizer_ctl_put(struct snd_kcontrol *kcontrol, return 1; } -static const DECLARE_TLV_DB_SCALE(voice_focus_db_scale, 2000, 100, 0); -static const DECLARE_TLV_DB_SCALE(eq_db_scale, -2400, 100, 0); +static const SNDRV_CTL_TLVD_DECLARE_DB_SCALE(voice_focus_db_scale, 2000, 100, 0); +static const SNDRV_CTL_TLVD_DECLARE_DB_SCALE(eq_db_scale, -2400, 100, 0); static int add_tuning_control(struct hda_codec *codec, hda_nid_t pnid, hda_nid_t nid, diff --git a/sound/soc/pxa/brownstone.c b/sound/soc/pxa/brownstone.c index 6147e86e9b0f..55ca9c9364b8 100644 --- a/sound/soc/pxa/brownstone.c +++ b/sound/soc/pxa/brownstone.c @@ -136,3 +136,4 @@ module_platform_driver(mmp_driver); MODULE_AUTHOR("Leo Yan <leoy@marvell.com>"); MODULE_DESCRIPTION("ALSA SoC Brownstone"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:brownstone-audio"); diff --git a/sound/soc/pxa/mioa701_wm9713.c b/sound/soc/pxa/mioa701_wm9713.c index 29bc60e85e92..6cd28f95d548 100644 --- a/sound/soc/pxa/mioa701_wm9713.c +++ b/sound/soc/pxa/mioa701_wm9713.c @@ -203,3 +203,4 @@ module_platform_driver(mioa701_wm9713_driver); MODULE_AUTHOR("Robert Jarzmik (rjarzmik@free.fr)"); MODULE_DESCRIPTION("ALSA SoC WM9713 MIO A701"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:mioa701-wm9713"); diff --git a/sound/soc/pxa/mmp-pcm.c b/sound/soc/pxa/mmp-pcm.c index 51e790d006f5..96df9b2d8fc4 100644 --- a/sound/soc/pxa/mmp-pcm.c +++ b/sound/soc/pxa/mmp-pcm.c @@ -248,3 +248,4 @@ module_platform_driver(mmp_pcm_driver); MODULE_AUTHOR("Leo Yan <leoy@marvell.com>"); MODULE_DESCRIPTION("MMP Soc Audio DMA module"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:mmp-pcm-audio"); diff --git a/sound/soc/pxa/mmp-sspa.c b/sound/soc/pxa/mmp-sspa.c index eca60c29791a..ca8b23f8c525 100644 --- a/sound/soc/pxa/mmp-sspa.c +++ b/sound/soc/pxa/mmp-sspa.c @@ -482,3 +482,4 @@ module_platform_driver(asoc_mmp_sspa_driver); MODULE_AUTHOR("Leo Yan <leoy@marvell.com>"); MODULE_DESCRIPTION("MMP SSPA SoC Interface"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:mmp-sspa-dai"); diff --git a/sound/soc/pxa/palm27x.c b/sound/soc/pxa/palm27x.c index 4e74d9573f03..bcc81e920a67 100644 --- a/sound/soc/pxa/palm27x.c +++ b/sound/soc/pxa/palm27x.c @@ -161,3 +161,4 @@ module_platform_driver(palm27x_wm9712_driver); MODULE_AUTHOR("Marek Vasut <marek.vasut@gmail.com>"); MODULE_DESCRIPTION("ALSA SoC Palm T|X, T5 and LifeDrive"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:palm27x-asoc"); diff --git a/sound/soc/pxa/pxa-ssp.c b/sound/soc/pxa/pxa-ssp.c index da03fad1b9cd..3cad990dad2c 100644 --- a/sound/soc/pxa/pxa-ssp.c +++ b/sound/soc/pxa/pxa-ssp.c @@ -833,3 +833,4 @@ module_platform_driver(asoc_ssp_driver); MODULE_AUTHOR("Mark Brown <broonie@opensource.wolfsonmicro.com>"); MODULE_DESCRIPTION("PXA SSP/PCM SoC Interface"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:pxa-ssp-dai"); diff --git a/sound/soc/pxa/pxa2xx-ac97.c b/sound/soc/pxa/pxa2xx-ac97.c index f3de615aacd7..9615e6de1306 100644 --- a/sound/soc/pxa/pxa2xx-ac97.c +++ b/sound/soc/pxa/pxa2xx-ac97.c @@ -287,3 +287,4 @@ module_platform_driver(pxa2xx_ac97_driver); MODULE_AUTHOR("Nicolas Pitre"); MODULE_DESCRIPTION("AC97 driver for the Intel PXA2xx chip"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:pxa2xx-ac97"); diff --git a/sound/soc/pxa/pxa2xx-pcm.c b/sound/soc/pxa/pxa2xx-pcm.c index 9f390398d518..410d48b93031 100644 --- a/sound/soc/pxa/pxa2xx-pcm.c +++ b/sound/soc/pxa/pxa2xx-pcm.c @@ -117,3 +117,4 @@ module_platform_driver(pxa_pcm_driver); MODULE_AUTHOR("Nicolas Pitre"); MODULE_DESCRIPTION("Intel PXA2xx PCM DMA module"); MODULE_LICENSE("GPL"); +MODULE_ALIAS("platform:pxa-pcm-audio"); diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 9694fe13021f..609b9b294429 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -1796,8 +1796,10 @@ int dpcm_be_dai_shutdown(struct snd_soc_pcm_runtime *fe, int stream) continue; if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_HW_FREE) && - (be->dpcm[stream].state != SND_SOC_DPCM_STATE_OPEN)) - continue; + (be->dpcm[stream].state != SND_SOC_DPCM_STATE_OPEN)) { + soc_pcm_hw_free(be_substream); + be->dpcm[stream].state = SND_SOC_DPCM_STATE_HW_FREE; + } dev_dbg(be->dev, "ASoC: close BE %s\n", dpcm->fe->dai_link->name); diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c index 07d9ae103988..b95f8c50c607 100644 --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -1358,7 +1358,7 @@ static void retire_capture_urb(struct snd_usb_substream *subs, if (bytes % (runtime->sample_bits >> 3) != 0) { int oldbytes = bytes; bytes = frames * stride; - dev_warn(&subs->dev->dev, + dev_warn_ratelimited(&subs->dev->dev, "Corrected urb data len. %d->%d\n", oldbytes, bytes); } diff --git a/tools/arch/x86/include/asm/unistd_32.h b/tools/arch/x86/include/asm/unistd_32.h new file mode 100644 index 000000000000..cf33ab09273d --- /dev/null +++ b/tools/arch/x86/include/asm/unistd_32.h @@ -0,0 +1,9 @@ +#ifndef __NR_perf_event_open +# define __NR_perf_event_open 336 +#endif +#ifndef __NR_futex +# define __NR_futex 240 +#endif +#ifndef __NR_gettid +# define __NR_gettid 224 +#endif diff --git a/tools/arch/x86/include/asm/unistd_64.h b/tools/arch/x86/include/asm/unistd_64.h new file mode 100644 index 000000000000..2c9835695b56 --- /dev/null +++ b/tools/arch/x86/include/asm/unistd_64.h @@ -0,0 +1,9 @@ +#ifndef __NR_perf_event_open +# define __NR_perf_event_open 298 +#endif +#ifndef __NR_futex +# define __NR_futex 202 +#endif +#ifndef __NR_gettid +# define __NR_gettid 186 +#endif diff --git a/tools/build/Build.include b/tools/build/Build.include index 1c570528baf7..0340d8a51dab 100644 --- a/tools/build/Build.include +++ b/tools/build/Build.include @@ -12,6 +12,7 @@ # Convenient variables comma := , squote := ' +pound := \# ### # Name of target with a '.' as filename prefix. foo/bar.o => foo/.bar.o @@ -43,11 +44,11 @@ echo-cmd = $(if $($(quiet)cmd_$(1)),\ ### # Replace >$< with >$$< to preserve $ when reloading the .cmd file # (needed for make) -# Replace >#< with >\#< to avoid starting a comment in the .cmd file +# Replace >#< with >$(pound)< to avoid starting a comment in the .cmd file # (needed for make) # Replace >'< with >'\''< to be able to enclose the whole string in '...' # (needed for the shell) -make-cmd = $(call escsq,$(subst \#,\\\#,$(subst $$,$$$$,$(cmd_$(1))))) +make-cmd = $(call escsq,$(subst $(pound),$$(pound),$(subst $$,$$$$,$(cmd_$(1))))) ### # Find any prerequisites that is newer than target or that does not exist. diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile index 405c1c1e2975..9a4988cf7b38 100644 --- a/tools/perf/config/Makefile +++ b/tools/perf/config/Makefile @@ -200,6 +200,7 @@ CFLAGS += -I$(src-perf)/arch/$(ARCH)/include CFLAGS += -I$(srctree)/tools/include/ CFLAGS += -I$(srctree)/arch/$(ARCH)/include/uapi CFLAGS += -I$(srctree)/arch/$(ARCH)/include +CFLAGS += -I$(srctree)/tools/arch/$(ARCH)/include CFLAGS += -I$(srctree)/include/uapi CFLAGS += -I$(srctree)/include diff --git a/tools/perf/perf-sys.h b/tools/perf/perf-sys.h index 83a25cef82fd..5cee8a3d0455 100644 --- a/tools/perf/perf-sys.h +++ b/tools/perf/perf-sys.h @@ -11,29 +11,11 @@ #if defined(__i386__) #define cpu_relax() asm volatile("rep; nop" ::: "memory"); #define CPUINFO_PROC {"model name"} -#ifndef __NR_perf_event_open -# define __NR_perf_event_open 336 -#endif -#ifndef __NR_futex -# define __NR_futex 240 -#endif -#ifndef __NR_gettid -# define __NR_gettid 224 -#endif #endif #if defined(__x86_64__) #define cpu_relax() asm volatile("rep; nop" ::: "memory"); #define CPUINFO_PROC {"model name"} -#ifndef __NR_perf_event_open -# define __NR_perf_event_open 298 -#endif -#ifndef __NR_futex -# define __NR_futex 202 -#endif -#ifndef __NR_gettid -# define __NR_gettid 186 -#endif #endif #ifdef __powerpc__ diff --git a/tools/perf/util/include/asm/unistd_32.h b/tools/perf/util/include/asm/unistd_32.h deleted file mode 100644 index 8b137891791f..000000000000 --- a/tools/perf/util/include/asm/unistd_32.h +++ /dev/null @@ -1 +0,0 @@ - diff --git a/tools/perf/util/include/asm/unistd_64.h b/tools/perf/util/include/asm/unistd_64.h deleted file mode 100644 index 8b137891791f..000000000000 --- a/tools/perf/util/include/asm/unistd_64.h +++ /dev/null @@ -1 +0,0 @@ - diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include index 19edc1a7a232..7ea4438b801d 100644 --- a/tools/scripts/Makefile.include +++ b/tools/scripts/Makefile.include @@ -92,3 +92,5 @@ ifneq ($(silent),1) QUIET_INSTALL = @printf ' INSTALL %s\n' $1; endif endif + +pound := \# diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 882fe83a3554..b3f345433ec7 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -1476,15 +1476,19 @@ TEST_F(TRACE_syscall, syscall_dropped) #define SECCOMP_SET_MODE_FILTER 1 #endif -#ifndef SECCOMP_FLAG_FILTER_TSYNC -#define SECCOMP_FLAG_FILTER_TSYNC 1 +#ifndef SECCOMP_FILTER_FLAG_TSYNC +#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0) +#endif + +#ifndef SECCOMP_FILTER_FLAG_SPEC_ALLOW +#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2) #endif #ifndef seccomp -int seccomp(unsigned int op, unsigned int flags, struct sock_fprog *filter) +int seccomp(unsigned int op, unsigned int flags, void *args) { errno = 0; - return syscall(__NR_seccomp, op, flags, filter); + return syscall(__NR_seccomp, op, flags, args); } #endif @@ -1576,6 +1580,78 @@ TEST(seccomp_syscall_mode_lock) } } +/* + * Test detection of known and unknown filter flags. Userspace needs to be able + * to check if a filter flag is supported by the current kernel and a good way + * of doing that is by attempting to enter filter mode, with the flag bit in + * question set, and a NULL pointer for the _args_ parameter. EFAULT indicates + * that the flag is valid and EINVAL indicates that the flag is invalid. + */ +TEST(detect_seccomp_filter_flags) +{ + unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC, + SECCOMP_FILTER_FLAG_SPEC_ALLOW }; + unsigned int flag, all_flags; + int i; + long ret; + + /* Test detection of known-good filter flags */ + for (i = 0, all_flags = 0; i < ARRAY_SIZE(flags); i++) { + int bits = 0; + + flag = flags[i]; + /* Make sure the flag is a single bit! */ + while (flag) { + if (flag & 0x1) + bits ++; + flag >>= 1; + } + ASSERT_EQ(1, bits); + flag = flags[i]; + + ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL); + ASSERT_NE(ENOSYS, errno) { + TH_LOG("Kernel does not support seccomp syscall!"); + } + EXPECT_EQ(-1, ret); + EXPECT_EQ(EFAULT, errno) { + TH_LOG("Failed to detect that a known-good filter flag (0x%X) is supported!", + flag); + } + + all_flags |= flag; + } + + /* Test detection of all known-good filter flags */ + ret = seccomp(SECCOMP_SET_MODE_FILTER, all_flags, NULL); + EXPECT_EQ(-1, ret); + EXPECT_EQ(EFAULT, errno) { + TH_LOG("Failed to detect that all known-good filter flags (0x%X) are supported!", + all_flags); + } + + /* Test detection of an unknown filter flag */ + flag = -1; + ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL); + EXPECT_EQ(-1, ret); + EXPECT_EQ(EINVAL, errno) { + TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported!", + flag); + } + + /* + * Test detection of an unknown filter flag that may simply need to be + * added to this test + */ + flag = flags[ARRAY_SIZE(flags) - 1] << 1; + ret = seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL); + EXPECT_EQ(-1, ret); + EXPECT_EQ(EINVAL, errno) { + TH_LOG("Failed to detect that an unknown filter flag (0x%X) is unsupported! Does a new flag need to be added to this test?", + flag); + } +} + TEST(TSYNC_first) { struct sock_filter filter[] = { @@ -1592,7 +1668,7 @@ TEST(TSYNC_first) TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!"); } - ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC, + ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &prog); ASSERT_NE(ENOSYS, errno) { TH_LOG("Kernel does not support seccomp syscall!"); @@ -1810,7 +1886,7 @@ TEST_F(TSYNC, two_siblings_with_ancestor) self->sibling_count++; } - ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC, + ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &self->apply_prog); ASSERT_EQ(0, ret) { TH_LOG("Could install filter on all threads!"); @@ -1871,7 +1947,7 @@ TEST_F(TSYNC, two_siblings_with_no_filter) TH_LOG("Kernel does not support PR_SET_NO_NEW_PRIVS!"); } - ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC, + ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &self->apply_prog); ASSERT_NE(ENOSYS, errno) { TH_LOG("Kernel does not support seccomp syscall!"); @@ -1919,7 +1995,7 @@ TEST_F(TSYNC, two_siblings_with_one_divergence) self->sibling_count++; } - ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC, + ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &self->apply_prog); ASSERT_EQ(self->sibling[0].system_tid, ret) { TH_LOG("Did not fail on diverged sibling."); @@ -1971,7 +2047,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter) TH_LOG("Kernel does not support SECCOMP_SET_MODE_FILTER!"); } - ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC, + ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &self->apply_prog); ASSERT_EQ(ret, self->sibling[0].system_tid) { TH_LOG("Did not fail on diverged sibling."); @@ -2000,7 +2076,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter) /* Switch to the remaining sibling */ sib = !sib; - ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC, + ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &self->apply_prog); ASSERT_EQ(0, ret) { TH_LOG("Expected the remaining sibling to sync"); @@ -2023,7 +2099,7 @@ TEST_F(TSYNC, two_siblings_not_under_filter) while (!kill(self->sibling[sib].system_tid, 0)) sleep(0.1); - ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FLAG_FILTER_TSYNC, + ret = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, &self->apply_prog); ASSERT_EQ(0, ret); /* just us chickens */ } diff --git a/tools/usb/usbip/src/usbip_detach.c b/tools/usb/usbip/src/usbip_detach.c index 9db9d21bb2ec..6a8db858caa5 100644 --- a/tools/usb/usbip/src/usbip_detach.c +++ b/tools/usb/usbip/src/usbip_detach.c @@ -43,7 +43,7 @@ void usbip_detach_usage(void) static int detach_port(char *port) { - int ret; + int ret = 0; uint8_t portnum; char path[PATH_MAX+1]; @@ -73,9 +73,12 @@ static int detach_port(char *port) } ret = usbip_vhci_detach_device(portnum); - if (ret < 0) - return -1; + if (ret < 0) { + ret = -1; + goto call_driver_close; + } +call_driver_close: usbip_vhci_driver_close(); return ret; diff --git a/verity_dev_keys.x509 b/verity_dev_keys.x509 new file mode 100644 index 000000000000..86399c3c1dd7 --- /dev/null +++ b/verity_dev_keys.x509 @@ -0,0 +1,24 @@ +-----BEGIN CERTIFICATE----- +MIID/TCCAuWgAwIBAgIJAJcPmDkJqolJMA0GCSqGSIb3DQEBBQUAMIGUMQswCQYD +VQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwNTW91bnRhaW4g +VmlldzEQMA4GA1UECgwHQW5kcm9pZDEQMA4GA1UECwwHQW5kcm9pZDEQMA4GA1UE +AwwHQW5kcm9pZDEiMCAGCSqGSIb3DQEJARYTYW5kcm9pZEBhbmRyb2lkLmNvbTAe +Fw0xNDExMDYxOTA3NDBaFw00MjAzMjQxOTA3NDBaMIGUMQswCQYDVQQGEwJVUzET +MBEGA1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwNTW91bnRhaW4gVmlldzEQMA4G +A1UECgwHQW5kcm9pZDEQMA4GA1UECwwHQW5kcm9pZDEQMA4GA1UEAwwHQW5kcm9p +ZDEiMCAGCSqGSIb3DQEJARYTYW5kcm9pZEBhbmRyb2lkLmNvbTCCASIwDQYJKoZI +hvcNAQEBBQADggEPADCCAQoCggEBAOjreE0vTVSRenuzO9vnaWfk0eQzYab0gqpi +6xAzi6dmD+ugoEKJmbPiuE5Dwf21isZ9uhUUu0dQM46dK4ocKxMRrcnmGxydFn6o +fs3ODJMXOkv2gKXL/FdbEPdDbxzdu8z3yk+W67udM/fW7WbaQ3DO0knu+izKak/3 +T41c5uoXmQ81UNtAzRGzGchNVXMmWuTGOkg6U+0I2Td7K8yvUMWhAWPPpKLtVH9r +AL5TzjYNR92izdKcz3AjRsI3CTjtpiVABGeX0TcjRSuZB7K9EK56HV+OFNS6I1NP +jdD7FIShyGlqqZdUOkAUZYanbpgeT5N7QL6uuqcGpoTOkalu6kkCAwEAAaNQME4w +HQYDVR0OBBYEFH5DM/m7oArf4O3peeKO0ZIEkrQPMB8GA1UdIwQYMBaAFH5DM/m7 +oArf4O3peeKO0ZIEkrQPMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEFBQADggEB +AHO3NSvDE5jFvMehGGtS8BnFYdFKRIglDMc4niWSzhzOVYRH4WajxdtBWc5fx0ix +NF/+hVKVhP6AIOQa+++sk+HIi7RvioPPbhjcsVlZe7cUEGrLSSveGouQyc+j0+m6 +JF84kszIl5GGNMTnx0XRPO+g8t6h5LWfnVydgZfpGRRg+WHewk1U2HlvTjIceb0N +dcoJ8WKJAFWdcuE7VIm4w+vF/DYX/A2Oyzr2+QRhmYSv1cusgAeC1tvH4ap+J1Lg +UnOu5Kh/FqPLLSwNVQp4Bu7b9QFfqK8Moj84bj88NqRGZgDyqzuTrFxn6FW7dmyA +yttuAJAEAymk1mipd9+zp38= +-----END CERTIFICATE----- diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 49001fa84ead..1203829316b2 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -119,8 +119,12 @@ irqfd_shutdown(struct work_struct *work) { struct kvm_kernel_irqfd *irqfd = container_of(work, struct kvm_kernel_irqfd, shutdown); + struct kvm *kvm = irqfd->kvm; u64 cnt; + /* Make sure irqfd has been initalized in assign path. */ + synchronize_srcu(&kvm->irq_srcu); + /* * Synchronize with the wait-queue and unhook ourselves to prevent * further events. @@ -387,7 +391,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) idx = srcu_read_lock(&kvm->irq_srcu); irqfd_update(kvm, irqfd); - srcu_read_unlock(&kvm->irq_srcu, idx); list_add_tail(&irqfd->list, &kvm->irqfds.items); @@ -419,6 +422,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) irqfd->consumer.token, ret); #endif + srcu_read_unlock(&kvm->irq_srcu, idx); return 0; fail: |
