path: root/drivers/net/wireguard
* Merge remote-tracking branch 'msm8998/lineage-20' into lineage-20 (Raghuram Subramani, 2024-10-17)
Change-Id: I126075a330f305c85f8fe1b8c9d408f368be95d1
* wireguard: compat: udp_tunnel: Account for kernel 4.4 BPF backport (Bruno Martins, 2022-04-19)
The function 'udp_tunnel6_xmit_skb' was redefined in commit 4d5805d. The compat hack must be removed to fix the resulting compilation error.
Change-Id: I155b1c45ef57ca2be4fb3f005a5df174fc9041b9
* wireguard: version: bump (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Bruno Martins <bgcngm@gmail.com>
Change-Id: I769fd320e84f87264ad2920bee6527033d7d32c0
* allowedips: add missing __rcu annotation to satisfy sparse (Jason A. Donenfeld, 2021-07-23)
A __rcu annotation got lost during refactoring, which caused sparse to become enraged.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I4cd36bc2c91f1417ceae18aba00fb7c58e67e554
* allowedips: free empty intermediate nodes when removing single node (Jason A. Donenfeld, 2021-07-23)
When removing single nodes, it's possible that the node's parent is an empty intermediate node, in which case it too should be removed. Otherwise the trie fills up and is never fully emptied, leading to gradual memory leaks over time for tries that are modified often. There was originally code to do this, but it was removed during refactoring in 2016 and never reworked. Now that we have proper parent pointers from the previous commits, we can implement this properly.

In order to reduce branching and expensive comparisons, we want to keep the double pointer for parent assignment (which lets us easily chain up to the root), but we still need to actually get the parent's base address. So encode the bit number into the last two bits of the pointer, and pack and unpack it as needed. This is a little bit clumsy but is the fastest and least memory-wasteful of the compromises. Note that we align the root struct here to a minimum of 4, because it's embedded into a larger struct, and we're relying on having the bottom two bits for our flag, which would only be 16-bit aligned on m68k.

The existing macro-based helpers were a bit unwieldy for adding the bit packing to, so this commit replaces them with safer and clearer ordinary functions.

We add a test to the randomized/fuzzer part of the selftests, to free the randomized tries by-peer, refuzz it, and repeat, until it's supposed to be empty, and then see if that actually resulted in the whole thing being emptied. That combined with kmemcheck should hopefully make sure this commit is doing what it should. Along the way this resulted in various other cleanups of the tests and fixes for recent graphviz.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: Iec08d8fbb46e611514749fafa0f4f2dd08531518
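For illustration only, a minimal sketch of the bit-packing trick the message above describes — stashing a small bit number in the low bits of a pointer known to be at least 4-byte aligned. The names here are placeholders, not the driver's actual symbols:

    #include <stdint.h>

    struct node;  /* trie node type, assumed aligned to at least 4 bytes */

    /* Pack a 2-bit value into the low bits of an aligned double pointer. */
    static inline uintptr_t pack_parent(struct node **parent_slot, unsigned int bit)
    {
            return (uintptr_t)parent_slot | (bit & 3U);
    }

    static inline struct node **unpack_parent(uintptr_t packed)
    {
            return (struct node **)(packed & ~(uintptr_t)3);
    }

    static inline unsigned int unpack_bit(uintptr_t packed)
    {
            return (unsigned int)(packed & 3);
    }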
* allowedips: allocate nodes in kmem_cache (Jason A. Donenfeld, 2021-07-23)
The previous commit moved from O(n) to O(1) for removal, but in the process introduced an additional pointer member to a struct that increased the size from 60 to 68 bytes, putting nodes in the 128-byte slab. With deployed systems having as many as 2 million nodes, this represents a significant doubling in memory usage (128 MiB -> 256 MiB). Fix this by using our own kmem_cache, that's sized exactly right. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: Iabf9dabe6a6a36aceff23e3021d51c079c9525e4
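A hedged sketch of the dedicated-slab pattern this commit refers to; the struct layout and names below are invented for illustration, while KMEM_CACHE and kmem_cache_* are the standard kernel APIs:

    #include <linux/slab.h>

    struct example_trie_node {
            u8 bits[16];
            u8 cidr, bit_at_a, bit_at_b, bitlen;
            /* ... remaining bookkeeping, ~60-68 bytes total ... */
    };

    static struct kmem_cache *example_node_cache;

    static int __init example_cache_init(void)
    {
            /* A cache sized to the object avoids rounding 68 bytes up to the
             * 128-byte kmalloc slab, and shows up by name in /proc/slabinfo. */
            example_node_cache = KMEM_CACHE(example_trie_node, 0);
            return example_node_cache ? 0 : -ENOMEM;
    }

    static struct example_trie_node *example_node_alloc(void)
    {
            return kmem_cache_zalloc(example_node_cache, GFP_KERNEL);
    }

    static void example_node_free(struct example_trie_node *node)
    {
            kmem_cache_free(example_node_cache, node);
    }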
* allowedips: remove nodes in O(1) (Jason A. Donenfeld, 2021-07-23)
Previously, deleting peers would require traversing the entire trie in order to rebalance nodes and safely free them. This meant that removing 1000 peers from a trie with a half million nodes would take an extremely long time, during which we're holding the rtnl lock. Large-scale users were reporting 200ms latencies added to the networking stack as a whole every time their userspace software would queue up significant removals. That's a serious situation.

This commit fixes that by maintaining a double pointer to the parent's bit pointer for each node, and then using the already existing node list belonging to each peer to go directly to the node, fix up its pointers, and free it with RCU. This means removal is O(1) instead of O(n), and we don't use gobs of stack.

The removal algorithm has the same downside as the code that it fixes: it won't collapse needlessly long runs of fillers. We can enhance that in the future if it ever becomes a problem. This commit documents that limitation with a TODO comment in code, a small but meaningful improvement over the prior situation.

Currently the biggest flaw, which the next commit addresses, is that this increases the node size on 64-bit machines from 60 bytes to 68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up using twice as much memory per node, because of power-of-two allocations, which is a big bummer. We'll need to figure something out there.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: Ia250b8ae19a61ccd6d5deea189c9cb4fa2dfa851
* allowedips: initialize list head in selftest (Jason A. Donenfeld, 2021-07-23)
The randomized trie tests weren't initializing the dummy peer list head, resulting in a NULL pointer dereference when used. Fix this by initializing it in the randomized trie test, just like we do for the static unit test. While we're at it, all of the other strings like this have the word "self-test", so add it to the missing place here.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I8a9b8466427c0357231013b4a988762ab6d75d7e
* peer: allocate in kmem_cache (Jason A. Donenfeld, 2021-07-23)
With deployments having upwards of 600k peers now, this somewhat heavy structure could benefit from more fine-grained allocations. Specifically, instead of using a 2048-byte slab for a 1544-byte object, we can now use 1544-byte objects directly, thus saving almost 25% per-peer, or with 600k peers, that's a savings of 303 MiB. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I6564caccfeb39a7c016050b832a284f2cac7b99f
* global: use synchronize_net rather than synchronize_rcu (Jason A. Donenfeld, 2021-07-23)
Many of the synchronization points are sometimes called under the rtnl lock, which means we should use synchronize_net rather than synchronize_rcu. Under the hood, this expands to using the expedited flavor of function in the event that rtnl is held, in order to not stall other concurrent changes. This fixes some very, very long delays when removing multiple peers at once, which would cause some operations to take several minutes.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I0801d60862f1272ed01b8f3ab2592001d5325a35
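For context, upstream's synchronize_net() in net/core/dev.c of this era is roughly the following; the point of the commit is that the expedited grace period is chosen automatically whenever rtnl is held:

    void synchronize_net(void)
    {
            might_sleep();
            if (rtnl_is_locked())
                    synchronize_rcu_expedited();
            else
                    synchronize_rcu();
    }
    EXPORT_SYMBOL(synchronize_net);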
* wireguard: version: bump (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Bruno Martins <bgcngm@gmail.com>
Change-Id: Ia3bbae508040960235f859aeb1aee5dde1c86445
* Revert "compat: skb_mark_not_on_list will be backported to Ubuntu 18.04"Thadeu Lima de Souza Cascardo2021-07-23
| | | | | | | | | | | | | | | | | This reverts commit cad80597c7947f0def83caf8cb56aff0149c83a8. Because this commit has not been backported so far, due to the implications of building Ubuntu's backport of wireguard in a timely manner. For now, reverting this fix would allow wireguard-linux-compat CI to work on Ubuntu 18.04. A different fix or the same one can be applied again when the time is right. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Change-Id: I5eadc2e7ca7495a012c7f3fb1a9dcd8eabdbe139
* compat: update and improve detection of CentOS Stream 8 (Peter Georg, 2021-07-23)
CentOS Stream 8 by now (4.18.0-301.1.el8) reports RHEL_MINOR=5. The current RHEL 8 minor release is still 3; RHEL 8.4 is in beta. Replace the equality comparison with a greater-or-equal comparison to (hopefully) be a little bit more future proof.

Signed-off-by: Peter Georg <peter.georg@physik.uni-regensburg.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I94a227b5bfcfe7bb7cbadd9caac5d7b5b3f0fd7d
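A minimal sketch of the kind of check being adjusted. The RHEL_MAJOR/RHEL_MINOR macros follow the RHEL kernel convention, but this exact snippet is illustrative rather than the compat.h source:

    /* CentOS Stream 8 runs ahead of released RHEL 8 minors, so match >= */
    #if defined(RHEL_MAJOR) && RHEL_MAJOR == 8 && RHEL_MINOR >= 5
    #define EXAMPLE_ISCENTOS8S
    #endif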
* compat: icmp_ndo_send functions were backported extensively (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: Ic8e394cf71535f494d500b3955d3c0a9df1c655b
* wireguard: version: bump (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Bruno Martins <bgcngm@gmail.com>
Change-Id: I469cf3ae8a7c67b7c48afbea6fe339b97deff2d6
* compat: zero out skb->cb before icmp (Jason A. Donenfeld, 2021-07-23)
This corresponds to the fancier upstream commit that's still on lkml, which passes a zeroed ip_options struct to __icmp_send.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I8853014649d6b4595b9f407045991361299d9560
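A minimal sketch of the idea, assuming a compat wrapper around the plain icmp_send() (the helper name below is invented): zeroing skb->cb keeps the ICMP layer from interpreting stale control-block bytes as IP options.

    static void example_icmp_ndo_send(struct sk_buff *skb, int type, int code, __be32 info)
    {
            /* skb->cb may still hold scratch data from earlier layers. */
            memset(skb->cb, 0, sizeof(skb->cb));
            icmp_send(skb, type, code, info);
    }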
* compat: skb_mark_not_on_list will be backported to Ubuntu 18.04 (Thadeu Lima de Souza Cascardo, 2021-07-23)
linux commit 22f6bbb7bcfcef0b373b0502a7ff390275c575dd ("net: use skb_list_del_init() to remove from RX sublists") will be backported to Ubuntu 18.04 default kernel, which is based on linux 4.15.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: Ib10df4441e52ce85a4285f41253f2d08dd955a07
* queueing: get rid of per-peer ring buffers (Jason A. Donenfeld, 2021-07-23)
Having two ring buffers per-peer means that every peer results in two massive ring allocations. On an 8-core x86_64 machine, this commit reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which is a 90% reduction. Ninety percent! With some single-machine deployments approaching 500,000 peers, we're talking about a reduction from 7 gigs of memory down to 700 megs of memory.

In order to get rid of these per-peer allocations, this commit switches to using a list-based queueing approach. Currently GSO fragments are chained together using the skb->next pointer (the skb_list_* singly linked list approach), so we form the per-peer queue around the unused skb->prev pointer (which sort of makes sense because the links are pointing backwards). Use of skb_queue_* is not possible here, because that is based on doubly linked lists and spinlocks. Multiple cores can write into the queue at any given time, because its writes occur in the start_xmit path or in the udp_recv path. But reads happen in a single workqueue item per-peer, amounting to a multi-producer, single-consumer paradigm.

The MPSC queue is implemented locklessly and never blocks. However, it is not linearizable (though it is serializable), with a very tight and unlikely race on writes, which, when hit (some tiny fraction of the 0.15% of partial adds on a fully loaded 16-core x86_64 system), causes the queue reader to terminate early. However, because every packet sent queues up the same workqueue item after it is fully added, the worker resumes again, and stopping early isn't actually a problem, since at that point the packet wouldn't have yet been added to the encryption queue. These properties allow us to avoid disabling interrupts or spinning. The design is based on Dmitry Vyukov's algorithm [1].

Performance-wise, ordinarily list-based queues aren't preferable to ringbuffers, because of cache misses when following pointers around. However, we *already* have to follow the adjacent pointers when working through fragments, so there shouldn't actually be any change there. A potential downside is that dequeueing is a bit more complicated, but the ptr_ring structure used prior had a spinlock when dequeueing, so all in all the difference appears to be a wash.

Actually, from profiling, the biggest performance hit, by far, of this commit winds up being atomic_add_unless(count, 1, max) and atomic_dec(count), which account for the majority of CPU time, according to perf. In that sense, the previous ring buffer was superior in that it could check if it was full by head==tail, which the list-based approach cannot do. But all in all, this enables us to get massive memory savings, allowing WireGuard to scale for real world deployments, without taking much of a performance hit.

[1] http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I0ff1f036b5cff6d7be89042459e40a86e7e7840c
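A hedged, self-contained sketch of the Vyukov-style intrusive MPSC push referenced above, written with GCC atomic builtins rather than the kernel's primitives and with invented names — this is not the driver's wg_prev_queue code, but it shows the single-exchange producer side and the tiny window that makes the queue serializable but not linearizable (the consumer side, with its stub-node handling, is described at the [1] link):

    struct mpsc_node { struct mpsc_node *next; };

    struct mpsc_queue {
            struct mpsc_node *head;   /* producers swing this with one atomic exchange */
            struct mpsc_node *tail;   /* owned by the single consumer */
            struct mpsc_node stub;    /* sentinel so the list is never truly empty */
    };

    static void mpsc_init(struct mpsc_queue *q)
    {
            q->stub.next = NULL;
            q->head = q->tail = &q->stub;
    }

    /* Many producers, never blocks, no interrupts disabled. */
    static void mpsc_push(struct mpsc_queue *q, struct mpsc_node *n)
    {
            struct mpsc_node *prev;

            n->next = NULL;
            prev = __atomic_exchange_n(&q->head, n, __ATOMIC_ACQ_REL);
            /* Between the exchange above and the store below, the node is not
             * yet reachable from tail: a consumer arriving now stops early.
             * Re-queuing the work item after the push makes that harmless. */
            __atomic_store_n(&prev->next, n, __ATOMIC_RELEASE);
    }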
* device: do not generate ICMP for non-IP packets (Jason A. Donenfeld, 2021-07-23)
If skb->protocol doesn't match the actual skb->data header, it's probably not a good idea to pass it off to icmp{,v6}_ndo_send, which is expecting to reply to a valid IP packet. So this commit has that early mismatch case jump to a later error label.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I89049b5c67889f73600a0d219cb39ad47f5e65d9
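A rough sketch of the sanity check being described, simplified and with an invented helper name (the real code jumps to an error label rather than calling a predicate):

    static bool example_header_matches_protocol(struct sk_buff *skb)
    {
            if (skb->protocol == htons(ETH_P_IP))
                    return pskb_may_pull(skb, sizeof(struct iphdr)) &&
                           ip_hdr(skb)->version == 4;
            if (skb->protocol == htons(ETH_P_IPV6))
                    return pskb_may_pull(skb, sizeof(struct ipv6hdr)) &&
                           ipv6_hdr(skb)->version == 6;
            return false;   /* mismatch: skip icmp{,v6}_ndo_send, just drop */
    }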
* peer: put frequently used members above cache lines (Jason A. Donenfeld, 2021-07-23)
The is_dead boolean is checked for every single packet, while the internal_id member is used basically only for pr_debug messages. So it makes sense to hoist is_dead up into some space formerly unused by a struct hole, while demoting internal_id to below the lowest struct cache line.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I9d95581f52ed78f7d8a390d899f4178246cea9b5
* compat: redefine version constants for sublevel>=256 (Jason A. Donenfeld, 2021-07-23)
With the 4.4.256 and 4.9.256 kernels, the previous calculation for integer comparison overflowed. This commit redefines the broken constants to have more space for the sublevel.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I38b2b592ce769adbdca56955cac18dbd394d3258
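A worked example of the overflow, using the standard kernel macro for context:

    /* #define KERNEL_VERSION(a,b,c)  (((a) << 16) + ((b) << 8) + (c))
     *
     * KERNEL_VERSION(4, 9, 255) = 0x0409ff = 264703
     * KERNEL_VERSION(4, 9, 256) = 0x040a00 = 264704 = KERNEL_VERSION(4, 10, 0)
     *
     * With a sublevel of 256 the low byte overflows into the patchlevel, so a
     * 4.9.256 kernel compares as if it were 4.10.0 and every
     * "LINUX_VERSION_CODE >= KERNEL_VERSION(4, 10, 0)" test misfires. The
     * compat fix widens the space reserved for the sublevel before comparing. */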
* compat: remove unused version.h headers (Jason A. Donenfeld, 2021-07-23)
We don't need this in all files, and it just complicates things.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: Ie99fab32d50bb636ad7b99e38e05b9698e57e9d8
* wireguard: version: bump (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Bruno Martins <bgcngm@gmail.com>
Change-Id: I4673c5e84efb6e0f8ab6835d090219149bd3f68b
* compat: skb_mark_not_on_list was backported to 4.14 (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I507bc2de13645d206a771b9e381e06bb453359d1
* compat: SYM_FUNC_* was backported to c8s (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I3bfa3b8eacc6d880e87749e705f3f96096ff9f37
* wireguard: version: bump (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Bruno Martins <bgcngm@gmail.com>
Change-Id: Iff9be0ec593fd83cdc1819ef4a4531f22e0d65ab
* socket: remove bogus __be32 annotation (Jann Horn, 2021-07-23)
The endpoint->src_if4 has nothing to do with fixed-endian numbers; remove the bogus annotation. This was introduced in https://git.zx2c4.com/wireguard-monolithic-historical/commit?id=14e7d0a499a676ec55176c0de2f9fcbd34074a82 in the historical WireGuard repo because the old code used to zero-initialize multiple members as follows:

    endpoint->src4.s_addr = endpoint->src_if4 = fl.saddr = 0;

Because fl.saddr is fixed-endian and an assignment returns a value with the type of its left operand, this meant that sparse detected an assignment between values of different endianness. Since then, this assignment was already split up into separate statements; just the cast survived.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I2b5bcadd8ecbcb27c97876f2b4a89cc73f65d79f
* global: avoid double unlikely() notation when using IS_ERR() (Antonio Quartulli, 2021-07-23)
The definition of IS_ERR() already applies the unlikely() notation when checking the error status of the passed pointer. For this reason there is no need to have the same notation outside of IS_ERR() itself. Clean up code by removing redundant notation.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I7d6bb773908583c55db620a9c5263eb72476bf00
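For reference, include/linux/err.h defines these roughly as follows, which is why an extra unlikely() at call sites is redundant:

    #define MAX_ERRNO       4095
    #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

    static inline bool __must_check IS_ERR(const void *ptr)
    {
            return IS_ERR_VALUE((unsigned long)ptr);
    }

    /* so:   if (unlikely(IS_ERR(skb)))   simplifies to   if (IS_ERR(skb)) */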
* simd: detect -rt kernels >= 5.4 (Jason A. Donenfeld, 2021-07-23)
The 5.4 series of -rt kernels moved from PREEMPT_RT_BASE/PREEMPT_RT_FULL to PREEMPT_RT, so we have to account for it here. Otherwise users get scheduling-while-atomic splats.

Reported-by: Erik Schuitema <erik@essd.nl>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: I307514d09a82b40d165eaf92c64f70e76c1bfae2
* compat: drop rhel 8.2, add rhel 8.4 support (Jason A. Donenfeld, 2021-07-23)
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Change-Id: Ia04067608c91f6c39c71e2df1995ebb1ca73fb3c
* drivers: net: Modify WireGuard for backward compat (Bruno Martins, 2020-12-31)
Change-Id: I1c8e130a514a7b0329f8df8099cc84f4cc8d5822
* UPSTREAM: wireguard: peerlookup: take lock before checking hash in replace operation (Jason A. Donenfeld, 2020-12-31)
Eric's suggested fix for the previous commit's mentioned race condition was to simply take the table->lock in wg_index_hashtable_replace(). The table->lock of the hash table is supposed to protect the bucket heads, not the entries, but actually, since all the mutator functions are already taking it, it makes sense to take it too for the test to hlist_unhashed, as a defense in depth measure, so that it no longer races with deletions, regardless of what other locks are protecting individual entries. This is sensible from a performance perspective because, as Eric pointed out, the case of being unhashed is already the unlikely case, so this won't add common contention. And comparing instructions, this basically doesn't make much of a difference other than pushing and popping %r13, used by the new `bool ret`. More generally, I like the idea of locking consistency across table mutator functions, and this might let me rest slightly easier at night.

Suggested-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6147f7b1e90ff09bd52afc8b9206a7fcd133daf7)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3f3c44100fe655f3f278dc8a57cee1171ced4147
* UPSTREAM: wireguard: noise: take lock when removing handshake entry from table (Jason A. Donenfeld, 2020-12-31)
Eric reported that syzkaller found a race of this variety:

CPU 1                                        CPU 2
---------------------------------------------|---------------------------------------
wg_index_hashtable_replace(old, ...)         |
  if (hlist_unhashed(&old->index_hash))      |
                                             | wg_index_hashtable_remove(old)
                                             |   hlist_del_init_rcu(&old->index_hash)
                                             |     old->index_hash.pprev = NULL
  hlist_replace_rcu(&old->index_hash, ...)   |
    *old->index_hash.pprev                   |

Syzbot wasn't actually able to reproduce this more than once or create a reproducer, because the race window between checking "hlist_unhashed" and calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or similar there helps make this demonstrable using this simple script:

    #!/bin/bash
    set -ex
    trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
    ip link add wg0 type wireguard
    ip link add wg1 type wireguard
    wg set wg0 private-key <(wg genkey) listen-port 9999
    wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
    wg set wg0 peer $(wg show wg1 public-key)
    ip link set wg0 up
    yes link set wg1 up | ip -force -batch - &
    pid1=$!
    yes link set wg1 down | ip -force -batch - &
    pid2=$!
    wait

The fundamental underlying problem is that we permit calls to wg_index_hashtable_remove(handshake.entry) without requiring the caller to take the handshake mutex that is intended to protect members of handshake during mutations. This is consistently the case with calls to wg_index_hashtable_insert(handshake.entry) and wg_index_hashtable_replace(handshake.entry), but it's missing from a pertinent callsite of wg_index_hashtable_remove(handshake.entry). So, this patch makes sure that mutex is taken.

The original code was a little bit funky though, in the form of:

    remove(handshake.entry)
    lock(), memzero(handshake.some_members), unlock()
    remove(handshake.entry)

The original intention of that double removal pattern outside the lock appears to be some attempt to prevent insertions that might happen while locks are dropped during expensive crypto operations, but actually, all callers of wg_index_hashtable_insert(handshake.entry) take the write lock and then explicitly check handshake.state, as they should, which the aforementioned memzero clears, which means an insertion should already be impossible. And regardless, the original intention was necessarily racy, since it wasn't guaranteed that something else would run after the unlock() instead of after the remove(). So, from a soundness perspective, it seems positive to remove what looks like a hack at best.

The crash from both syzbot and from the script above is as follows:

general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker
RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline]
RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174
Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5
RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010
RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000
R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000
R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500
FS:  0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820
 wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline]
 wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220
 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
 kthread+0x3b5/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/wireguard/20200908145911.4090480-1-edumazet@google.com/
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9179ba31367bcf481c3c79b5f028c94faad9f30a)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I5d1ca2bb77b61c654b2f56660a6b1c3f5fb2446f
* UPSTREAM: wireguard: queueing: make use of ip_tunnel_parse_protocol (Jason A. Donenfeld, 2020-12-31)
Now that wg_examine_packet_protocol has been added for general consumption as ip_tunnel_parse_protocol, it's possible to remove wg_examine_packet_protocol and simply use the new ip_tunnel_parse_protocol function directly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1a574074ae7d1d745c16f7710655f38a53174c27)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ia65b9100a5647c0084108b732f445f3d8a146c3a
* UPSTREAM: wireguard: receive: account for napi_gro_receive never returning GRO_DROP (Jason A. Donenfeld, 2020-12-31)
The napi_gro_receive function no longer returns GRO_DROP ever, making handling GRO_DROP dead code. This commit removes that dead code. Further, it's not even clear that device drivers have any business in taking action after passing off received packets; that's arguably out of their hands.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Fixes: 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in napi_gro_receive()")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit df08126e3833e9dca19e2407db5f5860a7c194fb)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I3fb4902fcb0e8bb522317518120651defed7804f
* UPSTREAM: wireguard: device: avoid circular netns references (Jason A. Donenfeld, 2020-12-31)
Before, we took a reference to the creating netns if the new netns was different. This caused issues with circular references, with two wireguard interfaces swapping namespaces. The solution is to rather not take any extra references at all, but instead simply invalidate the creating netns pointer when that netns is deleted.

In order to prevent this from happening again, this commit improves the rough object leak tracking by allowing it to account for created and destroyed interfaces, aside from just peers and keys. That then makes it possible to check for the object leak when having two interfaces take a reference to each others' namespaces.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 900575aa33a3eaaef802b31de187a85c4a4b4bd0)
Bug: 152722841
[Jason: netlink notifier uses exit instead of pre_exit]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iea52fe3ca0e41318c392d9e91edb1856de6c9528
* UPSTREAM: wireguard: noise: do not assign initiation time in if condition (Frank Werner-Krippendorf, 2020-12-31)
Fixes an error condition reported by checkpatch.pl which was caused by assigning a variable in an if condition in wg_noise_handshake_consume_initiation().

Signed-off-by: Frank Werner-Krippendorf <mail@hb9fxq.ch>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 558b353c9c2a717509f291c066c6bd8f5f5e21be)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Ib44bfe39c4897ba774732e148efb73fbd9d6409c
* UPSTREAM: wireguard: noise: separate receive counter from send counter (Jason A. Donenfeld, 2020-12-31)
In "wireguard: queueing: preserve flow hash across packet scrubbing", we were required to slightly increase the size of the receive replay counter to something still fairly small, but an increase nonetheless. It turns out that we can recoup some of the additional memory overhead by splitting up the prior union type into two distinct types. Before, we used the same "noise_counter" union for both sending and receiving, with sending just using a simple atomic64_t, while receiving used the full replay counter checker. This meant that most of the memory being allocated for the sending counter was being wasted. Since the old "noise_counter" type increased in size in the prior commit, now is a good time to split up that union type into a distinct "noise_replay_counter" for receiving and a boring atomic64_t for sending, each using neither more nor less memory than required.

Also, since sometimes the replay counter is accessed without necessitating additional accesses to the bitmap, we can reduce cache misses by hoisting the always-necessary lock above the bitmap in the struct layout. We also change a "noise_replay_counter" stack allocation to kmalloc in a -DDEBUG selftest so that KASAN doesn't trigger a stack frame warning.

All in all, removing a bit of abstraction in this commit makes the code simpler and smaller, in addition to the motivating memory usage recuperation. For example, passing around raw "noise_symmetric_key" structs is something that really only makes sense within noise.c, in the one place where the sending and receiving keys can safely be thought of as the same type of object; subsequent to that, it's important that we uniformly access these through keypair->{sending,receiving}, where their distinct roles are always made explicit. So this patch allows us to draw that distinction clearly as well.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a9e90d9931f3a474f04bab782ccd9d77904941e9)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I164a6693651e8238fd24961939b1ade74f5f2d52
* UPSTREAM: wireguard: queueing: preserve flow hash across packet scrubbing (Jason A. Donenfeld, 2020-12-31)
It's important that we clear most header fields during encapsulation and decapsulation, because the packet is substantially changed, and we don't want any info leak or logic bug due to an accidental correlation. But, for encapsulation, it's wrong to clear skb->hash, since it's used by fq_codel and flow dissection in general. Without it, classification does not proceed as usual. This change might make it easier to estimate the number of innerflows by examining clustering of out of order packets, but this shouldn't open up anything that can't already be inferred otherwise (e.g. syn packet size inference), and fq_codel can be disabled anyway.

Furthermore, it might be the case that the hash isn't used or queried at all until after wireguard transmits the encrypted UDP packet, which means skb->hash might still be zero at this point, and thus no hash taken over the inner packet data. In order to address this situation, we force a calculation of skb->hash before encrypting packet data. Of course this means that fq_codel might transmit packets slightly more out of order than usual. Toke did some testing on beefy machines with high quantities of parallel flows and found that increasing the replay-attack counter to 8192 takes care of the most pathological cases pretty well.

Reported-by: Dave Taht <dave.taht@gmail.com>
Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c78a0b4a78839d572d8a80f6a62221c0d7843135)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I12466d816c672380dc20e395f72e6ddc24c45d13
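A minimal sketch of the idea (the function name is a placeholder, not the driver's own): compute the flow hash over the inner packet before encryption scrubs and encapsulates it, so fq_codel and flow dissection still see per-flow entropy on the outer packet.

    static void example_preserve_flow_hash(struct sk_buff *skb)
    {
            /* Forces a software hash calculation if none is present yet;
             * the resulting skb->hash survives into the encapsulated packet. */
            skb_get_hash(skb);
            /* ... encrypt and encapsulate ... */
    }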
* UPSTREAM: wireguard: noise: read preshared key while taking lock (Jason A. Donenfeld, 2020-12-31)
Prior we read the preshared key after dropping the handshake lock, which isn't an actual crypto issue if it races, but it's still not quite correct. So copy that part of the state into a temporary like we do with the rest of the handshake state variables. Then we can release the lock, operate on the temporary, and zero it out at the end of the function. In performance tests, the impact of this was entirely unnoticeable, probably because those bytes are coming from the same cacheline as other things that are being copied out in the same manner.

Reported-by: Matt Dunwoodie <ncon@noconroy.net>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc67d371256f5c47d824e2eec51e46c8d62d022e)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I41c2eeee65162fac791cd27812c6b9ba5a9eeec2
* UPSTREAM: wireguard: send/receive: use explicit unlikely branch instead of implicit coalescing (Jason A. Donenfeld, 2020-12-31)
It's very unlikely that send will become true. It's nearly always false between 0 and 120 seconds of a session, and in most cases becomes true only between 120 and 121 seconds before becoming false again. So, unlikely(send) is clearly the right option here.

What happened before was that we had this complex boolean expression with multiple likely and unlikely clauses nested. Since this is evaluated left-to-right anyway, the whole thing got converted to unlikely. So, we can clean this up to better represent what's going on. The generated code is the same.

Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 243f2148937adc72bcaaa590d482d599c936efde)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I0cc67841703296acab0892503d34d3ce84c40ddb
* UPSTREAM: wireguard: selftests: initalize ipv6 members to NULL to squelch clang warning (Jason A. Donenfeld, 2020-12-31)
Without setting these to NULL, clang complains in certain configurations that have CONFIG_IPV6=n:

    In file included from drivers/net/wireguard/ratelimiter.c:223:
    drivers/net/wireguard/selftest/ratelimiter.c:173:34: error: variable 'skb6' is uninitialized when used here [-Werror,-Wuninitialized]
            ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
                                           ^~~~
    drivers/net/wireguard/selftest/ratelimiter.c:123:29: note: initialize the variable 'skb6' to silence this warning
            struct sk_buff *skb4, *skb6;
                                       ^
                                        = NULL
    drivers/net/wireguard/selftest/ratelimiter.c:173:40: error: variable 'hdr6' is uninitialized when used here [-Werror,-Wuninitialized]
            ret = timings_test(skb4, hdr4, skb6, hdr6, &test_count);
                                                 ^~~~
    drivers/net/wireguard/selftest/ratelimiter.c:125:22: note: initialize the variable 'hdr6' to silence this warning
            struct ipv6hdr *hdr6;
                                ^
                                 = NULL

We silence this warning by setting the variables to NULL as the warning suggests.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4fed818ef54b08d4b29200e416cce65546ad5312)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8de706d1ba2563560b60df3b9a08b005bb313188
* UPSTREAM: wireguard: send/receive: cond_resched() when processing worker ringbuffers (Jason A. Donenfeld, 2020-12-31)
Users with pathological hardware reported CPU stalls on CONFIG_PREEMPT_VOLUNTARY=y, because the ringbuffers would stay full, meaning these workers would never terminate. That turned out not to be okay on systems without forced preemption, which Sultan observed. This commit adds a cond_resched() to the bottom of each loop iteration, so that these workers don't hog the core. Note that we don't need this on the napi poll worker, since that terminates after its budget is expended.

Suggested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reported-by: Wang Jian <larkwang@gmail.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4005f5c3c9d006157ba716594e0d70c88a235c5e)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I95440398070e8f6bf11894bd967a3fc4e06c29a8
* UPSTREAM: wireguard: socket: remove errant restriction on looping to self (Jason A. Donenfeld, 2020-12-31)
It's already possible to create two different interfaces and loop packets between them. This has always been possible with tunnels in the kernel, and isn't specific to wireguard. Therefore, the networking stack already needs to deal with that. At the very least, the packet winds up exceeding the MTU and is discarded at that point. So, since this is already something that happens, there's no need to forbid the not very exceptional case of routing a packet back to the same interface; this loop is no different than others, and we shouldn't special case it, but rather rely on generic handling of loops in general. This also makes it easier to do interesting things with wireguard such as onion routing.

At the same time, we add a selftest for this, ensuring that both onion routing works and infinite routing loops do not crash the kernel. We also add a test case for wireguard interfaces nesting packets and sending traffic between each other, as well as the loop in this case too. We make sure to send some throughput-heavy traffic for this use case, to stress out any possible recursion issues with the locks around workqueues.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b673e24aad36981f327a6570412ffa7754de8911)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I6e5cb7a9f76372af8e03b54e6c0b0d5d20787604
* UPSTREAM: wireguard: receive: use tunnel helpers for decapsulating ECN markings (Toke Høiland-Jørgensen, 2020-12-31)
WireGuard currently only propagates ECN markings on tunnel decap according to the old RFC3168 specification. However, the spec has since been updated in RFC6040 to recommend slightly different decapsulation semantics. This was implemented in the kernel as a set of common helpers for ECN decapsulation, so let's just switch over WireGuard to using those, so it can benefit from this enhancement and any future tweaks. We do not drop packets with invalid ECN marking combinations, because WireGuard is frequently used to work around broken ISPs, which could be doing that.

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Reported-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Rodney W. Grimes <ietf@gndrsh.dnsmgr.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit eebabcb26ea1e3295704477c6cd4e772c96a9559)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I422eb94327f92d9436e998588e62041cb01902b9
* UPSTREAM: wireguard: queueing: cleanup ptr_ring in error path of packet_queue_init (Jason A. Donenfeld, 2020-12-31)
Prior, if the alloc_percpu of packet_percpu_multicore_worker_alloc failed, the previously allocated ptr_ring wouldn't be freed. This commit adds the missing call to ptr_ring_cleanup in the error case.

Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 130c58606171326c81841a49cc913cd354113dd9)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iad1c5bc2be2459b3dbe4791a5fec09a9403d7d56
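A hedged sketch of the shape of the fix, simplified and with invented struct and helper names (ptr_ring_init and ptr_ring_cleanup are the real kernel APIs from linux/ptr_ring.h):

    struct example_queue {
            struct ptr_ring ring;
            void __percpu *workers;
    };

    static void __percpu *example_multicore_worker_alloc(struct example_queue *queue);

    static int example_packet_queue_init(struct example_queue *queue, unsigned int len)
    {
            int ret;

            ret = ptr_ring_init(&queue->ring, len, GFP_KERNEL);
            if (ret)
                    return ret;

            queue->workers = example_multicore_worker_alloc(queue);
            if (!queue->workers) {
                    ptr_ring_cleanup(&queue->ring, NULL);   /* the previously missing cleanup */
                    return -ENOMEM;
            }
            return 0;
    }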
* UPSTREAM: wireguard: send: remove errant newline from packet_encrypt_worker (Sultan Alsawaf, 2020-12-31)
This commit removes a useless newline at the end of a scope, which doesn't add anything in the way of organization or readability.

Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d6833e42786e050e7522d6a91a9361e54085897d)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I47c458befe22630e82944f362a56585bbd5e2b7b
* UPSTREAM: wireguard: noise: error out precomputed DH during handshake rather than config (Jason A. Donenfeld, 2020-12-31)
We precompute the static-static ECDH during configuration time, in order to save an expensive computation later when receiving network packets. However, not all ECDH computations yield a contributory result. Prior, we were just not letting those peers be added to the interface. However, this creates a strange inconsistency, since it was still possible to add other weird points, like a valid public key plus a low-order point, and, like points that result in zeros, a handshake would not complete. In order to make the behavior more uniform and less surprising, simply allow all peers to be added. Then, we'll error out later when doing the crypto if there's an issue. This also adds more separation between the crypto layer and the configuration layer.

Discussed-with: Mathias Hall-Andersen <mathias@hall-andersen.dk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 11a7686aa99c7fe4b3f80f6dcccd54129817984d)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: Iae7e1688340109decefa565b848b97ce444c20b6
* UPSTREAM: wireguard: receive: remove dead code from default packet type case (Jason A. Donenfeld, 2020-12-31)
The situation in which we wind up hitting the default case here indicates a major bug in earlier parsing code. It is not a usual thing that should ever happen, which means a "friendly" message for it doesn't make sense. Rather, replace this with a WARN_ON, just like we do earlier in the file for a similar situation, so that somebody sends us a bug report and we can fix it.

Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2b8765c52db24c0fbcc81bac9b5e8390f2c7d3c8)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I40e109e5bac6541d309b588958ef0e09b3b9c19b
* UPSTREAM: wireguard: queueing: account for skb->protocol==0 (Jason A. Donenfeld, 2020-12-31)
We carry out checks to the effect of:

    if (skb->protocol != wg_examine_packet_protocol(skb))
            goto err;

By having wg_skb_examine_untrusted_ip_hdr return 0 on failure, this means that the check above still passes in the case where skb->protocol is zero, which is possible to hit with AF_PACKET:

    struct sockaddr_pkt saddr = { .spkt_device = "wg0" };
    unsigned char buffer[5] = { 0 };
    sendto(socket(AF_PACKET, SOCK_PACKET, /* skb->protocol = */ 0),
           buffer, sizeof(buffer), 0,
           (const struct sockaddr *)&saddr, sizeof(saddr));

Additional checks mean that this isn't actually a problem in the code base, but I could imagine it becoming a problem later if the function is used more liberally. I would prefer to fix this by having wg_examine_packet_protocol return a 32-bit ~0 value on failure, which will never match any value of skb->protocol, which would simply change the generated code from a mov to a movzx. However, sparse complains, and adding __force casts doesn't seem like a good idea, so instead we just add a simple helper function to check for the zero return value. Since wg_examine_packet_protocol itself gets inlined, this winds up not adding an additional branch to the generated code, since the 0 return value already happens in a mergable branch.

Reported-by: Fabian Freyer <fabianfreyer@radicallyopensecurity.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a5588604af448664e796daf3c1d5a4523c60667b)
Bug: 152722841
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I55d75c41f92b0eec88553ac5c2910a8400c427b4