summaryrefslogtreecommitdiff
path: root/net (follow)
Commit message (Collapse)AuthorAge
...
| * | | | | | | | | | | | | | nl802154: add support for security layerAlexander Aring2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for accessing mac802154 llsec implementation over nl802154. I added for a new Kconfig entry to provide this functionality CONFIG_IEEE802154_NL802154_EXPERIMENTAL. This interface is still in development. It provides to change security parameters and add/del/dump entries of security tables. Later we can add also a get to get an entry by unique identifier. Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | nl802154: use nla_get_le64 for get extended addrAlexander Aring2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch uses the nla_get_le64 function instead of doing a force converting to le64. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | Bluetooth: Add hci_cmd_sync functionLoic Poulain2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Send a HCI command and wait for command complete event. This function serializes the requests by grabbing the req_lock. Signed-off-by: Loic Poulain <loic.poulain@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | mac802154: iface: assume big endian for af_packetAlexander Aring2015-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The callback "create" and "parse" from header_ops are called from netdev core upper-layer functionality, like af_packet. These callbacks assumes big endian for addresses and we should not introduce a special byteordering handling for ieee802154 over af_packet in userspace. We have an identical issue with setting the mac address which also assumes big endian byteordering. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | Bluetooth: Add BT_WARN and bt_dev_warn logging macrosFrederic Danis2015-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add warning logging macros to bluetooth subsystem logs. Signed-off-by: Frederic Danis <frederic.danis@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | mac802154: tx: add warning if MTU exceedsAlexander Aring2015-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sending over AF_PACKET RAW sockets we can sending frames which exceeds MTU size. To handling it correct we need to change things in AF_PACKET which knows on RAW sockets an additional FCS is set by hardware or mac802154 transmit functionality. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | ieee802154: change needed headroom/tailroomAlexander Aring2015-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch cleanups needed_headroom, needed_tailroom and hard_header_len fields for wpan and lowpan interfaces. For wpan interfaces the worst case mac header len should be part of needed_headroom, currently this is set as hard_header_len, but hard_header_len should be set to the minimum header length which xmit call assumes and this is the minimum frame length of 802.15.4. The hard_header_len value will check inside send callbacl of AF_PACKET raw sockets. For lowpan interfaces, if fragmentation isn't needed the skb will call dev_hard_header for 802154 layer and queue it afterwards. This happens without new skb allocation, so we need the same headroom and tailroom lengths like 802154 inside 802154 6lowpan layer. At least we assume as minimum header length an ipv6 header size. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | ieee802154: introduce wpan_dev_header_opsAlexander Aring2015-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current header_ops callback structure of net device are used mostly from 802.15.4 upper-layers. Because this callback structure is a very generic one, which is also used by e.g. DGRAM AF_PACKET sockets, we can't make this callback structure 802.15.4 specific which is currently is. I saw the smallest "constraint" for calling this callback with dev_hard_header/dev_parse_header by AF_PACKET which assign a 8 byte array for address void pointers. Currently 802.15.4 specific protocols like af802154 and 6LoWPAN will assign the "struct ieee802154_addr" as these parameters which is greater than 8 bytes. The current callback implementation for header_ops.create assumes always a complete "struct ieee802154_addr" which AF_PACKET can't never handled and is greater than 8 bytes. For that reason we introduce now a "generic" create/parse header_ops callback which allows handling with intra-pan extended addresses only. This allows a small use-case with AF_PACKET to send "somehow" a valid dataframe over DGRAM. To keeping the current dev_hard_header behaviour we introduce a similar callback structure "wpan_dev_header_ops" which contains 802.15.4 specific upper-layer header creation functionality, which can be called by wpan_dev_hard_header. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | ieee802154: header_ops: fix frame control settingAlexander Aring2015-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes upper-layer protocols wants to generate a new mac header by filling "struct ieee802154_hdr" only. These upper-layers sets for the address settings the source and dest fields, but not the fc fields for indicate the source and dest address mode. This patch changes the "ieee802154_hdr_push" function so the fc address fields are set according the source and dest fields of "struct ieee802154_hdr". Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | | | | | | | | | | | | mac802154: llsec: fix device deletion from listAlexander Aring2015-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a missing list_del when a device description will be deleted. Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* | | | | | | | | | | | | | | net: synack packets can be attached to request socketsEric Dumazet2015-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | selinux needs few changes to accommodate fact that SYNACK messages can be attached to a request socket, lacking sk_security pointer (Only syncookies are still attached to a TCP_LISTEN socket) Adds a new sk_listener() helper, and use it in selinux and sch_fq Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported by: kernel test robot <ying.huang@linux.intel.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Eric Paris <eparis@parisplace.org> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | bpf: fix cb access in socket filter programsAlexei Starovoitov2015-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | eBPF socket filter programs may see junk in 'u32 cb[5]' area, since it could have been used by protocol layers earlier. For socket filter programs used in af_packet we need to clean 20 bytes of skb->cb area if it could be used by the program. For programs attached to TCP/UDP sockets we need to save/restore these 20 bytes, since it's used by protocol layers. Remove SK_RUN_FILTER macro, since it's no longer used. Long term we may move this bpf cb area to per-cpu scratch, but that requires addition of new 'per-cpu load/store' instructions, so not suitable as a short term fix. Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net/sched: make sch_blackhole.c explicitly non-modularPaul Gortmaker2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Kconfig currently controlling compilation of this code is: net/sched/Kconfig:menuconfig NET_SCHED net/sched/Kconfig: bool "QoS and/or fair queueing" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We can change to one of the other priority initcalls (subsys?) at any later date, if desired. We also delete the MODULE_LICENSE tag since all that information is already contained at the top of the file in the comments. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net/dcb: make dcbnl.c explicitly non-modularPaul Gortmaker2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Kconfig currently controlling compilation of this code is: net/dcb/Kconfig:config DCB net/dcb/Kconfig: bool "Data Center Bridging support" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We can change to one of the other priority initcalls (subsys?) at any later date, if desired. We also delete the MODULE_LICENSE tag etc. since all that information is (or is now) already contained at the top of the file in the comments. Cc: "David S. Miller" <davem@davemloft.net> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Anish Bhatt <anish@chelsio.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Shani Michaeli <shanim@mellanox.com> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net/core: make sock_diag.c explicitly non-modularPaul Gortmaker2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Makefile currently controlling compilation of this code lists it under "obj-y" ...meaning that it currently is not being built as a module by anyone. Lets remove the modular code that is essentially orphaned, so that when reading the driver there is no doubt it is builtin-only. Since module_init translates to device_initcall in the non-modular case, the init ordering remains unchanged with this commit. We can change to one of the other priority initcalls (subsys?) at any later date, if desired. We can't remove module.h since the file uses other module related stuff even though it is not modular itself. We move the information from the MODULE_LICENSE tag to the top of the file, since that information is not captured anywhere else. The MODULE_ALIAS_NET_PF_PROTO becomes a no-op in the non modular case, so it is removed. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Craig Gallek <kraig@google.com> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net/core: lockdep_rtnl_is_held can be booleanYaowei Bai2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes lockdep_rtnl_is_held return bool due to this particular function only using either one or zero as its return value. In another patch lockdep_is_held is also made return bool. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net/dccp: dccp_bad_service_code can be booleanYaowei Bai2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes dccp_bad_service_code return bool due to these particular functions only using either one or zero as their return value. dccp_list_has_service is also been made return bool in this patchset. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net/nfnetlink: lockdep_nfnl_is_held can be booleanYaowei Bai2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes lockdep_nfnl_is_held return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net/netlink: lockdep_genl_is_held can be booleanYaowei Bai2015-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes lockdep_genl_is_held return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | bpf: split state from prandom_u32() and consolidate {c, e}BPF prngsDaniel Borkmann2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While recently arguing on a seccomp discussion that raw prandom_u32() access shouldn't be exposed to unpriviledged user space, I forgot the fact that SKF_AD_RANDOM extension actually already does it for some time in cBPF via commit 4cd3675ebf74 ("filter: added BPF random opcode"). Since prandom_u32() is being used in a lot of critical networking code, lets be more conservative and split their states. Furthermore, consolidate eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF, bpf_get_prandom_u32() was only accessible for priviledged users, but should that change one day, we also don't want to leak raw sequences through things like eBPF maps. One thought was also to have own per bpf_prog states, but due to ABI reasons this is not easily possible, i.e. the program code currently cannot access bpf_prog itself, and copying the rnd_state to/from the stack scratch space whenever a program uses the prng seems not really worth the trouble and seems too hacky. If needed, taus113 could in such cases be implemented within eBPF using a map entry to keep the state space, or get_random_bytes() could become a second helper in cases where performance would not be critical. Both sides can trigger a one-time late init via prandom_init_once() on the shared state. Performance-wise, there should even be a tiny gain as bpf_user_rnd_u32() saves one function call. The PRNG needs to live inside the BPF core since kernels could have a NET-less config as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net: move net_get_random_once to libHannes Frederic Sowa2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no good reason why users outside of networking should not be using this facility, f.e. for initializing their seeds. Therefore, make it accessible from there as get_random_once(). Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | net: Do not drop to make_route if oif is l3mdevDavid Ahern2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit deaa0a6a930 ("net: Lookup actual route when oif is VRF device") exposed a bug in __ip_route_output_key_hash for VRF devices: on FIB lookup failure if the oif is specified the current logic drops to make_route on the assumption that the route tables are wrong. For VRF/L3 master devices this leads to wrong dst entries and route lookups. For example: $ ip route ls table vrf-red unreachable default broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 $ ip route get oif vrf-red 1.1.1.1 1.1.1.1 dev vrf-red src 10.0.0.2 cache With this patch: $ ip route get oif vrf-red 1.1.1.1 RTNETLINK answers: No route to host which is the correct response based on the default route Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | bpf, skb_do_redirect: clear sender_cpu before xmitDaniel Borkmann2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to commit c29390c6dfee ("xps: must clear sender_cpu before forwarding"), we also need to clear the skb->sender_cpu when moving from RX to TX via skb_do_redirect() due to the shared location of napi_id (used on RX) and sender_cpu (used on TX). Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | Merge branch 'net/rds/4.3-v3' of ↵David S. Miller2015-10-08
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux Santosh Shilimkar says: ==================== RDS: connection scalability and performance improvements [v4] Re-sending the same patches from v3 again since my repost of patch 05/14 from v3 was whitespace damaged. [v3] Updated patch "[PATCH v2 05/14] RDS: defer the over_batch work to send worker" as per David Miller's comment [4] to avoid the magic value usage. Patch now makes use of already available but unused send_batch_count module parameter. Rest of the patches are same as earlier version v2 [3] [v2]: Dropped "[PATCH 05/15] RDS: increase size of hash-table to 8K" from earlier version [1]. I plan to address the hash table scalability using re-sizable hash tables as suggested by David Laight and David Miller [2] This series addresses RDS connection bottlenecks on massive workloads and improve the RDMA performance almost by 3X. RDS TCP also gets a small gain of about 12%. RDS is being used in massive systems with high scalability where several hundred thousand end points and tens of thousands of local processes are operating in tens of thousand sockets. Being RC(reliable connection), socket bind and release happens very often and any inefficiencies in bind hash look ups hurts the overall system performance. RDS bin hash-table uses global spin-lock which is the biggest bottleneck. To make matter worst, it uses rcu inside global lock for hash buckets. This is being addressed by simply using per bucket rw lock which makes the locking simple and very efficient. The hash table size is still an issue and I plan to address it by using re-sizable hash tables as suggested on the list. For RDS RDMA improvement, the completion handling is revamped so that we can do batch completions. Both send and receive completion handlers are split logically to achieve the same. RDS 8K messages being one of the key usecase, mr pool is adapted to have the 8K mrs along with default 1M mrs. And while doing this, few fixes and couple of bottlenecks seen with rds_sendmsg() are addressed. Series applies against 4.3-rc1 as well net-next. Its tested on Oracle hardware with IB fabric for both bcopy as well as RDMA mode. RDS TCP is tested with iXGB NIC. Like last time, iWARP transport is untested with these changes. The patchset is also available at below git repo: git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git net/rds/4.3-v3 As a side note, the IB HCA driver I used for testing misses at least 3 important patches in upstream to see the full blown IB performance and am hoping to get that in mainline with help of them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | | | | | | | | RDS: IB: split mr pool to improve 8K messages performanceSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 8K message sizes are pretty important usecase for RDS current workloads so we make provison to have 8K mrs available from the pool. Based on number of SG's in the RDS message, we pick a pool to use. Also to make sure that we don't under utlise mrs when say 8k messages are dominating which could lead to 8k pull being exhausted, we fall-back to 1m pool till 8k pool recovers for use. This helps to at least push ~55 kB/s bidirectional data which is a nice improvement. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: IB: use max_mr from HCA caps than max_fmrSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All HCA drivers seems to popullate max_mr caps and few of them do both max_mr and max_fmr. Hence update RDS code to make use of max_mr. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: IB: mark rds_ib_fmr_wq staticSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix below warning by marking rds_ib_fmr_wq static net/rds/ib_rdma.c:87:25: warning: symbol 'rds_ib_fmr_wq' was not declared. Should it be static? Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: IB: use already available pool handle from ibmrSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rds_ib_mr already keeps the pool handle which it associates with. Lets use that instead of round about way of fetching it from rds_ib_device. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: IB: fix the rds_ib_fmr_wq kick callSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDS IB mr pool has its own workqueue 'rds_ib_fmr_wq', so we need to use queue_delayed_work() to kick the work. This was hurting the performance since pool maintenance was less often triggered from other path. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: IB: handle rds_ibdev release case instead of crashing the kernelSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just in case we are still handling the QP receive completion while the rds_ibdev is released, drop the connection instead of crashing the kernel. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: IB: split send completion handling and do batch ackSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to what we did with receive CQ completion handling, we split the transmit completion handler so that it lets us implement batched work completion handling. We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to identify the send vs receive completion event handler invocation. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: IB: ack more receive completions to improve performanceSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For better performance, we split the receive completion IRQ handler. That lets us acknowledge several WCE events in one call. We also limit the WC to max 32 to avoid latency. Acknowledging several completions in one call instead of several calls each time will provide better performance since less mutual exclusion locks are being performed. In next patch, send completion is also split which re-uses the poll_cq() and hence the code is moved to ib_cm.c Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULLSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In Transport indepedent rds_sendmsg(), we shouldn't make decisions based on RDS_LL_SEND_FULL which is used to manage the ring for RDMA based transports. We can safely issue rds_send_xmit() and the using its return value take decision on deferred work. This will also fix the scenario where at times we are seeing connections stuck with the LL_SEND_FULL bit getting set and never cleared. We kick krdsd after any time we see -ENOMEM or -EAGAIN from the ring allocation code. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: defer the over_batch work to send workerSantosh Shilimkar2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current process gives up if its send work over the batch limit. The work queue will get kicked to finish off any other requests. This fixes remainder condition from commit 443be0e5affe ("RDS: make sure not to loop forever inside rds_send_xmit"). The restart condition is only for the case where we reached to over_batch code for some other reason so just retrying again before giving up. While at it, make sure we use already available 'send_batch_count' parameter instead of magic value. The batch count threshold value of 1024 came via commit 443be0e5affe ("RDS: make sure not to loop forever inside rds_send_xmit"). The idea is to process as big a batch as we can but at the same time we don't hold other waiting processes for send. Hence back-off after the send_batch_count limit (1024) to avoid soft-lock ups. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: Use per-bucket rw lock for bind hash-tableSantosh Shilimkar2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One global lock protecting hash-tables with 1024 buckets isn't efficient and it shows up in a massive systems with truck loads of RDS sockets serving multiple databases. The perf data clearly highlights the contention on the rw lock in these massive workloads. When the contention gets worse, the code gets into a state where it decides to back off on the lock. So while it has disabled interrupts, it sits and backs off on this lock get. This causes the system to become sluggish and eventually all sorts of bad things happen. The simple fix is to move the lock into the hash bucket and use per-bucket lock to improve the scalability. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: fix rds_sock reference bug while doing bindSantosh Shilimkar2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One need to take rds socket reference while using it and release it once done with it. rds_add_bind() code path does not do that so lets fix it. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: make socket bind/release locking scheme simple and more efficientSantosh Shilimkar2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RDS bind and release locking scheme is very inefficient. It uses RCU for maintaining the bind hash-table which is great but it also needs to hold spinlock for [add/remove]_bound(). So overall usecase, the hash-table concurrent speedup doesn't pay off. In fact blocking nature of synchronize_rcu() makes the RDS socket shutdown too slow which hurts RDS performance since connection shutdown and re-connect happens quite often to maintain the RC part of the protocol. So we make the locking scheme simpler and more efficient by replacing spin_locks with reader/writer locks and getting rid off rcu for bind hash-table. In subsequent patch, we also covert the global lock with per-bucket lock to reduce the global lock contention. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
| * | | | | | | | | | | | | | | RDS: use kfree_rcu in rds_ib_remove_ipaddrSantosh Shilimkar2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | synchronize_rcu() slowing down un-necessarily the socket shutdown path. It is used just kfree() the ip addresses in rds_ib_remove_ipaddr() which is perfect usecase for kfree_rcu(); So lets use that to gain some speedup. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
* | | | | | | | | | | | | | | | dst: Pass net into dst->outputEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv4, ipv6: Pass net into ip_local_out and ip6_local_outEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_outEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv4: Cache net in ip_build_and_send_pkt and ip_queue_xmitEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compute net and store it in a variable in the functions ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be recomputed next time it is needed. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv4: Cache net in iptunnel_xmitEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Store net in a variable in ip_tunnel_xmit so it does not need to be recomputed when it is used again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv6: Merge ip6_local_out and ip6_local_out_skEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv6: Merge __ip6_local_out and __ip6_local_out_skEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only __ip6_local_out_sk has callers so rename __ip6_local_out_sk __ip6_local_out and remove the previous __ip6_local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv4: Merge ip_local_out and ip_local_out_skEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | ipv4: Merge __ip_local_out and __ip_local_out_skEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | dst: Pass a sk into .local_outEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | net: Pass net into dst_output and remove dst_output_okfnEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | | | | | xfrm: Only compute net once in xfrm_policy_queue_processEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>