Friday, June 1, 2018

Windows 2008 RTM Network Performance Tuning

These NIC options are collectively known as the "TCP Chimney". They were originally designed to relieve a server's compute CPU(s) of some of the networking load by offloading functionality to the CPU on the NIC itself. Around 10 years ago this caused issues because the NIC vendors did a poor job of leveraging the Microsoft APIs. More recently this has improved, but the Microsoft APIs themselves were poorly implemented in Windows 2008 RTM (Vista kernel). I am not sure I would recommend baking the following recommendations into a build image, but for troubleshooting poor performance or dropped packets these parameters can certainly be useful. Note:


  • You may need to experiment.
  • The parameter names below apply to the VMware VMXNET3 NIC, but equivalents should be found on most NICs.
  • Offloading networking traffic back to the compute CPU assumes that the compute CPU is powerful enough to handle the extra work.

IPv4 Checksum Offload
When data comes in over a network, it is checked against a checksum (a validation code) carried in the headers of the packets it was delivered in. If the data and checksum don't match, the packet is judged to be bad and must be retransmitted. To speed things up, some network cards can "offload" the checksumming, i.e. perform it on the network card itself rather than leaving the job to the CPU. This frees the CPU to do that much more work of its own, and on a server with extremely high network throughput, that CPU saving can add up.
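As an illustration of what the card is doing when it checksums on your behalf, here is a sketch of the one's-complement algorithm the IPv4 header checksum uses (RFC 1071). This is not Windows code, just the arithmetic:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:                    # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF

# A receiver recomputes the sum over the header *including* the stored
# checksum; a valid header folds to zero.
```

Note the self-verifying property: summing the data together with its own checksum yields zero, which is what makes receive-side validation so cheap.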
Recommendation: Disable

IPv4 TSO Offload
Using TSO and LRO on physical and virtual machine NICs improves the performance of ESX/ESXi hosts by reducing the CPU overhead of TCP/IP network operations, leaving the host more CPU cycles to run applications. If TSO is enabled on the transmission path, the NIC divides larger data chunks into TCP segments. If TSO is disabled, the CPU performs the segmentation for TCP/IP.
Note: TSO is referred to as LSO (Large Segment Offload or Large Send Offload) in the latest VMXNET3 driver attributes.
Recommendation: Disable

Large Send Offload V2 (IPv4)
Large Send Offload v2 is a feature on modern Ethernet adapters that allows the TCP/IP network stack to build a large TCP message of up to 64 KB in length before handing it to the Ethernet adapter. The hardware on the adapter (what I'll call the LSO engine) then segments it into smaller data packets (known as "frames" in Ethernet terminology) that can be sent over the wire: up to 1,500 bytes for standard Ethernet frames and up to 9,000 bytes for jumbo Ethernet frames. In return, this frees the server CPU from having to segment large TCP messages into smaller packets that fit inside the supported frame size.
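The segmentation the LSO engine performs in hardware is conceptually simple. A sketch in Python, assuming a standard 1,500-byte frame whose payload is 1,460 bytes after the 20-byte IP and 20-byte TCP headers:

```python
MSS = 1460  # 1500-byte Ethernet payload minus 20-byte IP and 20-byte TCP headers

def segment(payload: bytes, mss: int = MSS) -> list:
    """What an LSO engine does in silicon: split one large TCP message
    into MSS-sized chunks, each of which becomes its own frame on the wire."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]

# One 64 KB send from the stack becomes 45 frames (44 full, 1 partial).
chunks = segment(bytes(64 * 1024))
```

With LSO disabled, this loop (plus rebuilding the IP/TCP headers for each chunk) runs on the host CPU instead.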
Recommendation: Disable

Offload IP Options
Miscellaneous IP options
Recommendation: Disable

Offload TCP Options
Miscellaneous TCP Options
Recommendation: Disable

Receive Side Scaling
RSS enables driver stacks to process send and receive-side data for a given connection on the same CPU. Typically, an overlying driver (for example, TCP) sends part of a data block and waits for an acknowledgment before sending the balance of the data. The acknowledgment then triggers subsequent send requests. The RSS indirection table identifies a particular CPU for the receive data processing. By default, the send processing runs on the same CPU if it is triggered by the receive acknowledgment. A driver can also specify the CPU (for example, if a timer is used).
Recommendation: Enable

TCP Checksum Offload (IPv4)
The TCP header contains a 16-bit checksum field which is used to verify the integrity of the header and data. For performance reasons the checksum calculation on the transmit side and verification on the receive side may be offloaded from the operating system to the network adapter. 
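Unlike the IPv4 header checksum, the TCP checksum also covers a "pseudo-header" built from the IP addresses, protocol number and segment length, plus the data. A sketch of the calculation the NIC takes over when this offload is enabled:

```python
import socket
import struct

def checksum16(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip: str, dst_ip: str, tcp_segment: bytes) -> int:
    """TCP checksum = checksum over pseudo-header + TCP header + data.
    Pseudo-header: src IP, dst IP, zero byte, protocol (6 = TCP), TCP length."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(tcp_segment)))
    return checksum16(pseudo + tcp_segment)
```

On receive, recomputing the checksum over a segment that already carries its correct checksum folds to zero, which is the verification step the adapter performs in hardware.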
Recommendation: Disable

Rx Ring #1
Modern, server-grade network interfaces can maintain transmit and receive descriptor rings in main memory and use direct memory access (DMA) to transfer packets without involving the CPU. The usual default receive ring size on regular desktop NICs is 256 or 512 entries; high-performance NICs can go up to 4096 or 8192.
Recommendation: 4096

Small Rx Buffers
Where 'Rx Ring #1' defines the size of the receive ring, 'Small Rx Buffers' defines how many receive buffers are available to fill it.
Recommendation: 8192
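These two values trade host memory for burst tolerance. A back-of-the-envelope calculation, assuming each small receive buffer is around 2 KB (an illustrative figure; check your driver's documentation for the real size):

```python
# Rough memory cost of the recommended maximums above.
ring_descriptors = 4096   # 'Rx Ring #1'
small_buffers = 8192      # 'Small Rx Buffers'
buffer_size = 2048        # bytes per small buffer -- assumed, not a vmxnet3 fact

total_bytes = small_buffers * buffer_size
print(f"{total_bytes / (1024 * 1024):.0f} MiB of receive buffering")  # 16 MiB
```

A few extra MiB is usually a fine trade on a server that is dropping packets under receive bursts.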

Cheers!
