11 Jan

Linux network performance tuning

Why we need a fine tuning of network settings?

Usually the default network parameters supplied along with the OS should be able to handle the regular traffic. But if you are managing a high traffic server and if you are experiencing sluggishness in accessing your application, then it is recommended to do a linux network performance tuning of your linux operating system.

TCP Connection Establishment

As you know, web servers\application servers generally use Transmission Control Protocol(TCP) for their client-server communication. TCP is a connection oriented protocol, which means, the sender and receiver needs to establish a reliable connection between them to transmit the data. As the first step of establishing the connection, the sender will send a connection request to the receiver. If the receiver is ready to accept the data, then it will send back an acknowledgement(ACK) back with SYN bit set. Now the sender will acknowledges the receiver’s initial sequence Number and its ACK. Now the sender will start its data transfer.

performance tuningFlow Control & Window Scaling

Since the sender and receiver may not be having same network speed, the TCP uses a flow control mechanism named sliding window protocol, so that the sender and receiver will be transmitting the data at same rate. The receiver and the sender will exchange the information about the amount of data, they can accept, using a TCP segment field called receive window. The receiver updates the filed with the amount of data, that it can accept.

Upon seeing the value, the sender will adjust is data transmission, so that it will not send data above this window size, until an acknowledgement is received from the receiver. Once an acknowledgement is received and once the new receive window size is declared by the receiver, the sender can transmit the next set of data. Earlier, the maximum receive window size that can be mentioned in a TCP frame was 65,535 bytes. Now using a new feature called, Window Scaling, the limit is increased to a maximum of 1,073,725,440 bytes(1Gb)

Bandwidth Delay Product – BDP, the bits of data in transit between hosts is equal to Bandwidth * RTT

or in other words,

BDP (bytes) = total bandwidth (KBytes/sec) x round trip time (ms)

The network throughput of that network <= (TCP buffer size / RTT)

The TCP Windows size needs to be large enough to accommodate network bandwidth x maximum expected delay

or

TCP window size needs to be >= BW * RTT

On a 100 Mbps network with round trip time(RTT) value of 150 ms and with a TCP buffer size of 128 KB, the Bandwidth Delay Product will be 1.88 MB. The maximum throughput value will be <= 6.99 Mbps. To use the 100 Mbps with RTT 150ms, the TCP buffer size should be >= 1831.1 KB

Window Scaling

In our above mentioned network, we are wasting 1815 Kilo Bytes of window size(1880-65). So we need to enable the Window Scaling feature. We can modify the window scaling parameter in linux by editing the sysctl.conf file. You need to set the below parameter to 1.

1
net.ipv4.tcp_window_scaling = 1

You can do the same by executing the below command,

1
echo 'net.ipv4.tcp_window_scaling = 1' >> /etc/sysctl.conf

Obtain TCP Memory Values

Now obtain the TCP memory values by executing the below commands,

1
cat /proc/sys/net/ipv4/tcp_mem

To view receive socket memory size, please execute the below two commands,

1
2
cat /proc/sys/net/core/rmem_max
cat /proc/sys/net/core/rmem_default

To view the send socket memory size, please execute the below two commands. The first command will give its maximum value and the second command will provide you its default value.

1
2
cat /proc/sys/net/core/wmem_max
cat /proc/sys/net/core/wmem_default

To view the maximum amount of option memory buffers, please execute the below command,

1
cat /proc/sys/net/core/optmem_max

Performance Tuning

If the receive socket memory size is small, then sender will be able to send data equal to the receiver socket memory size. So we need to increase this value to a higher value,say 32MB. Likewise, we need the send socket memory size, also to be large, say 32MB.For a network with RTT value, 100ms and 10Gbps network, the value can be as higher as 64MB. If the RTT value is 50ms, then it can be increased to 128MB.

1
2
echo 'net.core.wmem_max=33554432' >> /etc/sysctl.conf
echo 'net.core.rmem_max=33554432' >> /etc/sysctl.conf

Next step is to increase the linux autotuning TCP buffer limit to 16MB. Here, we can set minimum amount of receive window size, which will be set to each TCP connection, even if the server is having a high load. The default value will be allocated against each TCP connection. Since we are employing the window scaling feature, the window size will grow dynamically till the maximum receive window size, set in bytes, 16777216. For a network with RTT value, 100ms and 10Gbps network, the value can be as higher as 32MB.If the RTT value is 50ms, then it can be increased to 128MB.

1
2
echo 'net.ipv4.tcp_rmem = 4096 87380 16777216' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_wmem = 4096 65536 16777216' >> /etc/sysctl.conf

Also recommended to set net.ipv4.tcp_timestamps and net.ipv4.tcp_sack to 1, so that it can reduce the CPU load.

1
2
echo 'net.ipv4.tcp_timestamps = 1' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_sack = 1' >> /etc/sysctl.conf

View congestion control algorithms

To view the available list of congestion control algorithms available for your machine, please execute the bwlo command. It is recommended to set htcp as the congestion control mechanism.

1
sysctl net.ipv4.tcp_available_congestion_control

To set htcp as your congestion control alogithm, please execute the below command,

1
sysctl -w net.ipv4.tcp_congestion_control=htcp

It is recommended to increase number of incoming connections backlog queue Sets the maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them.

1
echo 'net.core.netdev_max_backlog = 65536' >> /etc/sysctl.conf

View the performance tuning done

To save and reload, please execute the below command,

1
sysctl -p

We can use the tcpdump to view the changes on eth1, if eth1 is your NIC.

1
tcpdump -ni eth1

Leave a Reply

Your email address will not be published. Required fields are marked *

five + 2 =