Testing Cloud Network Throughput – Data Guard

Part 1: iPerf Throughput Test

Disaster Recovery (DR) plays a large role in the digital transformation strategy of many organizations. Businesses use public clouds such as AWS, Azure, OCI, or GCP to host their DR sites and ensure fast recovery of critical IT systems in the event of an outage caused by a disaster.

The DR site usually hosts a standby database and supporting infrastructure that mirror those in the primary data centre. In some cases, both the production (primary) and DR (standby) data centres are hosted in public clouds. For customers running Oracle Database, high availability, data protection, and disaster recovery for enterprise data are often provided by Oracle Data Guard.

In order for companies to meet their DR Recovery Time Objective (RTO) and Recovery Point Objective (RPO), they need predictable network performance between their primary and standby sites. Part of designing a predictable, high-performance network is outlining a concise procedure for testing whether the network throughput can actually deliver those RTO and RPO objectives.
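As a rough illustration, the bandwidth requirement can be estimated from the peak redo generation rate of the primary database. A commonly cited rule of thumb in Oracle sizing guidance allows roughly 30% for TCP and transport overhead; the 20 MB/s peak redo rate below is purely an assumed example figure:

Required bandwidth ≈ (peak redo rate ÷ 0.7) × 8
                   ≈ (20 MB/s ÷ 0.7) × 8
                   ≈ 229 Mbit/s

A link that cannot sustain this rate will cause the standby to fall behind during peak periods, directly affecting the achievable RPO.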

Next, we will walk through a simple enterprise production and disaster recovery scenario and show how to test the network throughput for a Data Guard deployment.

A Basic Production and DR Setup

[Diagram: primary production site in Google Cloud connected to the standby site in Oracle Cloud over an IPsec VPN tunnel across the public internet]

Scenario Overview

The diagram above shows the setup that will be used to test Data Guard throughput. We have a primary production site (simulated in Google Cloud) and a standby site (simulated in Oracle Cloud). Google Cloud does not currently support Oracle Database; we are using it here purely for demonstration purposes.

We will not be running actual Oracle databases. Instead, we will use Linux instances in both Google Cloud and Oracle Cloud and run network throughput test tools from them to measure performance. This is sufficient for network benchmarking.
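For reference, installing iPerf on the two test instances can be as simple as the following, assuming the configured package repositories provide the iperf3 package (availability varies by distribution and repository setup):

[opc@localhost ~]$ sudo yum install -y iperf3        # Oracle Linux instance (standby side)
kayode@instance-1:~$ sudo apt-get install -y iperf3  # Debian instance (primary side)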

In our scenario, the production primary database has its local online redo log, which holds changes made to the primary database. Redo transport services transmit this redo from the primary database system to the standby database system; in this scenario, the transmission is done over a secure IPsec tunnel across the public internet. Log apply services can then automatically apply the redo data on the standby database to keep it consistent with the primary database. For more information about how Data Guard operates, see the Oracle documentation: Introduction to Oracle Data Guard.
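Although we will not configure an actual database in this test, it is worth seeing what the redo transport side of such a setup typically looks like. A sketch of the primary-side initialization parameter is shown below; the service name and DB_UNIQUE_NAME are illustrative placeholders, and the choice between ASYNC and SYNC depends on the protection mode required:

SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=standby_tns ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=standby_db' SCOPE=BOTH;

With ASYNC transport, redo is shipped without making the primary wait for an acknowledgement, so the available network throughput determines how far the standby can lag behind.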

Functionally, Data Guard addresses site failures, where a disaster makes the primary site (housing the primary database) unavailable. The goal of the Data Guard configuration is to transmit redo data to the remote disaster recovery site fast enough to meet the RTO and RPO objectives.


Network Performance Benchmarking – iPerf vs Oratcptest

There are many network performance benchmarking tools available. One such tool is iPerf, which measures network performance only from a layer 3 and layer 4 perspective. Oracle, on the other hand, provides a TCP test tool that measures network performance while taking into account how Data Guard behaves at the application layer. This gives a more accurate representation of achievable Data Guard throughput than iPerf alone.

The Oracle TCP test tool and the instructions for using it can be found in the Oracle Support document “Measuring Network Capacity using oratcptest (Doc ID 2064368.1)”. It is a cross-platform JAR file that can be run in client and server modes.
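As a quick preview (Part 2 covers this in detail), the tool is started as a server on the standby host and as a client on the primary host, broadly as shown below. The port, duration, and interval values are arbitrary examples; refer to the Oracle Support note above for the full set of options:

[opc@localhost ~]$ java -jar oratcptest.jar -server -port=5555
kayode@instance-1:~$ java -jar oratcptest.jar server.cnt.com -port=5555 -mode=async -duration=120s -interval=20s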

The article Benchmarking Data Guard Throughput is also a great resource showing how the oratcptest.jar tool works. We will expand on this later by actually testing the throughput between Oracle Cloud and Google Cloud over an IPsec VPN connection across the internet. This represents a more realistic scenario that many customers will encounter.


Connecting the Primary and Standby Database Sites

In this scenario, we are connecting the primary database site to the standby database site via an IPsec tunnel over the public internet. We implemented the IPsec VPN in a prior article, IPsec VPN – Google and Oracle Cloud.

Testing Throughput using iPerf

Let’s take a look at the iPerf command and its output on the server side (the standby site).

[opc@localhost ~]$ iperf3 -s -V

iperf 3.1.7
Linux localhost 4.1.12-94.3.9.el7uek.x86_64 #2 SMP Fri Jul 14 20:09:40 PDT 2017 x86_64
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Time: Mon, 01 Jan 2018 17:03:17 GMT
Accepted connection from 172.16.10.2, port 44662
      Cookie: instance-1.1514826196.917776.7174dbc
      TCP MSS: 0 (default)
[  5] local 192.168.100.2 port 5201 connected to 172.16.10.2 port 44664
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  46.7 MBytes   392 Mbits/sec
[  5]   1.00-2.00   sec  31.5 MBytes   264 Mbits/sec
[  5]   2.00-3.00   sec  24.8 MBytes   208 Mbits/sec
[  5]   3.00-4.00   sec  27.3 MBytes   229 Mbits/sec
[  5]   4.00-5.00   sec  27.9 MBytes   234 Mbits/sec
[  5]   5.00-6.00   sec  30.4 MBytes   255 Mbits/sec
[  5]   6.00-7.00   sec  32.9 MBytes   276 Mbits/sec
[  5]   7.00-8.00   sec  34.5 MBytes   290 Mbits/sec
[  5]   8.00-9.00   sec  31.5 MBytes   264 Mbits/sec
[  5]   9.00-10.00  sec  32.6 MBytes   274 Mbits/sec
[  5]  10.00-10.04  sec  1.13 MBytes   232 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.04  sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-10.04  sec   321 MBytes   268 Mbits/sec                  receiver
CPU Utilization: local/receiver 1.7% (0.2%u/1.6%s), remote/sender 0.0% (0.0%u/0.0%s)
rcv_tcp_congestion cubic
iperf 3.1.7
Linux localhost 4.1.12-94.3.9.el7uek.x86_64 #2 SMP Fri Jul 14 20:09:40 PDT 2017 x86_64
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------


Let’s now take a look at the iPerf command and its output on the client side (the primary site).

kayode@instance-1:~$ iperf3 -c server.cnt.com -V
iperf 3.1.3
Linux instance-1 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64
Time: Mon, 01 Jan 2018 17:03:16 GMT
Connecting to host server.cnt.com, port 5201
      Cookie: instance-1.1514826196.917776.7174dbc
      TCP MSS: 1308 (default)
[  4] local 172.16.10.2 port 44664 connected to 192.168.100.2 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  50.7 MBytes   425 Mbits/sec   61   2.55 KBytes
[  4]   1.00-2.00   sec  32.1 MBytes   269 Mbits/sec  369    396 KBytes
[  4]   2.00-3.00   sec  24.6 MBytes   207 Mbits/sec    0    436 KBytes
[  4]   3.00-4.00   sec  27.4 MBytes   230 Mbits/sec    0    478 KBytes
[  4]   4.00-5.00   sec  28.1 MBytes   235 Mbits/sec    0    516 KBytes
[  4]   5.00-6.00   sec  30.5 MBytes   256 Mbits/sec    0    557 KBytes
[  4]   6.00-7.00   sec  32.9 MBytes   276 Mbits/sec    0    597 KBytes
[  4]   7.00-8.00   sec  34.5 MBytes   289 Mbits/sec    0    635 KBytes
[  4]   8.00-9.00   sec  31.2 MBytes   262 Mbits/sec   35    499 KBytes
[  4]   9.00-10.00  sec  32.5 MBytes   273 Mbits/sec    0    557 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   324 MBytes   272 Mbits/sec  465             sender
[  4]   0.00-10.00  sec   321 MBytes   269 Mbits/sec                  receiver
CPU Utilization: local/sender 1.4% (0.2%u/1.2%s), remote/receiver 1.7% (0.2%u/1.6%s)
 
iperf Done.


The iPerf test shows a throughput of about 270 Mbit/s using the defaults: the automatically negotiated TCP Maximum Segment Size (MSS) and a 128 KB read/write buffer length.
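Because these defaults leave several knobs untouched, it is worth repeating the test with different settings before drawing conclusions, for example with parallel streams, a larger socket buffer, or in reverse to check the return path. The values below are illustrative starting points rather than recommendations for this particular link:

kayode@instance-1:~$ iperf3 -c server.cnt.com -P 4 -t 30      # four parallel TCP streams for 30 seconds
kayode@instance-1:~$ iperf3 -c server.cnt.com -w 512K -t 30   # request a larger socket buffer (window)
kayode@instance-1:~$ iperf3 -c server.cnt.com -R -t 30        # reverse mode: the server sends, the client receives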

We can also do a quick check on the latency between the primary and standby sites using the ping utility:

kayode@instance-1:~$ ping server.cnt.com
PING server.cnt.com (192.168.100.2) 56(84) bytes of data.
64 bytes from server.cnt.com (192.168.100.2): icmp_seq=1 ttl=63 time=15.3 ms
64 bytes from server.cnt.com (192.168.100.2): icmp_seq=2 ttl=63 time=13.5 ms
64 bytes from server.cnt.com (192.168.100.2): icmp_seq=3 ttl=63 time=13.4 ms
64 bytes from server.cnt.com (192.168.100.2): icmp_seq=4 ttl=63 time=13.6 ms
64 bytes from server.cnt.com (192.168.100.2): icmp_seq=5 ttl=63 time=13.6 ms
64 bytes from server.cnt.com (192.168.100.2): icmp_seq=6 ttl=63 time=13.4 ms
64 bytes from server.cnt.com (192.168.100.2): icmp_seq=7 ttl=63 time=13.8 ms
^C
--- server.cnt.com ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6011ms
rtt min/avg/max/mdev = 13.461/13.848/15.369/0.643 ms
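The measured round-trip time also lets us sanity-check the congestion window values reported in the client output. The bandwidth-delay product (BDP) of this path, using the measured ~270 Mbit/s and ~13.8 ms average RTT, works out to roughly:

BDP ≈ bandwidth × RTT ≈ (270 Mbit/s × 0.0138 s) ÷ 8 ≈ 466 KB

This is in the same range as the congestion window sizes (roughly 400 to 640 KB) seen in the client output, which suggests that a single TCP stream is running close to what this window and round-trip time allow.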


In many cases, IT and network administrators rely only on generic network throughput test tools without digging deeper into the applications whose traffic the network will carry. The applications themselves (such as Data Guard) have characteristics that affect end-to-end throughput and latency.

We will take a look at testing the throughput using the oratcptest tool in Part 2 of this article.
