
Installing a multi-node CockroachDB cluster with HAProxy

Reviewed on 03 October 2024 • Published on 10 April 2018
  • Cockroach
  • CockroachDB
  • HAProxy

In this tutorial, you will learn how to deploy a multi-node CockroachDB cluster, a distributed SQL database designed to handle data in a highly scalable and resilient manner. You will also learn how to place HAProxy, a popular open-source load balancer, in front of the cluster to improve the performance and availability of your database. By following this tutorial, you will be able to set up a robust and efficient database backend for your applications.

Before you start

To complete the actions presented below, you must have:

  • A Scaleway account logged into the console
  • Owner status or IAM permissions allowing you to perform actions in the intended Organization
  • An SSH key
  • At least three Instances running Ubuntu Xenial (16.04)

What is CockroachDB and why you should use it

In a traditional SQL setup, all the complexity of replication has to be managed by the system administrators. This is a substantial task, and a single mistake can significantly increase the cost of running your infrastructure.

CockroachDB is designed to avoid this kind of misconfiguration problem: it is reliable by default and ready to use with minimal configuration.

Architecture of CockroachDB

CockroachDB is a distributed SQL database built on top of a transactional and consistent key-value store. The primary design goals for the application are:

  • support for ACID transactions
  • horizontal scalability
  • survivability

The architecture of CockroachDB consists of the following layers:

  • SQL: The SQL layer lets developers run SQL queries as in a traditional environment. It provides all the familiar concepts such as schemas, tables, and indexes. CockroachDB uses its own SQL feature set; refer to the CockroachDB documentation for the complete list of supported features.
  • Distributed Key-Value Store: The SQL layer communicates with the distributed key-value store, which makes it possible to handle large tables and indexes, similar to systems such as HBase and BigTable.
  • Distributed Transactions: Transactions are a core part of the system; their implementation manages the transition from SQL statements down to stores and ranges.
  • Node: Nodes are the servers that store your data. They can be either virtual or physical machines. The distributed key-value store routes messages to the different nodes of the cluster.
  • Store: Each node can contain one or more stores, and each store can hold many ranges. Stores are managed by RocksDB, an open-source storage engine.
  • Range: Ranges are the lowest level of key-value data. Each store contains ranges, and each range covers a segment of the larger key-space. Together, they form the entire monolithic sorted map.
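
From a developer's point of view, the SQL layer behaves like a conventional database. The statements below are a minimal sketch (the table and column names are only examples) showing the familiar schema, table, and index concepts mentioned above:

CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL);
CREATE INDEX ON accounts (balance);
INSERT INTO accounts (id, balance) VALUES (1, 250.00);
SELECT id, balance FROM accounts WHERE balance > 100;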

Installation of CockroachDB

  1. Spin up at least three Instances to guarantee redundancy and fault tolerance. We use Ubuntu Xenial (16.04) in this example.
  2. Once the servers have booted we install CockroachDB on them:
    wget -qO- https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz | tar xvz
  3. Copy the binary into a directory included in your PATH, so that you can execute cockroach commands from any shell:
    cp -i cockroach-latest.linux-amd64/cockroach /usr/local/bin
Tip

It is recommended to run CockroachDB on Instances with at least 1 GB of RAM for optimal performance.
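
You can verify the installation by checking the version of the binary (the exact output depends on the release you downloaded):

cockroach version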

Creation of certificates

SSL certificates are needed to encrypt the connection between our different Instances. It is possible to either use the cockroach cert commands or an alternative like OpenSSL. The example below makes use of cockroach cert:

Create a certificate directory and a safe directory for the CA key.

Start by creating the required directories to store the certificates:

mkdir certs
mkdir cr-keys

Create the CA key pair:

Create the Certificate Authority (CA) certificate and key: ca.crt and ca.key.

cockroach cert create-ca --certs-dir=certs --ca-key=cr-keys/ca.key
Important

Keep this key secret. It is the key of your Certificate Authority and can be used to issue new certificates that will be trusted by your cluster.

Create a client key pair for the root user:

Generate the key pair for the root user: client.root.crt and client.root.key. These files are used to secure the communication between the built-in SQL shell and the cluster.

cockroach cert create-client root --certs-dir=certs --ca-key=cr-keys/ca.key
Tip

In later steps, we use the root user’s certificate to run CockroachDB client commands from a local machine. The root user’s certificate and key also have to be copied to a remote node if you plan to run commands directly as root on that node (for local debugging).
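
You can list the certificates generated so far as a quick sanity check:

cockroach cert list --certs-dir=certs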

Create a key pair for the nodes:

  1. Create a node certificate and key for each node of our cluster: node.crt and node.key. They will be used to secure communication between our different nodes:
    cockroach cert create-node <node1 internal IP address> <node1 external IP address> <node1 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key
  2. Create the remote directory and copy the files to the first node:
    ssh <username>@<node1 address> "mkdir certs"
    scp certs/ca.crt certs/node.crt certs/node.key <username>@<node1 ip address>:~/certs
  3. Delete the local files so that we can create the remaining certificates for our other nodes:
    rm certs/node.crt certs/node.key
Note

This step is required as the new files will also be named node.crt and node.key. An alternative method to create the files without deleting them first is to use the --overwrite flag, which overwrites the existing files.

  1. Create the key pair for the second node:
    cockroach cert create-node <node2 internal IP address> <node2 external IP address> <node2 public FQDN> --certs-dir=certs --ca-key=cr-keys/ca.key
  2. Upload the files to the remote node:
    ssh <username>@<node2 address> "mkdir certs"
    scp certs/ca.crt certs/node.crt certs/node.key <username>@<node2 address>:~/certs

Repeat these steps for each additional node that you want to use in your cluster (a scripted version of these steps is sketched after the note below).

Note

It is recommended to use at least three nodes to have redundancy and replication services of CockroachDB enabled. It is possible to launch an unlimited number of nodes to scale your cluster as the workload grows.
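
If you manage a larger number of nodes, you can script the per-node certificate steps. The following is a minimal sketch, assuming each node is reachable under a single address; in practice, pass the internal IP address, external IP address, and public FQDN to create-node as shown in the manual steps above:

for NODE in <node1 address> <node2 address> <node3 address>; do
    # Recreate the node certificate, overwriting the previous node.crt/node.key
    cockroach cert create-node ${NODE} --certs-dir=certs --ca-key=cr-keys/ca.key --overwrite
    # Create the remote directory and copy the CA certificate and node key pair
    ssh <username>@${NODE} "mkdir -p certs"
    scp certs/ca.crt certs/node.crt certs/node.key <username>@${NODE}:~/certs
done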

Start the Instances

Start the CockroachDB daemon on the different nodes by typing the following command (this example is for the first node; replace the --host parameter with the address of node2 and node3 on the two other Instances):

cockroach start --certs-dir=certs --host=<node1 address> --join=<node1 ip address>:26257,<node2 ip address>:26257,<node3 ip address>:26257 --cache=25% --max-sql-memory=25% --background

Repeat this step on each node to start the server (do not forget to edit the IP settings according to each node).
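
Before moving on, you can confirm that a node is up and listening on the default port (ss is included in the iproute2 package shipped with Ubuntu):

ss -lnt | grep 26257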

Initialize the cluster

Once the different CockroachDB Instances have been started, they do not automatically form a cluster. We have to initialize it by running the following command:

cockroach init --certs-dir=certs --host=<address of any node>

The following message displays on the first node:

build: CCL v1.1.6 @ 2018/03/12 17:58:05 (go1.8.3)
admin: https://10.10.193.23:8080
sql: postgresql://root@10.10.193.23:26257?application_name=cockroach&sslmode=verify-full&sslrootcert=certs%2Fca.crt
logs: /root/cockroach-data/logs
store[0]: path=/root/cockroach-data
status: initialized new cluster
clusterID: 4c613798-eb26-412b-8a3a-55654a446188
nodeID: 1

On the other nodes, a message appears, confirming that they have joined the cluster:

status: initialized new node, joined pre-existing cluster
clusterID: 4c613798-eb26-412b-8a3a-55654a446188
nodeID: 3

It is possible to verify the status of the cluster from the admin interface of CockroachDB, available at https://<node1 ip address>:8080. It displays statistics about the different nodes connected to the cluster.

Tip

The administration interface is available on any of the nodes.

Note

Your browser may show a warning about the connection to this site. This message is due to the self-signed certificate used by CockroachDB. You can ignore the warning and configure your web browser to accept the certificate.
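
You can also check the state of the nodes from the command line, from any machine that holds the client certificates:

cockroach node status --certs-dir=certs --host=<address of any node>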

Synchronizing time

The clocks of your nodes have to be synchronized to ensure that the replication within your cluster works well. If a node drifts out of sync with at least half of the other nodes in the cluster by more than the maximum offset (500 ms by default), it shuts itself down.

It is therefore recommended to synchronize the clock of your nodes with a time server.

We configure ntp with the time servers of Scaleway and Ubuntu in this example:

  1. Install the ntp client:
    sudo apt-get install ntp
  2. Stop the ntp service:
    sudo service ntp stop
  3. Edit the file /etc/ntp.conf in your favorite text editor and remove any lines starting with server or pool. Then add the following content to the file:
    server ntp.int.scaleway.com iburst
    server ntp.ubuntu.com iburst
  4. Restart the service:
    sudo service ntp start
  5. Verify that the node is using the correct NTP server:
    sudo ntpq -p

The active NTP server is displayed with an asterisk.

Note

You should complete these steps on all of your nodes to make sure the clocks are synchronized.

Test your cluster

  1. On the primary node, launch the SQL client of CockroachDB:
    cockroach sql --certs-dir=certs --host=<address of node1>
  2. Create a cockroachtest database on the node:
    CREATE DATABASE cockroachtest;
  3. Quit the SQL client with \q or by pressing CTRL+C.
  4. Verify that the database has been replicated, by connecting to another node of the cluster:
    cockroach sql --certs-dir=certs --host=<address of node2>
  5. List the databases; the output includes cockroachtest:
    > SHOW DATABASES;

It will return something like:

+--------------------+
|      Database      |
+--------------------+
| crdb_internal      |
| information_schema |
| cockroachtest      |
| pg_catalog         |
| system             |
+--------------------+
(5 rows)
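
To see replication in action, you can write data through one node and read it back through another. The following is a minimal sketch (the table and values are only examples):

> CREATE TABLE cockroachtest.accounts (id INT PRIMARY KEY, balance DECIMAL);
> INSERT INTO cockroachtest.accounts (id, balance) VALUES (1, 250.00);

Then connect to a different node and query the table:

> SELECT id, balance FROM cockroachtest.accounts;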

Scale the cluster

Does your database need more computing power? Scale it easily by adding additional nodes to the cluster.

Launch and configure additional nodes in the same way as the ones you set up during the initial installation, as shown in the sketch below.
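
For example, after creating and copying a node certificate for the new machine as described above, start CockroachDB on it and point --join at the existing nodes (replace the placeholder addresses with your own):

cockroach start --certs-dir=certs --host=<node4 address> --join=<node1 address>:26257,<node2 address>:26257,<node3 address>:26257 --cache=25% --max-sql-memory=25% --background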

Configure HAProxy

The cluster is now ready to be used. We can use any of the nodes as an endpoint for our databases, but the load is not distributed automatically between the nodes. CockroachDB can generate a ready-to-use configuration for HAProxy, which allows us to distribute the load equally across the nodes.

A load balancer increases the performance and reliability of an application.

Tip

For performance and availability reasons, it is not recommended to run HAProxy on the same machine as a CockroachDB node.

  1. Start by running the cockroach gen haproxy command on the primary node to automatically generate the configuration file for HAProxy:
    cockroach gen haproxy --certs-dir=certs --host=<address of the primary node> --port=26257

It generates an haproxy.cfg file with the details of your cluster already configured. It looks like the following example:

global
  maxconn 4096

defaults
  mode tcp
  timeout connect 10s
  timeout client 1m
  timeout server 1m

listen psql
  bind :26257
  mode tcp
  balance roundrobin
  server cockroach1 <node1 address>:26257
  server cockroach2 <node2 address>:26257
  server cockroach3 <node3 address>:26257
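
Before copying it over, you can validate the configuration file on any machine where HAProxy is installed:

haproxy -c -f haproxy.cfg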
  2. Copy the file to the Instance used for HAProxy:
    scp haproxy.cfg <username>@<haproxy ip address>:~/
  3. Connect to the machine via SSH and install HAProxy on it:
    sudo apt-get install haproxy
  4. Start HAProxy with our configuration file:
    haproxy -f haproxy.cfg

The load balancer distributes the requests sent to our cluster equally between the different nodes.

  5. Test the connection to CockroachDB through the load balancer from the primary node:
    cockroach sql --certs-dir=certs --host=<haproxy ip address>
  6. To check which node you are connected to, type the following command:
    > SHOW node_id;

The result will look like:

+---------+
| node_id |
+---------+
|       2 |
+---------+
(1 row)

It shows us the ID of the node to which we are connected.
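
Since CockroachDB speaks the PostgreSQL wire protocol, applications can also connect through the load balancer with any PostgreSQL-compatible client. The following psql invocation is only a sketch, assuming psql is installed, the client certificate files are available in a local certs directory, and the cockroachtest database from the earlier test exists:

psql "postgresql://root@<haproxy ip address>:26257/cockroachtest?sslmode=verify-full&sslrootcert=certs/ca.crt&sslcert=certs/client.root.crt&sslkey=certs/client.root.key"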

Note

In this setup, the load balancer is a single point of failure: if it fails, our cluster becomes unavailable. It is therefore recommended to configure at least two load balancers and to use round-robin DNS to maximize availability.

Conclusion

In this tutorial, we configured a highly available CockroachDB cluster. As you have seen, it is easy to set up, automates the distribution and replication of your data, and allows you to scale your cluster at any time as your requirements grow.

If you want to explore all configuration options, you can find more information in the documentation of CockroachDB.
