To ensure survivability, you must run at least 3 nodes.
You can use any Instance type except PLAY2-PICO Instances, as they have only 1 vCPU, which is below CockroachDB's minimum requirement of 4 vCPUs.
For more details, refer to the CockroachDB hardware recommendations and cluster topology guidelines.
Installing CockroachDB on Scaleway Instances
- cockroachdb
- sql
- database
CockroachDB is an open-source, distributed SQL database designed to be scalable, reliable, and consistent.
CockroachDB scales horizontally: as data volume grows, you can add more servers to your CockroachDB cluster instead of scaling vertically by adding more resources (memory, CPU, or storage) to a single server.
Its reliability comes from its ability to survive software and hardware failures. CockroachDB replicates data across multiple nodes, and the Raft consensus algorithm ensures consistency by requiring a majority of replicas to agree before a change is committed to the database. Raft can also automatically detect and correct issues (such as identifying and restoring corrupted data) without human intervention.
CockroachDB's serializable SQL transactions also keep data consistent and accurate. To achieve this, it combines the Raft algorithm for reads with timestamped, multi-version data (MVCC) for writes. MVCC gives you a stable, unchanging view of the data whenever you start a transaction, regardless of other changes happening in the database at the same time.
This article shows you how to install CockroachDB on three nodes in a Private Network, with a Scaleway Load Balancer in front to access the database and the DB Console.
Before you start
To complete the actions presented below, you must have:
- A Scaleway account logged into the console
- Owner status or IAM permissions allowing you to perform actions in the intended Organization
- Installed CockroachDB locally
- A network configuration allowing TCP communication on the following ports (a quick connectivity check is sketched after this list):
  - `26257` for intra-cluster and client-cluster communication
  - `8080` to expose your DB Console
- Reviewed the production checklist and recommended topology patterns
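If you want to confirm that these ports are reachable once your machines are up, a quick TCP check with `nc` is enough. This is a minimal sketch; `db01.priv` is a placeholder for one of your own nodes.

```
# Check that a node accepts TCP connections on the CockroachDB ports.
nc -zv db01.priv 26257   # intra-cluster and client-cluster traffic
nc -zv db01.priv 8080    # DB Console and health checks
```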
Availability guidelines
- Run each node on a separate machine. Since CockroachDB replicates across nodes, running more than one node per machine increases the risk of data loss if a machine fails. Likewise, if a machine has multiple disks or SSDs, run one node with multiple `--store` flags rather than one node per disk. For more details about stores, see the Start a Node documentation page.
- When starting each node, use the `--locality` flag to describe the node's location, for example `--locality=region=fr-par,zone=fr-par-1`. The key-value pairs should be ordered from most to least inclusive, and the keys and the order of key-value pairs must be the same on all nodes.
- When deploying in a single Availability Zone (AZ):
  - To support the failure of any node, use at least 3 nodes with the default 3-way replication factor. In this case, if one node fails, each range retains 2 of its 3 replicas, a majority.
  - To tolerate 2 simultaneous node failures, use at least 5 nodes and increase the default replication factor for user data to 5. The replication factor for important internal data is 5 by default, so no adjustment is needed for internal data. In this case, if 2 nodes fail at the same time, each range retains 3 of its 5 replicas.
- When deploying across multiple AZs:
  - To support the failure of an entire AZ in a region, use at least 3 AZs per region and set `--locality` on each node to spread data evenly across regions and AZs (see the sketch after this list). In this case, if one AZ goes offline, the 2 remaining AZs retain a majority of replicas.
  - To ensure that ranges are split evenly across nodes, use the same number of nodes in each AZ. This avoids overloading any node with excessive resource consumption.
- When deploying across multiple regions:
  - To tolerate the failure of 1 entire region, use at least 3 regions.
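To illustrate the multi-AZ guidance above, the sketch below shows how `--locality` might be set when starting one node in each Paris AZ. The addresses are placeholders, and the remaining flags match the systemd template used later in this tutorial.

```
# Hypothetical example: one node per AZ in the fr-par region.
# Keys and their order must be identical on every node (most to least inclusive).
cockroach start --certs-dir=certs --advertise-addr=<node1 address> \
  --join=<node1 address>,<node2 address>,<node3 address> \
  --locality=region=fr-par,zone=fr-par-1

cockroach start --certs-dir=certs --advertise-addr=<node2 address> \
  --join=<node1 address>,<node2 address>,<node3 address> \
  --locality=region=fr-par,zone=fr-par-2

cockroach start --certs-dir=certs --advertise-addr=<node3 address> \
  --join=<node1 address>,<node2 address>,<node3 address> \
  --locality=region=fr-par,zone=fr-par-3
```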
Access levels
You can configure different access levels for the DB Console. We recommend different configurations depending on your needs:
| Access Level | Description |
|---|---|
| Partially open | Sets an ACL rule on the Load Balancer to allow only specific IP addresses to communicate on port `8080`. |
| Completely open | Does not set an ACL rule on the Load Balancer. All IP addresses can access the console on port `8080`. |
| Completely closed | Does not create a console backend and frontend on the Load Balancer. In this case, a machine with SSH access to a node can use an SSH tunnel to reach the DB Console (see the example below the table). |
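For the completely closed configuration, an SSH tunnel through the gateway bastion set up later in this tutorial is one way to reach the DB Console; the addresses below are placeholders.

```
# Forward local port 8080 to the DB Console of one node through the bastion.
ssh -L 8080:localhost:8080 -J bastion@<gw-ip>:61000 root@db01.priv
# The console is then reachable at https://localhost:8080 on your machine.
```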
Creating Instances
This procedure shows you how to create Instances for each node you plan to have in your cluster. Make sure to replace all example values with your own.
1. Run the following command to create a Private Network:
    ```
    scw vpc private-network create name=cockroachdb region=fr-par subnets.0=192.168.0.0/24
    ```
2. Create a Public Gateway to access Instances:
    ```
    scw vpc-gw gateway create name=external-access type=VPC-GW-S enable-bastion=true
    ```
3. Create a DHCP subnet that matches the one used in your Private Network:
    ```
    scw vpc-gw dhcp create subnet=192.168.0.0/24
    ```
4. Configure your gateway with your Private Network using your newly created DHCP subnet:
    ```
    scw vpc-gw gateway-network create enable-dhcp=true enable-masquerade=true \
      gateway-id=<id> dhcp-id=<id> private-network-id=<id>
    ```
5. Run the following command to create an Instance in the `fr-par-1` zone with a data volume of 30 GB:
    ```
    scw instance server create name=db01 zone=fr-par-1 type=PRO2-XXS image=ubuntu_jammy ip=none additional-volumes.0=b:30G
    ```
6. Create a private NIC (Network Interface Card) for your Instance:
    ```
    scw instance private-nic create zone=fr-par-1 server-id=<id> private-network-id=<id>
    ```
7. Repeat steps 5 and 6 to create the two other nodes in the `fr-par-2` and `fr-par-3` AZs (a scripted version is sketched after this list).
8. Check your Instance's DHCP entry on the Private Network and retrieve its private IP:
    ```
    scw vpc-gw dhcp-entry list gateway-network-id=<id>
    ```
9. Connect to the Instances using the gateway bastion:
    ```
    ssh -J bastion@<gw-ip>:61000 root@db01.priv
    ```
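If you prefer to script the repetition of steps 5 and 6, the sketch below loops over the three Paris AZs with the exact flags used above. It assumes `jq` is installed and that the CLI's JSON output (`-o json`) exposes the created server's `id` field; `<private-network-id>` is a placeholder.

```
# Create one node per AZ and attach each one to the Private Network.
zones=("fr-par-1" "fr-par-2" "fr-par-3")
names=("db01" "db02" "db03")

for i in "${!zones[@]}"; do
  # Create the Instance with a 30 GB additional data volume and no public IP.
  server_id=$(scw instance server create name="${names[$i]}" zone="${zones[$i]}" \
    type=PRO2-XXS image=ubuntu_jammy ip=none additional-volumes.0=b:30G -o json | jq -r '.id')

  # Attach a private NIC so the Instance joins the Private Network.
  scw instance private-nic create zone="${zones[$i]}" \
    server-id="$server_id" private-network-id=<private-network-id>
done
```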
Synchronizing clocks
CockroachDB requires moderate levels of clock synchronization to preserve data consistency.
For this reason, nodes spontaneously shut down when they detect that their clock is out of sync with at least half of the other nodes in the cluster by 80% of the maximum offset allowed (500 ms by default). This avoids consistency anomalies, but it is best to prevent clocks from drifting too far apart in the first place by running clock synchronization software on each node.
For the sake of this tutorial, we use NTP to keep offsets in single-digit milliseconds; other methods of clock synchronization are suitable as well.
1. Connect to one of your nodes using SSH.
2. Disable `timesyncd`, as it tends to be active by default on some Linux distributions:
    ```
    timedatectl set-ntp no
    ```
    Note: You can check that `timesyncd` is off by running the `timedatectl` command.
3. Run the following command to install the NTP package:
    ```
    apt-get install ntp -y
    ```
4. Run the following command to stop the NTP daemon:
    ```
    service ntp stop
    ```
5. Configure the machine's clock with Google's NTP service: in the `/etc/ntp.conf` file, remove or comment out any lines starting with `server` or `pool`, and add the following lines:
    ```
    server time1.google.com iburst
    server time2.google.com iburst
    server time3.google.com iburst
    server time4.google.com iburst
    ```
6. Restart the NTP daemon:
    ```
    service ntp start
    ```
    Important: We recommend using Google's NTP service because it handles "smearing" the leap second. If you use a different NTP service that does not smear the leap second, be sure to configure client-side smearing in the same way on each machine. See the production checklist for details.
7. Synchronize the machine's clock with Google's NTP service:
    ```
    ntpd -b time.google.com
    ```
8. Verify that the machine is using a Google NTP server:
    ```
    ntpq -p
    ```
    Note: The active NTP server is marked with an asterisk.

Important: You need to repeat these steps on each machine running a CockroachDB node (a loop to spot-check the result is sketched below).
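To spot-check that every node ended up synchronized, you can loop over the machines from your workstation, as sketched below; the hostnames and bastion address are placeholders.

```
# Query each node's NTP peers through the bastion; the active server is marked with '*'.
for host in db01.priv db02.priv db03.priv; do
  echo "--- $host ---"
  ssh -J bastion@<gw-ip>:61000 root@"$host" "ntpq -p"
done
```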
Setting up load balancing
Each CockroachDB node is an equally suitable SQL gateway to your cluster, but to ensure client performance and reliability, it is important to use load balancing:
- Performance: Load balancers spread client traffic across nodes. This prevents nodes from being overwhelmed by requests and improves overall cluster performance (queries per second).
- Reliability: Load balancers decouple client health from the health of a single CockroachDB node. In cases where a node fails, the load balancer redirects client traffic to available nodes.
Scaleway offers fully-managed Load Balancers to distribute traffic between Instances.
1. Create a Load Balancer:
    ```
    scw lb lb create name=cockroachdb assign-flexible-ip=true type=LB-S
    ```
2. Attach the Load Balancer to the Private Network using the Private Network UUID:
    ```
    scw lb private-network attach <lb-id> private-network-id=<id>
    ```
3. Configure a DNS record pointing to the Load Balancer IP, for example using Scaleway Domains and DNS:
    ```
    scw dns record add <domain> type=A name=<entry> data=<lb-ip> ttl=3600
    ```
4. Create a Load Balancer certificate using Let's Encrypt:
    ```
    scw lb certificate create lb-id=<lb-id> name=cockroach letsencrypt-common-name=<fqdn>
    ```
5. Create the database communication backend. The health check queries each node's readiness endpoint on the DB Console port (8080):
    ```
    scw lb backend create lb-id=<lb-id> name=db forward-protocol=tcp forward-port=26257 \
      forward-port-algorithm=leastconn sticky-sessions=table health-check.check-max-retries=5 \
      server-ip.0=<ip db01> server-ip.1=<ip db02> server-ip.2=<ip db03> health-check.port=8080 \
      health-check.http-config.method=GET "health-check.http-config.uri=/health?ready=1" \
      health-check.http-config.code=200
    ```
6. Create the database communication frontend. SSL is handled by the nodes:
    ```
    scw lb frontend create lb-id=<lb-id> name=db inbound-port=26257 backend-id=<backend-id>
    ```
7. Create the console backend. The health check can be done via HTTP, even though the console is exposed via HTTPS:
    ```
    scw lb backend create lb-id=<lb-id> name=console forward-protocol=https forward-port=8080 \
      forward-port-algorithm=leastconn sticky-sessions=table health-check.port=8080 \
      server-ip.0=<ip db01> server-ip.1=<ip db02> server-ip.2=<ip db03> \
      health-check.check-max-retries=5 health-check.http-config.method=GET \
      "health-check.http-config.uri=/health" health-check.http-config.code=200 \
      ssl-bridging=true ignore-ssl-server-verify=true
    ```
8. Create the console frontend. We use the Let's Encrypt certificate to avoid a self-signed CA warning:
    ```
    scw lb frontend create lb-id=<lb-id> name=console inbound-port=443 \
      certificate-ids.0=<cert-id> backend-id=<backend-id>
    ```
9. Set up an ACL on the console frontend so that only your IP address can reach it:
    ```
    scw lb acl create frontend-id=<console-frontend-id> name=admin action.type=deny \
      match.ip-subnet.0=<your ip> match.invert=true index=0
    ```
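Once the nodes are running and the cluster has been initialized (later sections), you can verify that traffic flows through the Load Balancer by querying CockroachDB's readiness endpoint, the same one used by the health checks above; `<lb-fqdn>` is the DNS name you configured.

```
# Returns HTTP 200 once at least one node behind the Load Balancer reports ready.
curl -i "https://<lb-fqdn>/health?ready=1"
```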
Configuring your network
Set up a firewall on each of your Instances, allowing TCP incoming communication on the following three ports:
- `26257` (`tcp:26257`) for inter-node communication (i.e., working as a cluster), for applications to connect to the Load Balancer, and for routing from the Load Balancer to the nodes
- `8080` (`tcp:8080`) for exposing your DB Console
- `22` (`tcp:22`) for SSH access
For further information, refer to the Configuring Firewalls for Instances tutorial.
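On Ubuntu Instances, `ufw` is one way to implement these rules; the commands below are a minimal sketch matching the three ports listed above.

```
# Allow SSH, inter-node/client traffic, and the DB Console, then enable the firewall.
ufw allow 22/tcp
ufw allow 26257/tcp
ufw allow 8080/tcp
ufw enable
```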
Generating certificates
You can use `cockroach cert` commands, `openssl` commands, or Auto TLS cert generation (alpha) to generate security certificates. This section uses the `cockroach cert` commands.
Locally, you will need to create the following certificates and keys:
- A certificate authority (CA) key pair (`ca.crt` and `ca.key`).
- A node key pair for each node, issued to its IP addresses and any common names the machine uses, as well as to the IP addresses and common names of the machines running load balancers.
- A client key pair for the `root` user. You will use this to run a sample workload against the cluster as well as some `cockroach` client commands from your local machine.
1. Install CockroachDB on your local machine.
2. Create two directories:
    ```
    mkdir certs
    mkdir my-safe-directory
    ```
    - `certs`: you will generate your CA certificate and all node and client certificates and keys in this directory, then upload some of the files to your nodes.
    - `my-safe-directory`: you will generate your CA key in this directory and then reference the key when generating node and client certificates. After that, keep the key safe and secret; do not upload it to your nodes.
3. Create the CA certificate and key:
    ```
    cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
    ```
4. Create the certificate and key for the first node, issued to all common names you might use to refer to the node as well as to the Load Balancer:
    ```
    cockroach cert create-node \
      <ip db01> \
      db01.priv \
      localhost \
      127.0.0.1 \
      <lb ip> \
      <lb fqdn> \
      --certs-dir=certs \
      --ca-key=my-safe-directory/ca.key
    ```
5. Upload the CA certificate, node certificate, and key to the first node:
    ```
    scp -J bastion@<gateway ip>:61000 -r certs/ root@db01.priv:~/
    ```
6. Delete the local copy of the node certificate and key:
    ```
    rm -f certs/node.*
    ```
7. Repeat steps 4 to 6 for each additional node.
8. Create a client certificate and key for the `root` user:
    ```
    cockroach cert create-client root \
      --certs-dir=certs \
      --ca-key=my-safe-directory/ca.key
    ```
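Before moving on, you can list the generated files to confirm that the CA, node, and client certificates are in place; this uses the built-in `cockroach cert list` command.

```
# Show the certificates and keys currently present in the certs directory.
cockroach cert list --certs-dir=certs
```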
Starting the nodes
You will automate starting nodes using systemd.
After completing these steps, nodes will not yet be live. They will complete the startup process and join together to form a cluster as soon as the cluster is initialized in the next step.
For each initial node of your cluster, complete the following steps:
1. Connect to the machine where you want the node to run via SSH. Ensure you are logged in as the root user.
2. Download the CockroachDB archive for Linux and extract the binary:
    ```
    curl https://binaries.cockroachdb.com/cockroach-v23.2.3.linux-amd64.tgz | tar -xz
    ```
3. Copy the binary into your PATH:
    ```
    cp -i cockroach-v23.2.3.linux-amd64/cockroach /usr/local/bin/
    ```
    CockroachDB uses custom-built versions of the GEOS libraries.
4. Copy the libraries to the location where CockroachDB expects to find them:
    ```
    mkdir -p /usr/local/lib/cockroach
    cp -i cockroach-v23.2.3.linux-amd64/lib/libgeos.so /usr/local/lib/cockroach/
    cp -i cockroach-v23.2.3.linux-amd64/lib/libgeos_c.so /usr/local/lib/cockroach/
    ```
5. Create the Cockroach directory:
    ```
    mkdir /var/lib/cockroach
    ```
6. Prepare and mount the data volume:
    ```
    mkfs -t ext4 /dev/sdb
    e2label /dev/sdb DATA
    echo "LABEL=DATA /var/lib/cockroach ext4 defaults 0 1" >> /etc/fstab
    mount -a
    ```
7. Move the `certs` directory to the cockroach directory:
    ```
    mv certs /var/lib/cockroach/
    ```
8. Create a Unix user named `cockroach`:
    ```
    useradd cockroach
    ```
9. Change the ownership of the cockroach directory to the `cockroach` user:
    ```
    chown -R cockroach /var/lib/cockroach
    ```
10. Download the sample configuration template and save it in the `/etc/systemd/system/` directory:
    ```
    curl -sSL -o /etc/systemd/system/cockroachdb.service https://raw.githubusercontent.com/cockroachdb/docs/master/_includes/v23.2/prod-deployment/securecockroachdb.service
    ```
    Alternatively, you can create the file yourself and copy the following unit definition into it:
    ```
    [Unit]
    Description=Cockroach Database cluster node
    Requires=network.target

    [Service]
    Type=notify
    WorkingDirectory=/var/lib/cockroach
    ExecStart=/usr/local/bin/cockroach start --certs-dir=certs --advertise-addr=<node1 address> --join=<node1 address>,<node2 address>,<node3 address> --cache=.25 --max-sql-memory=.25
    TimeoutStopSec=300
    Restart=always
    RestartSec=10
    StandardOutput=syslog
    StandardError=syslog
    SyslogIdentifier=cockroach
    User=cockroach

    [Install]
    WantedBy=default.target
    ```
11. In the configuration template, specify values for the following flags:

    | Flag | Description |
    |---|---|
    | `--advertise-addr` | Specifies the IP address/hostname and port to tell other nodes to use. The port number can be omitted, in which case it defaults to `26257`. This value must route to an IP address the node is listening on (with `--listen-addr` unspecified, the node listens on all IP addresses). In some networking scenarios, you may need to use `--advertise-addr` and/or `--listen-addr` differently. For more details, see Networking. |
    | `--join` | Identifies the addresses of 3 of the initial nodes of the cluster. These addresses should match the addresses that the target nodes are advertising. |

    When deploying across multiple data centers, or when there is otherwise high latency between nodes, it is recommended to also set `--locality`, for example `--locality=region=fr-par,zone=fr-par-1`. It is also required to use certain enterprise features. For more details, see Locality.

    For other flags not explicitly set, the command uses default values. For example, the node stores data in `--store=cockroach-data` and binds DB Console HTTP requests to `--http-port=8080`. To set these options manually, see the Start a Node documentation page.
12. Start the CockroachDB service:
    ```
    systemctl start cockroachdb
    ```
13. Repeat these steps for each additional node that you want in your cluster.
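Optionally, you can also enable the service so CockroachDB restarts automatically after a reboot, and follow its logs while the node waits for the cluster to be initialized; these are standard systemd commands.

```
# Start CockroachDB on every boot, check its status, and follow its output.
systemctl enable cockroachdb
systemctl status cockroachdb
journalctl -u cockroachdb -f
```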
Initializing the cluster
On your local machine, run the `cockroach init` command to complete the node startup process and have the nodes join together as a cluster. Set the `--host` flag to the Load Balancer address so the command reaches any of the nodes:
```
cockroach init --certs-dir=certs --host=<lb-fqdn>
```
The DB Console is then available at `https://<lb-fqdn>`.
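Once initialization completes, you can confirm from your local machine that all three nodes are live; the command below reuses the `certs` directory and the Load Balancer address.

```
# List the nodes and their liveness status through the Load Balancer.
cockroach node status --certs-dir=certs --host=<lb-fqdn>
```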
Testing the cluster
CockroachDB replicates and distributes data behind the scenes and uses a Gossip protocol to enable each node to locate data across the cluster. Once a cluster is live, any node can be used as an SQL gateway.
When using a Load Balancer, you should issue commands directly to the Load Balancer, which then routes traffic to the nodes.
Use the built-in SQL client locally as follows:
1. Launch the built-in SQL client on your local machine with the `--host` flag set to the address of the Load Balancer:
    ```
    cockroach sql --certs-dir=certs --host=<lb-fqdn>
    ```
2. Create a database called `securenodetest`:
    ```
    CREATE DATABASE securenodetest;
    ```
3. View the cluster's databases, which now include `securenodetest`:
    ```
    SHOW DATABASES;
    +--------------------+
    |      Database      |
    +--------------------+
    | crdb_internal      |
    | information_schema |
    | securenodetest     |
    | pg_catalog         |
    | system             |
    +--------------------+
    (5 rows)
    ```
4. Create an admin user and password to log in to the console:
    ```
    CREATE USER webadmin WITH LOGIN PASSWORD '<strong password>';
    GRANT admin TO webadmin;
    ```
5. Use `\q` to exit the SQL shell.
6. Use the `webadmin` user to log in to the console at `https://<lb-fqdn>`.
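To go one step further than creating a database, you can write and read back a row non-interactively with the SQL client's `-e` flag; the table and values below are purely illustrative.

```
# Insert and read back a test row through the Load Balancer.
cockroach sql --certs-dir=certs --host=<lb-fqdn> -e "
  CREATE TABLE IF NOT EXISTS securenodetest.kv (k STRING PRIMARY KEY, v STRING);
  INSERT INTO securenodetest.kv VALUES ('hello', 'world');
  SELECT * FROM securenodetest.kv;
"
```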
Monitoring the cluster
Despite CockroachDB's various built-in safeguards against failure, it is critical to actively monitor the overall health and performance of a cluster running in production, and to create alerting rules that promptly send notifications when there are events that require investigation or intervention.
You can leverage Scaleway Cockpit to set up monitoring and alerting using CockroachDB's Prometheus endpoint together with Prometheus remote write. This can be done by installing Prometheus as an agent on each node.
For details about available monitoring options and the most important events and metrics to alert on, see [Monitoring and Alerting](https://www.cockroachlabs.com/docs/v23.2/monitoring-and-alerting).
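As a minimal sketch of that setup, the snippet below writes a Prometheus configuration that scrapes a node's metrics endpoint (`/_status/vars` on port 8080) and forwards samples to a remote write endpoint. The Cockpit URL and token are placeholders you obtain from your Cockpit settings, and the bearer-token authentication shown here is an assumption; check Cockpit's remote write documentation for the exact scheme.

```
# Hypothetical Prometheus agent configuration for one node; adjust paths to your setup.
cat > /etc/prometheus/prometheus.yml <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: cockroachdb
    metrics_path: /_status/vars
    scheme: https
    tls_config:
      insecure_skip_verify: true   # the node serves its own certificate
    static_configs:
      - targets: ['localhost:8080']

remote_write:
  - url: <cockpit-remote-write-url>      # placeholder
    authorization:
      credentials: <cockpit-token>       # placeholder
EOF
```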
Scaling the cluster
You can start new nodes using the same procedures as for the initial nodes (creating Instances, synchronizing clocks, configuring your network, generating certificates, and starting the nodes). The `--join` list can keep pointing to the 3 initial nodes.
Using the database
Now that your deployment is working, you can:
- Implement your data model.
- Create users and grant them privileges.
- Connect your application. Be sure to connect your application to the load balancer, not to a CockroachDB node.
- Make backups of your data.
You may also want to adjust the way the cluster replicates data. For example, by default, a multi-node cluster replicates all data 3 times; you can change this replication factor or create additional rules for replicating individual databases and tables differently. For more information, see Configure Replication Zones.
When running a cluster of 5 nodes or more, it is safest to increase the replication factor for important internal data to 5, even if you do not do so for user data. For the cluster as a whole to remain available, the ranges for this internal data must always retain a majority of their replicas.
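For example, raising the replication factor for user data from 3 to 5 on a 5-node cluster takes a single zone configuration statement, sketched here through the same SQL client used earlier.

```
# Increase the default replication factor for user data from 3 to 5.
cockroach sql --certs-dir=certs --host=<lb-fqdn> \
  -e "ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;"
```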