ClickHouse is a column-oriented database designed for Online Analytical Processing (OLAP).
OLAP is an approach to analyzing large volumes of data with complex queries.
ClickHouse uses a dialect of SQL, which makes its query language easier for beginners to learn.
Installing ClickHouse on Ubuntu involves a series of steps, including adjusting the configuration file so that the server listens on other IP addresses and accepts remote connections.
Here at Ibmi Media, as part of our Server Management Services, we regularly help our customers perform Ubuntu-related software installation tasks.
In this context, we shall look into the steps to install ClickHouse on Ubuntu.
ClickHouse is an open-source analytics database developed for big data use cases.
Typically, the installation process involves the steps below:
i. Installing ClickHouse
ii. Starting the Service
iii. Changing the Listening IP Address
iv. Enabling Tabix
v. Setting Up Firewall Rules
Next, let us discuss each of them in detail.
In this section, we will install the ClickHouse server and client programs using apt.
Here, we need to add the repository of ClickHouse.
To add the repository's GPG key and then add the repository to the APT sources, use the commands below:
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4
$ echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list
Note that apt-key is deprecated on recent Ubuntu releases and that the ClickHouse project now distributes its packages from packages.clickhouse.com, so if the commands above fail, check the official installation guide for the current key and repository line.
Now, we shall update the packages:
$ sudo apt update
Next, we shall proceed to install the clickhouse-server and clickhouse-client packages:
$ sudo apt install clickhouse-server clickhouse-client
During installation, we will be prompted to set a password for the default ClickHouse user.
We will need this password later to access the ClickHouse server.
After the ClickHouse server and client installation, let us proceed to start the database service.
i. We can start the clickhouse-server service and verify its status with the commands below:
$ sudo service clickhouse-server start
$ sudo service clickhouse-server status
The output of the status command confirms that the server is running.
ii. Further, we can also enable ClickHouse to run on boot with the command below:
$ sudo systemctl enable clickhouse-server
iii. Now that the service is running and enabled, we can connect to ClickHouse with the password that we set during installation:
$ clickhouse-client --password
This will take us to the prompt where we can execute SQL statements to create, update, and delete databases, tables, and so on.
We can also execute queries to retrieve filtered data from this client prompt.
For instance, to create a test database use the command below:
ch:) CREATE DATABASE test;
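Statements like this can also be run without opening the interactive prompt, using the client's --query option. The following is only a minimal sketch: it assumes the server is running locally and that the password set during installation is exported in a shell variable named CH_PASSWORD. The helper ch_query, the table test.visits, and its columns are illustrative names, not part of ClickHouse.

```shell
#!/bin/sh
# Hypothetical helper: run one SQL statement through clickhouse-client.
# CH_PASSWORD is an assumed environment variable holding the default user's password.
ch_query() {
    clickhouse-client --password "$CH_PASSWORD" --query "$1"
}

# Statements for a small demo table (all names are illustrative only).
CREATE_DB='CREATE DATABASE IF NOT EXISTS test'
CREATE_TABLE='CREATE TABLE IF NOT EXISTS test.visits (id UInt64, url String, duration Float64) ENGINE = MergeTree() ORDER BY id'
INSERT_ROWS="INSERT INTO test.visits VALUES (1, '/home', 10.5), (2, '/docs', 40), (3, '/home', 13)"
SELECT_ROWS='SELECT url, duration FROM test.visits WHERE duration > 12 ORDER BY id'

# Only talk to the server when the client binary is actually installed.
if command -v clickhouse-client >/dev/null 2>&1; then
    ch_query "$CREATE_DB"
    ch_query "$CREATE_TABLE"
    ch_query "$INSERT_ROWS"
    ch_query "$SELECT_ROWS"
fi
```

Running the script a second time inserts the same three rows again, so drop the table between runs if you want a clean result.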
By default, ClickHouse listens only on localhost, so we can access it only from the same server.
However, we can change the listening IP address by editing the /etc/clickhouse-server/config.xml file.
Open this file with any available text editor, find the string listen_host, and uncomment the line.
To make the ClickHouse server listen only on a specific IP address, we can edit the line as follows:
<listen_host>xxx.xxx.xxx.xxx</listen_host>
Replace xxx.xxx.xxx.xxx with the actual IP address.
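As an alternative to editing config.xml in place, ClickHouse also merges XML files found in /etc/clickhouse-server/config.d/ into its main configuration. The sketch below writes such an override file. The helper name write_listen_override and the file name listen.xml are our own choices, and the example assumes the listen_host lines in config.xml are still commented out. It writes to a scratch directory, so nothing under /etc is touched.

```shell
#!/bin/sh
# Hypothetical helper: write a listen_host override into a config directory.
# $1 = target directory (on a real server: /etc/clickhouse-server/config.d)
# $2 = IP address the server should listen on
write_listen_override() {
    dir=$1
    ip=$2
    mkdir -p "$dir" || return 1
    # Older releases expect <yandex> as the root element instead of <clickhouse>.
    cat > "$dir/listen.xml" <<EOF
<clickhouse>
    <listen_host>$ip</listen_host>
</clickhouse>
EOF
}

# Dry run against a scratch directory; 192.0.2.10 is a documentation-only address.
scratch=$(mktemp -d)
write_listen_override "$scratch" "192.0.2.10"
cat "$scratch/listen.xml"
```

On a real server, the function would be called as root with /etc/clickhouse-server/config.d as the target directory, followed by a restart of the clickhouse-server service.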
To access the ClickHouse server through a web browser, we need to enable Tabix in the same config.xml file.
Open the file as before, find the string http_server_default_response, and uncomment the line.
After making changes to the config.xml file, we need to restart our clickhouse-server service:
$ sudo systemctl restart clickhouse-server
Now, we can access ClickHouse over a browser with the link http://Your_IP_Address:8123 and log in using the default user and password that we specified earlier in the installation process.
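The same HTTP port can also be exercised from the command line, which is a quick way to confirm that the server answers before opening a browser. The following is a sketch that assumes curl is installed. The variables CH_HOST and CH_PASSWORD and the two helper functions are illustrative names. The /ping endpoint and HTTP basic authentication are part of ClickHouse's HTTP interface.

```shell
#!/bin/sh
# Assumed values for illustration; export your own before running.
CH_HOST=${CH_HOST:-localhost}
CH_PASSWORD=${CH_PASSWORD:-}

# Build the base URL of the HTTP interface (port 8123).
ch_http_url() {
    printf 'http://%s:8123/' "$1"
}

# Send one SQL statement over HTTP as the default user.
ch_http_query() {
    curl --silent --show-error --user "default:$CH_PASSWORD" \
         --data-binary "$1" "$(ch_http_url "$CH_HOST")"
}

# /ping answers "Ok." when the server is reachable; run the query only then.
if curl --silent --max-time 2 "$(ch_http_url "$CH_HOST")ping" 2>/dev/null | grep -q '^Ok\.'; then
    ch_http_query 'SELECT version()'
fi
```

If the script prints nothing, the server did not answer on port 8123 from where the script was run, which usually points at the listen_host setting or a firewall rule.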
This step is needed only if we intend to connect to the ClickHouse database server remotely.
A remote connection requires ClickHouse to listen on an IP address other than localhost.
Thus, changing the listening IP address as discussed earlier is the first step in enabling remote access.
The ClickHouse server listens on port 8123 for HTTP connections and on port 9000 for connections from clickhouse-client.
Allow access to both ports for the remote server IP address with the following commands:
$ sudo ufw allow from remote_server_ip/32 to any port 8123
$ sudo ufw allow from remote_server_ip/32 to any port 9000
We can add additional IPs such as our local machine’s address in the same manner if required.
Now to verify the remote connection, install the clickhouse-client on the remote server with the steps that we followed initially. Then, start a client session by executing:
$ clickhouse-client --host your_server_ip --password
The output should confirm that we have connected to the server.
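If the client cannot connect, it helps to check whether the two ports are reachable at all before looking at credentials. The sketch below does this with netcat, assuming the OpenBSD variant that Ubuntu ships by default. The helper check_port and the SERVER variable are illustrative names.

```shell
#!/bin/sh
# Hypothetical helper: report whether a TCP port answers, using netcat.
# -z only probes the port without sending data; -w 2 sets a 2-second timeout.
check_port() {
    if nc -z -w 2 "$1" "$2" >/dev/null 2>&1; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 NOT reachable"
    fi
}

# Set SERVER to your_server_ip when checking from the remote machine.
SERVER=${SERVER:-127.0.0.1}
for port in 8123 9000; do
    check_port "$SERVER" "$port"
done
```

A port reported as not reachable from the remote machine but reachable on the server itself suggests that either the listen_host setting or the ufw rules still need adjusting.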
To start working with ClickHouse databases, launch the ClickHouse client.
When you start a session, the procedure is similar to other SQL management systems.
To start the client, use the command:
$ clickhouse-client
You may get this error:
“Code: 516. DB::Exception: Received from localhost:9000. DB::Exception: default: Authentication failed: password is incorrect or there is no user with such name.”
When that error occurs, you need to supply the password that was set for the default user during installation.
To do so, enter:
$ clickhouse-client --password test1234 --user default
Replace the sample password with your own.
The session starts.
This article covers how to install ClickHouse, an open-source analytics database developed for big data use cases, on Ubuntu.
Column-oriented databases store records in blocks grouped by columns instead of rows.
By not loading data for columns absent in the query, column-oriented databases spend less time reading data while completing queries.
As a result, these databases can compute and return results much faster than traditional row-based systems for certain workloads, such as OLAP.
Online Analytical Processing (OLAP) systems allow for organizing large amounts of data and performing complex queries.
They are capable of managing petabytes of data and returning query results quickly.
In this way, OLAP is useful for work in areas like data science and business analytics.
Aggregation queries are queries that operate on a set of values and return single output values.
In analytics databases, these queries are run frequently and are well optimized by the database.
Some aggregate functions supported by ClickHouse are:
1. count: returns the count of rows matching the conditions specified.
2. sum: returns the sum of selected column values.
3. avg: returns the average of selected column values.
Some ClickHouse-specific aggregate functions include:
1. uniq: returns an approximate number of distinct values of the argument.
2. topK: returns an array of the most frequent values of a specific column using an approximation algorithm.
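The functions listed above can be tried in a single query without creating any table. The sketch below uses the built-in numbers() table function as input. CH_PASSWORD is an assumed shell variable holding the default user's password, and the column aliases are our own. Because uniq and topK are approximate, no expected output is shown.

```shell
#!/bin/sh
# One query exercising the aggregate functions listed above.
# numbers(1000) is a built-in table function yielding the integers 0..999.
AGG_QUERY='SELECT
    count()              AS row_count,
    sum(number)          AS total,
    avg(number)          AS mean,
    uniq(number % 10)    AS approx_distinct,
    topK(3)(number % 7)  AS most_frequent
FROM numbers(1000)
FORMAT Vertical'

# CH_PASSWORD is an assumed variable holding the default user's password.
if command -v clickhouse-client >/dev/null 2>&1; then
    clickhouse-client --password "$CH_PASSWORD" --query "$AGG_QUERY"
else
    echo "clickhouse-client not found; query was:"
    echo "$AGG_QUERY"
fi
```

Note the two sets of parentheses in topK(3)(...): the first holds the function's parameter (how many values to return) and the second holds the column expression.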
You can set up a ClickHouse database instance on your server and create a database and table, add data, perform queries, and delete the database.
You can start, stop, and check the ClickHouse service with a few commands.
To start the clickhouse-server, use:
$ sudo systemctl start clickhouse-server
The command prints no confirmation when it succeeds.
To check the ClickHouse service status, enter:
$ sudo systemctl status clickhouse-server
To stop the ClickHouse server, run this command:
$ sudo systemctl stop clickhouse-server
To enable ClickHouse on boot:
$ sudo systemctl enable clickhouse-server
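For scripts, the state of the service can be read without parsing the full status output, because systemctl's is-active and is-enabled subcommands report it through their exit codes. The following is a small sketch; the helper name service_summary is our own.

```shell
#!/bin/sh
# Hypothetical helper: summarise a unit's state via systemctl exit codes.
# The unit name defaults to the one used in this article.
service_summary() {
    unit=${1:-clickhouse-server}
    if systemctl is-active --quiet "$unit" 2>/dev/null; then
        active="running"
    else
        active="not running"
    fi
    if systemctl is-enabled --quiet "$unit" 2>/dev/null; then
        boot="enabled"
    else
        boot="not enabled"
    fi
    echo "$unit: $active, start on boot $boot"
}

service_summary
```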