Elasticsearch Setup Guide | Linux Full-text Search Engine

To make local files on our servers at IOFLOOD easier to navigate, we installed Elasticsearch as our search and analytics engine. Once configured, we found it to be a powerful open-source search engine that lets users store and search large volumes of data in near real time. We wrote today’s article as a concise tutorial on installing Elasticsearch on Linux, so that our customers can build their own search applications and analytics platforms on their cloud servers.

In this tutorial, we will guide you through installing Elasticsearch on your Linux system with APT and YUM. We will also delve into compiling Elasticsearch from source, installing a specific version, and finally, how to use Elasticsearch and verify that it’s installed correctly.

So, let’s dive in and begin installing Elasticsearch on your Linux system!

TL;DR: How Do I Install Elasticsearch on Linux?

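To install Elasticsearch, either download the package from the official Elasticsearch website or add Elastic’s package repository and install with a package manager like apt or yum. On Debian-based systems like Ubuntu, once the repository is configured, use sudo apt-get install elasticsearch. For RPM-based systems like CentOS, use sudo yum install elasticsearch.

Here’s an example for Debian-based systems:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install elasticsearch

This will store the Elasticsearch GPG key in a keyring on your system, register the Elasticsearch 7.x package repository, update your package lists, and then install Elasticsearch. After running these commands, Elasticsearch should be installed on your system.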

But there’s more to installing Elasticsearch than just running a couple of commands. There are also other methods of installation, such as installing from source or using Docker, and there are considerations to take into account, like ensuring your system has enough resources to run Elasticsearch. So, continue reading for a more detailed guide on how to install Elasticsearch on Linux.

Getting Started with Elasticsearch

Elasticsearch is a powerful, open-source, distributed, RESTful search and analytics engine. It’s built on top of Apache Lucene and allows for real-time searching and analyzing of your data. Elasticsearch is widely used for log and event data analysis, full-text search, and as a key-value store. If you’re dealing with large amounts of data and need to retrieve information from it quickly, Elasticsearch is an excellent tool to have in your arsenal.

Installing Elasticsearch with APT

If you’re using a Debian-based distribution like Ubuntu, you can install Elasticsearch using the APT package manager. Here’s how you can do it:

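# Import the Elasticsearch PGP key into a dedicated keyring
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

# Install the HTTPS transport for APT
sudo apt-get install apt-transport-https

# Add the Elasticsearch source list to the sources.list.d directory
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list

# Update your package lists
sudo apt-get update

# Install Elasticsearch
sudo apt-get install elasticsearch

# Enable and start the Elasticsearch service
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service

This series of commands will install Elasticsearch on your system. It first stores the Elasticsearch PGP key in a keyring (apt-key add is deprecated on current Debian and Ubuntu releases), then installs the HTTPS transport for APT, which older APT versions need to fetch packages from the Elasticsearch source list. The source list is then added to your APT sources, your package lists are updated, and Elasticsearch is installed. Because the package does not start the service automatically, the final commands enable it to start on boot and start it right away.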

Installing Elasticsearch with YUM

If you’re using a RHEL-based distribution like CentOS or AlmaLinux, you can install Elasticsearch using the YUM package manager. Here’s how to do it:

# Import the Elasticsearch PGP Key
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

# Download and install Elasticsearch
sudo yum install https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.1-x86_64.rpm

# Enable and start the Elasticsearch service
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service

These commands will install Elasticsearch on your system. They first import the Elasticsearch PGP key, then download and install the Elasticsearch 7.10.1 RPM directly from its URL. After installation, they reload the systemd configuration, enable the Elasticsearch service to start on boot, and start the Elasticsearch service.

Installing Elasticsearch from Source

While package managers make installation straightforward, sometimes you may need to install Elasticsearch from source. This allows you to access the latest features and bug fixes, or customize the build for specific needs.

First, clone the Elasticsearch repository from GitHub:

git clone https://github.com/elastic/elasticsearch.git
cd elasticsearch

Then, build Elasticsearch using the included Gradle wrapper:

./gradlew assemble

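This will compile Elasticsearch and package it into archives, which can be found under the distribution/archives directory. The build needs a JDK version supported by the branch you have checked out, and a full assemble builds every distribution type. If you only need the Linux tarball, the narrower task ./gradlew :distribution:archives:linux-tar:assemble is usually enough.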

Installing Specific Versions

From Source

If you need a specific version of Elasticsearch, you can check out the appropriate tag before building:

git checkout v7.10.1
./gradlew assemble

This will build version 7.10.1 of Elasticsearch.

Using Package Managers

APT

For Debian-based distributions, you can specify the version when installing with APT:

sudo apt-get install elasticsearch=7.10.1

YUM

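For RHEL-based distributions, you can specify the version when installing with YUM. Because the earlier YUM example installed the RPM directly by URL, this form requires the Elastic package repository to be configured first: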

sudo yum install elasticsearch-7.10.1
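
The version-pinned YUM command can only resolve the package name if a repository definition exists. Below is a minimal sketch of one, assuming the 7.x repository layout that Elastic publishes. The helper name write_elastic_repo and the scratch-file preview are illustrative choices, not part of any official tooling.

```shell
# Write a yum repository definition for the Elasticsearch 7.x packages.
# The destination path is an argument so the file can be previewed first.
write_elastic_repo() {
  dest=$1
  cat > "$dest" <<'EOF'
[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
EOF
}

# Preview the result in a scratch file before copying it into place.
preview=$(mktemp)
write_elastic_repo "$preview"
cat "$preview"
```

Once the content looks right, copy it to /etc/yum.repos.d/elasticsearch.repo with sudo. Since this sketch leaves the repository disabled by default (enabled=0), the install command would then be sudo yum install --enablerepo=elasticsearch elasticsearch-7.10.1.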

Version Comparison

Different versions of Elasticsearch come with various features and improvements. Here’s a brief comparison:

| Version | Key Features | Compatibility |
| --- | --- | --- |
| 7.10.1 | Asynchronous search, searchable snapshots | Compatible with Java 11 |
| 7.9.3 | Data tiers, improved indexing speed | Compatible with Java 11 |
| 7.8.1 | Improved geo capabilities, better resilience | Compatible with Java 11 |

Using and Verifying Elasticsearch

Basic Usage

Elasticsearch operates as a RESTful service, and you can interact with it using HTTP methods. For example, you can check the status of the cluster:

curl -X GET 'http://localhost:9200/_cluster/health?pretty'
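
The health response reports a status of green, yellow, or red, and scripts often need just that one field. Here is a rough sketch that pulls it out with sed, for systems where a JSON tool such as jq is not installed. The function name health_status and the sample response are made up for illustration, and the pattern is only meant for this simple, flat response.

```shell
# Print the value of the "status" field from a _cluster/health response read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Sample response fragment (made up for illustration) so the helper can be tried offline.
sample='{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "number_of_nodes" : 1
}'
printf '%s\n' "$sample" | health_status
```

Against a live node this would be used as curl -s 'http://localhost:9200/_cluster/health' | health_status. A yellow status is common on a single-node setup, because replica shards have no second node to be assigned to.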

Verification

To verify that Elasticsearch is installed and running correctly, you can request its version:

curl -X GET 'http://localhost:9200'

This should return information about the running instance, including the Elasticsearch version.
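
Elasticsearch can take a little while to start answering after the service starts, so a single curl call right after installation may fail even though nothing is wrong. A small polling loop is one way to handle that. The function name wait_for_es and its three arguments are hypothetical, chosen for this sketch.

```shell
# Poll an Elasticsearch URL until it answers or the attempts run out.
# Arguments: URL, number of attempts, seconds to sleep between attempts.
wait_for_es() {
  url=$1
  tries=$2
  delay=$3
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
    if curl -s -f -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_es http://localhost:9200 30 2 && echo "Elasticsearch is up" would try for roughly a minute before giving up.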

Other Install Methods: Elasticsearch

While the traditional package manager and source installation methods are popular, there are alternative approaches to installing Elasticsearch on a Linux system. One such method is using Docker, a platform that packages software in containers for easy deployment and scaling.

Installing Elasticsearch with Docker

Docker provides a way to run applications securely isolated in a container, packaged with all its dependencies and libraries. Here’s how you can install Elasticsearch using Docker:

# Pull the Elasticsearch Docker Image
sudo docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.1

# Run Elasticsearch Docker Container
sudo docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.10.1

This will pull the Elasticsearch Docker image and then run it in a Docker container. The -p options map the container’s ports to your host’s ports, and the -e option sets an environment variable that configures Elasticsearch to run as a single node.
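
The command above runs in the foreground and stops when the terminal closes. For anything longer-lived, a detached, named container is often easier to manage. The sketch below is one way to do that, with a dry-run wrapper so the command can be inspected before it runs. The container name es-single, the 512 MB heap, and the DRY_RUN variable are illustrative assumptions. ES_JAVA_OPTS is the environment variable the official image reads for JVM options.

```shell
# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Start a single-node container in the background.
# Only the HTTP port is published, and only on the loopback interface.
start_es_container() {
  version=${1:-7.10.1}
  run docker run -d --name es-single \
    -p 127.0.0.1:9200:9200 \
    -e "discovery.type=single-node" \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    "docker.elastic.co/elasticsearch/elasticsearch:${version}"
}

# With the default DRY_RUN=1 this only prints the command.
start_es_container 7.10.1
```

Setting DRY_RUN=0 before the call actually starts the container (prefix docker with sudo if your user is not in the docker group). After that, docker logs -f es-single follows the startup log and docker stop es-single shuts it down.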

Verifying Elasticsearch Docker Installation

You can check if Elasticsearch is running correctly in the Docker container by sending a request to its API. The command and expected output are as follows:

curl -X GET 'http://localhost:9200'

# Output:
# {
#   "name" : "elasticsearch",
#   "cluster_name" : "docker-cluster",
#   "cluster_uuid" : "OiRlRz5SReWspp2U9JZb3A",
#   "version" : {
#     "number" : "7.10.1",
#     "build_flavor" : "default",
#     "build_type" : "docker",
#     "build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
#     "build_date" : "2020-12-05T01:00:33.671820Z",
#     "build_snapshot" : false,
#     "lucene_version" : "8.7.0",
#     "minimum_wire_compatibility_version" : "6.8.0",
#     "minimum_index_compatibility_version" : "6.0.0-beta1"
#   },
#   "tagline" : "You Know, for Search"
# }

This command sends a GET request to the Elasticsearch API, and the output shows information about the running Elasticsearch instance.

Docker vs Traditional Installation

The Docker method has several advantages over traditional installation methods. It provides a consistent environment across different systems, simplifies the setup process, and makes it easier to run multiple instances of Elasticsearch. However, it requires a basic understanding of Docker and might not be suitable for all use cases.

| Installation Method | Advantages | Disadvantages |
| --- | --- | --- |
| Traditional | Direct control over installation, No additional tools required | More complex setup, Harder to replicate environment |
| Docker | Consistent environment, Simplified setup, Easier scaling | Requires Docker knowledge, Might be overkill for simple use cases |

In conclusion, the method you choose for installing Elasticsearch on Linux depends on your specific needs and expertise. For most users, the traditional package manager method should suffice. However, if you need more flexibility and scalability, or if you’re already using Docker, the Docker method might be a better fit.

Common Issues: Elasticsearch Install

While installing Elasticsearch on Linux is typically straightforward, you may encounter some issues along the way. Here are a few common problems and their solutions.

Elasticsearch Service Doesn’t Start

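After installing Elasticsearch, you might find that the service doesn’t start. To see the actual reason, check the service log with sudo journalctl -u elasticsearch.service or the files under /var/log/elasticsearch/. A frequent cause is insufficient system resources, particularly memory.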

Elasticsearch requires at least 2GB of RAM to run smoothly. To check your system’s memory, you can use the free -h command:

free -h

# Output:
#               total        used        free      shared  buff/cache   available
# Mem:           7.7Gi       1.1Gi       4.8Gi       120Mi       1.7Gi       6.2Gi
# Swap:          2.0Gi          0B       2.0Gi

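The output shows your total, used, free, and available memory. The available column is the better guide, because memory counted under buff/cache can be reclaimed when applications need it. If it shows less than 2GB, consider closing some applications or upgrading your system memory.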
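
The same check can be scripted, which is handy in provisioning scripts. This sketch reads the MemAvailable field from /proc/meminfo and compares it with the 2GB figure used in this guide. The function names and the optional file argument (useful for testing against a saved copy) are assumptions made for this example.

```shell
# Print MemAvailable in MiB from a meminfo-style file (default: /proc/meminfo).
mem_available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Succeed when at least the given number of MiB is available.
has_enough_memory() {
  required=$1
  available=$(mem_available_mib "${2:-/proc/meminfo}")
  [ -n "$available" ] && [ "$available" -ge "$required" ]
}

if has_enough_memory 2048; then
  echo "OK: at least 2 GiB available"
else
  echo "Warning: less than 2 GiB available"
fi
```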

Elasticsearch Service Is Unreachable

If you’ve started the Elasticsearch service but can’t connect to it, the service might not be listening on the correct network interface. By default, Elasticsearch listens on localhost (127.0.0.1).

You can check the network interfaces Elasticsearch is listening on using the netstat command (on systems without the net-tools package, ss -tuln gives equivalent information):

sudo netstat -tuln | grep 9200

# Output:
# tcp6       0      0 127.0.0.1:9200          :::*                    LISTEN

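The output shows that Elasticsearch is listening on 127.0.0.1 (localhost) on port 9200. If you need Elasticsearch to be accessible from other machines, you’ll need to set network.host in /etc/elasticsearch/elasticsearch.yml to a reachable IP address or 0.0.0.0 and restart the service. Only do this on a network you trust or behind a firewall.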
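
Changing the listen address comes down to one line in elasticsearch.yml. The sketch below shows a small helper that sets a key in a flat YAML settings file, tried here on a scratch copy rather than the live configuration. The helper name set_es_option is hypothetical. The second setting, discovery.type: single-node, is included on the assumption that this is a one-node install: once Elasticsearch binds to a non-loopback address it enforces its bootstrap checks, and a single node needs that setting (or explicit discovery settings) to pass them.

```shell
# Set "key: value" in a YAML settings file, replacing any existing line for that key.
set_es_option() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  # Keep every line that does not start with "key:".
  awk -v prefix="${key}:" 'index($0, prefix) != 1' "$file" > "$tmp"
  printf '%s: %s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Try it on a scratch copy rather than the live configuration.
conf=$(mktemp)
printf 'cluster.name: demo\nnetwork.host: 127.0.0.1\n' > "$conf"
set_es_option "$conf" network.host 0.0.0.0
set_es_option "$conf" discovery.type single-node
cat "$conf"
```

On a real system the target would be /etc/elasticsearch/elasticsearch.yml, edited as root, followed by sudo systemctl restart elasticsearch.service.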

Elasticsearch Returns an Error After Starting

If Elasticsearch starts but returns an error when you try to interact with it, there might be an issue with the Java runtime it is using. Elasticsearch 7.x ships with a bundled OpenJDK and uses it by default; it only switches to a system Java when the JAVA_HOME environment variable is set (ES_JAVA_HOME in later releases).

To check your Java version, you can use the java -version command:

java -version

# Output:
# openjdk version "11.0.10" 2021-01-19
# OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
# OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)

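The output shows the system Java version, which only matters if JAVA_HOME points Elasticsearch at it. If it does and the version is not compatible with your Elasticsearch release, either install a supported Java version or unset JAVA_HOME so that Elasticsearch falls back to its bundled JVM.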
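
When this check is automated, the version string has to be reduced to a major number, and older Java releases use the 1.x scheme (1.8 means Java 8). Here is a sketch of that parsing, with hypothetical helper names java_version_string and java_major:

```shell
# Extract the quoted version string from `java -version` output read on stdin.
java_version_string() {
  sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Reduce a version string to its major number ("1.8.0_292" gives 8, "11.0.10" gives 11).
java_major() {
  v=$1
  case $v in
    1.*) v=${v#1.} ;;
  esac
  printf '%s\n' "${v%%[._-]*}"
}

# `java -version` writes to stderr, hence the redirect.
if command -v java >/dev/null 2>&1; then
  java_major "$(java -version 2>&1 | java_version_string)"
else
  echo "no java on PATH (the bundled JDK may still be in use)"
fi
```

For DEB and RPM installs the bundled runtime normally lives under /usr/share/elasticsearch/jdk, so /usr/share/elasticsearch/jdk/bin/java -version shows the version Elasticsearch uses by default.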

Considerations When Installing Elasticsearch

When installing Elasticsearch, keep in mind that it’s a resource-intensive service. Ensure your system has enough resources (particularly memory) to run it. Also, remember that Elasticsearch uses a fair amount of disk space to store data. Make sure you have enough free disk space to accommodate your data needs.

Lastly, keep your Elasticsearch installation secure. Restrict network access to your Elasticsearch instance and consider enabling features like SSL and authentication.

Understanding Elasticsearch

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly and in near real-time. It is generally used as the underlying engine that powers applications with complex search features and requirements.

Elasticsearch: A Real-time Data Analytics Powerhouse

In the world of big data, data analysis and visualization are crucial for understanding your information. Elasticsearch is a real-time distributed search and analytics engine that allows you to explore your data at a speed and at a scale never before possible. It’s used for log and event data analysis, as well as for search in applications of all types.

# To see how Elasticsearch breaks text into tokens, you might use a command like this:
curl -X GET 'http://localhost:9200/_analyze' -H 'Content-Type: application/json' -d'{  "analyzer": "standard",  "text": "this is a test"}'

# Output:
# {
#   "tokens": [
#     {
#       "token": "this",
#       "start_offset": 0,
#       "end_offset": 4,
#       "type": "<ALPHANUM>",
#       "position": 0
#     },
#     ...
#   ]
# }

This command sends a GET request to the Elasticsearch _analyze API, passing in some text to be analyzed. The response includes a list of tokens that Elasticsearch has extracted from the text, which is the first step in indexing data.
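
For a quick look at just the tokens, the response can be filtered down to one token per line. This is a rough sketch for simple cases, assuming no jq is available. The function name list_tokens and the shortened sample response are made up for illustration, and tokens containing escaped quotes are not handled.

```shell
# Print one token per line from an _analyze response read on stdin.
list_tokens() {
  grep -o '"token"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/.*:[[:space:]]*"\(.*\)"$/\1/'
}

# Shortened sample response (illustrative) so the helper can be tried offline.
sample='{"tokens":[{"token":"this","start_offset":0,"end_offset":4,"type":"<ALPHANUM>","position":0},{"token":"is","start_offset":5,"end_offset":7,"type":"<ALPHANUM>","position":1}]}'
printf '%s\n' "$sample" | list_tokens
```

With a running node, piping the curl command above (with -s added) into list_tokens gives the same kind of listing for any analyzer and text.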

The Importance of Elasticsearch in Data Analysis

Elasticsearch is not just a search engine; it’s a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. It centrally stores your data so you can discover the expected and uncover the unexpected. Elasticsearch lets you perform and combine many types of searches (structured, unstructured, geo, metric) any way you want.

As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease. And this is just the beginning. Elasticsearch is the gateway to an ecosystem of features and capabilities.

Big Data Uses with Elasticsearch

Elasticsearch is not just a tool for searching and analyzing real-time data; it’s also a critical component in the field of big data and machine learning. Its ability to handle large volumes of data in near real-time makes it an ideal platform for big data analytics.

Elasticsearch and Big Data

In the realm of big data, Elasticsearch is used as a tool to ingest, store, and analyze massive amounts of structured and unstructured data. The speed and scalability of Elasticsearch make it perfect for big data applications. Here’s an example of how you can use Elasticsearch to ingest and analyze big data:

# Let's say you have a newline-delimited JSON (NDJSON) file in bulk format, with an action line before each document and a trailing newline. You can use the following command to ingest that data into Elasticsearch:
curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/_bulk?pretty' --data-binary @yourfile.json

# Output:
# {
#   "took" : 30,
#   "errors" : false,
#   ...
#   "items" : [
#     ...
#   ]
# }

This command sends a POST request to the Elasticsearch _bulk API, which allows for bulk operations. The --data-binary option tells curl to read the data from the specified file and to send it as is, preserving the newlines that the bulk format depends on.
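
The _bulk API does not accept an ordinary JSON array. It expects pairs of lines: an action line followed by the document itself. If your data is already one JSON document per line, a small filter can add the action lines. The function name to_bulk and the index name logs-demo are hypothetical, and the sketch assumes each document fits on a single line.

```shell
# Convert one-JSON-document-per-line input into _bulk request lines for an index.
to_bulk() {
  index=$1
  # The "|| [ -n ... ]" part also picks up a last line that lacks a trailing newline.
  while IFS= read -r doc || [ -n "$doc" ]; do
    [ -z "$doc" ] && continue
    printf '{"index":{"_index":"%s"}}\n' "$index"
    printf '%s\n' "$doc"
  done
}

printf '%s\n' '{"host":"web1","status":200}' '{"host":"web2","status":500}' | to_bulk logs-demo
```

In practice this would look like to_bulk logs-demo < docs.ndjson > yourfile.json, after which the curl command above can send the result. Every line the filter writes ends with a newline, which also satisfies the API's requirement that the request body end with one.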

Elasticsearch and Machine Learning

Elasticsearch’s machine learning features help you to detect anomalies and outliers in your data. By learning the normal behavior of your data, Elasticsearch can identify and alert you to any abnormalities.

Exploring Related Concepts: Kibana and Logstash

Elasticsearch is part of the Elastic Stack, which includes other powerful tools like Kibana and Logstash. Kibana is a data visualization tool that allows you to visualize your Elasticsearch data and navigate the Elastic Stack. Logstash is a server-side data processing pipeline that ingests data from multiple sources, transforms it, and then sends it to a “stash” like Elasticsearch.

Further Resources for Mastering Elasticsearch

To dive deeper into Elasticsearch and its capabilities, consider checking out the following resources:

  1. Elasticsearch: The Definitive Guide: A comprehensive guide provided by the creators of Elasticsearch.

  2. Elasticsearch in Action: A book that provides a deep understanding of Elasticsearch concepts.

  3. Elasticsearch Course on Udemy: Online courses that cover Elasticsearch and the Elastic Stack.

Recap: Elasticsearch Linux Installation

In this comprehensive guide, we’ve walked through the process of installing Elasticsearch on a Linux system. We’ve explored the power of Elasticsearch, a robust search engine that allows for real-time data analysis and visualization, and its importance in the realm of big data and machine learning.

We began with a basic introduction, discussing the need for Elasticsearch and its installation via package managers like APT and YUM. We then delved into more advanced topics, learning how to install Elasticsearch from source and how to handle specific versions. For the experts, we explored alternative methods of installation like using Docker, weighing their pros and cons.

Along the way, we tackled common issues you might encounter during the installation process and provided solutions to these problems. We also discussed key considerations to keep in mind when installing Elasticsearch, such as system resources and security.

Here’s a quick comparison of the installation methods we’ve discussed:

| Installation Method | Pros | Cons |
| --- | --- | --- |
| Package Manager | Easy to use, Direct control over installation | Version might not be latest, System-specific |
| Source | Access to latest features, Customizable build | More complex, Requires Git and Gradle |
| Docker | Consistent environment, Easy scaling | Requires Docker knowledge, Might be overkill for simple use cases |

Whether you’re a beginner looking to get started with Elasticsearch, or an experienced user aiming to deepen your understanding, we hope this guide has been a valuable resource. With the knowledge gained, you’re now equipped to install Elasticsearch on your Linux system and begin harnessing its power for data analysis. Happy data exploring!