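1. Introduction to Elasticsearch
Elasticsearch is an open-source, distributed search and analytics engine built on Lucene and developed by Elastic. Its main characteristics are:
- Distributed: scales out to hundreds of servers and handles petabytes of data
- Near real-time: newly indexed documents become searchable after the next index refresh, which by default happens about once per second
- Full-text search: powerful full-text search capabilities
- RESTful API: a simple JSON-over-HTTP API
- Versatile: not only a search engine but also a capable analytics engine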
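2. Core Concepts
Before going deeper into Elasticsearch, it helps to understand a few basic concepts. The table below maps each one to a rough relational-database analogue:
Elasticsearch | Relational database |
---|---|
Index | Database |
Type (deprecated in 7.x, removed in 8.0) | Table |
Document | Row |
Field | Column |
Mapping | Schema |
Shard | Data partition |
Replica | Data backup (redundant copy) |
Note: the Type row reflects versions before 7.0. In 7.x every index has a single type, _doc, and in 8.0 types are removed, so in practice an index now plays a role closer to a table than to a database.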
3. Installation and Setup
Installing Elasticsearch
# Download Elasticsearch (using a 7.x release as an example)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.0-linux-x86_64.tar.gz
# Extract the archive
tar -xzf elasticsearch-7.17.0-linux-x86_64.tar.gz
# Start the node
cd elasticsearch-7.17.0/
./bin/elasticsearch
Verify that the installation succeeded:
curl http://localhost:9200/
Output:
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "xyzABCdefGHI123456",
  "version" : {
    "number" : "7.17.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "abcd1234",
    "build_date" : "2022-01-01T12:34:56.789Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
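4. Basic Operations (CRUD)
Elasticsearch exposes its operations through a RESTful API. The commonly used HTTP methods are:
- GET: retrieve a resource
- POST: create a resource
- PUT: create or update a resource
- DELETE: delete a resource
- HEAD: check whether a resource exists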
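As a rough illustration of how these methods map onto plain HTTP calls, the sketch below builds requests with Python's standard library without sending them. The base URL assumes the local node from section 3, the index name blog is only illustrative, and build_request is a hypothetical helper name rather than part of any Elasticsearch client.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: the local node started in section 3


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


# PUT with a JSON body: create or update a resource.
req = build_request(
    "PUT", "/blog", {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}
)
print(req.get_method(), req.full_url)

# HEAD without a body: check whether a resource exists.
exists_req = build_request("HEAD", "/blog")
print(exists_req.get_method(), exists_req.full_url)

# Sending requires a running node, so it is left commented out here:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

In a real project an official client library would normally replace hand-built requests; the point here is only that each operation is an HTTP method, a path, and an optional JSON body.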
4.1 Creating an Index
# Syntax: create an index
PUT /<index_name>
{
  "settings": {
    "number_of_shards": <primary_shard_count>,
    "number_of_replicas": <replicas_per_primary>
  }
}
Example:
PUT /blog
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
Response:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "blog"
}
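The two settings interact: number_of_replicas is counted per primary shard, so the total number of shard copies is the primary count times one plus the replica count. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def total_shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies = primaries * (1 primary copy + replicas per primary)."""
    return number_of_shards * (1 + number_of_replicas)


# The blog index above: 3 primaries, each with 1 replica.
print(total_shard_copies(3, 1))  # 6
```

Because a replica is never allocated on the same node as its primary, an index with replicas needs at least two nodes before all copies can be assigned; on a single-node setup the replicas stay unassigned.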
4.2 Adding a Document
# Syntax: add a document with an explicit ID
PUT /<index_name>/_doc/<document_id>
{
  "<field1>": "<value1>",
  "<field2>": "<value2>",
  ...
}
# Syntax: add a document with an auto-generated ID
POST /<index_name>/_doc
{
  "<field1>": "<value1>",
  "<field2>": "<value2>",
  ...
}
Example:
PUT /blog/_doc/1
{
  "title": "Elasticsearch入门",
  "author": "张三",
  "content": "这是一篇关于Elasticsearch的入门文章",
  "tags": ["搜索引擎", "Elasticsearch"],
  "created_at": "2023-01-01T10:00:00"
}
Response:
{
  "_index": "blog",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {"total": 2, "successful": 2, "failed": 0},
  "_seq_no": 0,
  "_primary_term": 1
}
4.3 Retrieving a Document
# Syntax: get a document by ID
GET /<index_name>/_doc/<document_id>
# Syntax: search all documents in an index
GET /<index_name>/_search
Example:
# Get a document by ID
GET /blog/_doc/1
Response:
{
  "_index": "blog",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "title": "Elasticsearch入门",
    "author": "张三",
    "content": "这是一篇关于Elasticsearch的入门文章",
    "tags": ["搜索引擎", "Elasticsearch"],
    "created_at": "2023-01-01T10:00:00"
  }
}
4.4 Updating a Document
# Syntax: partial update (only the listed fields change)
POST /<index_name>/_update/<document_id>
{
  "doc": {
    "<field1>": "<new_value1>",
    "<field2>": "<new_value2>"
  }
}
Example:
POST /blog/_update/1
{
  "doc": {
    "title": "Elasticsearch快速入门",
    "tags": ["搜索引擎", "Elasticsearch", "教程"]
  }
}
Response:
{
  "_index": "blog",
  "_type": "_doc",
  "_id": "1",
  "_version": 2,
  "result": "updated",
  "_shards": {"total": 2, "successful": 2, "failed": 0},
  "_seq_no": 1,
  "_primary_term": 1
}
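The effect of a partial update can be pictured as merging the doc object into the stored _source: fields not mentioned are kept, nested objects are merged recursively, and other values, including arrays, are replaced wholesale. The following is a local sketch of that merge under those assumptions, not the server's implementation; apply_partial_update is a hypothetical name.

```python
def apply_partial_update(source, doc):
    """Return a copy of `source` with `doc` merged in (objects merged, other values replaced)."""
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays replace the previous value entirely.
            merged[key] = value
    return merged


original = {
    "title": "Elasticsearch入门",
    "author": "张三",
    "tags": ["搜索引擎", "Elasticsearch"],
}
update_doc = {
    "title": "Elasticsearch快速入门",
    "tags": ["搜索引擎", "Elasticsearch", "教程"],
}

updated = apply_partial_update(original, update_doc)
print(updated)
```

Note that tags ends up as exactly the array sent in the update: the new list is not appended to the old one.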
4.5 Deleting a Document
# Syntax: delete a document
DELETE /<index_name>/_doc/<document_id>
Example:
DELETE /blog/_doc/1
Response:
{
  "_index": "blog",
  "_type": "_doc",
  "_id": "1",
  "_version": 3,
  "result": "deleted",
  "_shards": {"total": 2, "successful": 2, "failed": 0},
  "_seq_no": 2,
  "_primary_term": 1
}
4.6 Deleting an Index
# Syntax: delete an index
DELETE /<index_name>
Example:
DELETE /blog
Response:
{
  "acknowledged": true
}
5. Search
Search is the core feature of Elasticsearch, and it offers a rich set of query types. The examples in this section assume the blog index exists again and holds a few documents (section 4.6 deleted it, so recreate and repopulate it first).
5.1 Basic Queries
# Syntax: search
GET /<index_name>/_search
{
  "query": {
    "<query_type>": {
      "<parameter>": "<value>"
    }
  }
}
Example:
# Find documents whose title contains "Elasticsearch"
GET /blog/_search
{
  "query": {
    "match": {"title": "Elasticsearch"}
  }
}
Response:
{
  "took": 5,
  "timed_out": false,
  "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 2, "relation": "eq"},
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "blog",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.6931472,
        "_source": {
          "title": "Elasticsearch快速入门",
          "author": "张三",
          "content": "这是一篇关于Elasticsearch的入门文章",
          "tags": ["搜索引擎", "Elasticsearch", "教程"],
          "created_at": "2023-01-01T10:00:00"
        }
      },
      {
        "_index": "blog",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "title": "深入理解Elasticsearch",
          "author": "李四",
          "content": "本文详细介绍Elasticsearch的内部原理",
          "tags": ["Elasticsearch", "原理"],
          "created_at": "2023-01-02T15:30:00"
        }
      }
    ]
  }
}
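Client code usually needs only a few parts of a search response: the total hit count and, for each hit, the ID, score, and source fields. A minimal sketch of pulling those out of a response that has already been parsed into a dict (the sample is a trimmed copy of the response above; summarize_hits is a hypothetical helper):

```python
# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 0.6931472,
        "hits": [
            {"_id": "1", "_score": 0.6931472,
             "_source": {"title": "Elasticsearch快速入门", "author": "张三"}},
            {"_id": "2", "_score": 0.5753642,
             "_source": {"title": "深入理解Elasticsearch", "author": "李四"}},
        ],
    }
}


def summarize_hits(resp):
    """Return (total_hits, [(id, score, title), ...]) from a parsed search response."""
    hits = resp["hits"]
    rows = [(h["_id"], h["_score"], h["_source"].get("title")) for h in hits["hits"]]
    return hits["total"]["value"], rows


total, rows = summarize_hits(response)
print(total)
for doc_id, score, title in rows:
    print(doc_id, score, title)
```

Hits arrive sorted by descending _score unless the request specifies a sort, so the first row is the best match.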
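5.2 Boolean Queries
A bool query combines clauses: must and should contribute to the relevance score, while must_not and filter only include or exclude documents and do not affect scoring.
GET /<index_name>/_search
{
  "query": {
    "bool": {
      "must": [{ "match": { "<field1>": "<value1>" } }],
      "should": [{ "match": { "<field2>": "<value2>" } }],
      "must_not": [{ "match": { "<field3>": "<value3>" } }],
      "filter": [{ "term": { "<field4>": "<value4>" } }]
    }
  }
}
Example:
# Find documents whose title contains "Elasticsearch" and whose author is not "王五"
# The exclusion uses a term query on author.keyword for an exact match; a match query on the
# analyzed author field would split "王五" into single characters and exclude too much.
GET /blog/_search
{
  "query": {
    "bool": {
      "must": [{ "match": { "title": "Elasticsearch" } }],
      "must_not": [{ "term": { "author.keyword": "王五" } }]
    }
  }
}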
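When queries are assembled in application code, a small builder keeps the bool structure consistent and drops empty clauses. The sketch below is one possible shape; bool_query is a hypothetical helper, and the author.keyword field assumes the default dynamic mapping, which adds a keyword sub-field to string fields.

```python
import json


def bool_query(must=None, should=None, must_not=None, filters=None):
    """Build a search body with a bool query, omitting clauses that are empty."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filters}
    return {"query": {"bool": {name: list(items) for name, items in clauses.items() if items}}}


body = bool_query(
    must=[{"match": {"title": "Elasticsearch"}}],
    # Assumption: author has a keyword sub-field for exact matching.
    must_not=[{"term": {"author.keyword": "王五"}}],
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```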
5.3 Sorting Results
GET /<index_name>/_search
{
  "query": {"match_all": {}},
  "sort": [
    { "<field1>": { "order": "desc" } },
    { "<field2>": { "order": "asc" } }
  ]
}
Example:
# Return all documents, newest first
GET /blog/_search
{
  "query": {"match_all": {}},
  "sort": [{ "created_at": { "order": "desc" } }]
}
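5.4 Pagination
GET /<index_name>/_search
{
  "from": <offset_of_first_hit>,
  "size": <number_of_hits_to_return>,
  "query": {"match_all": {}}
}
Example:
# Paginated query: return page 2 with 10 hits per page
GET /blog/_search
{
  "from": 10,
  "size": 10,
  "query": {"match_all": {}}
}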
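The from value is a zero-based offset, so page n with a given page size starts at (n - 1) * size. A minimal sketch of that conversion follows; page_params is a hypothetical helper, and the guard reflects the index.max_result_window setting, which defaults to 10000 and caps from + size.

```python
def page_params(page, size=10, max_result_window=10_000):
    """Convert a 1-based page number into the from/size parameters of a search request."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    start = (page - 1) * size
    if start + size > max_result_window:
        # Deep pages need a different approach, such as search_after.
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": start, "size": size}


print(page_params(2, 10))  # {'from': 10, 'size': 10}
```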
5.5 Aggregations
GET /<index_name>/_search
{
  "size": 0,
  "aggs": {
    "<aggregation_name>": {
      "<aggregation_type>": {"field": "<field_name>"}
    }
  }
}
Example:
# Count posts per author ("size": 0 suppresses the hits and returns only the aggregation)
GET /blog/_search
{
  "size": 0,
  "aggs": {
    "authors": {
      "terms": {"field": "author.keyword", "size": 10}
    }
  }
}
Response:
{
  "took": 10,
  "timed_out": false,
  "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 10, "relation": "eq"},
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "authors": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "张三", "doc_count": 3},
        {"key": "李四", "doc_count": 2},
        {"key": "王五", "doc_count": 1}
      ]
    }
  }
}
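A terms aggregation returns one bucket per distinct value, each with its key and doc_count. The sketch below flattens the buckets of a parsed response into a plain mapping; the sample is a trimmed copy of the response above, and buckets_to_counts is a hypothetical helper.

```python
# Trimmed copy of the aggregation response shown above.
agg_response = {
    "aggregations": {
        "authors": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "张三", "doc_count": 3},
                {"key": "李四", "doc_count": 2},
                {"key": "王五", "doc_count": 1},
            ],
        }
    }
}


def buckets_to_counts(resp, agg_name):
    """Map each bucket key to its document count for the named terms aggregation."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = buckets_to_counts(agg_response, "authors")
print(counts)
```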
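6. Real-World Use Cases
6.1 Site Search
Many websites build their search on Elasticsearch. Users can find relevant content quickly by keyword, and features such as result highlighting, search suggestions, and spelling correction are supported.
Example scenario: product search for an e-commerce site
# Create the product index
# Note: the ik_max_word analyzer comes from the IK Chinese analysis plugin, which must be installed separately
PUT /products
{
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "ik_max_word" },
      "description": { "type": "text", "analyzer": "ik_max_word" },
      "price": { "type": "float" },
      "category": { "type": "keyword" },
      "tags": { "type": "keyword" },
      "stock": { "type": "integer" },
      "created_at": { "type": "date" }
    }
  }
}
# Search for products whose name or description contains "手机" (mobile phone), sorted by price in descending order
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "手机",
      "fields": ["name", "description"]
    }
  },
  "sort": [{ "price": { "order": "desc" } }]
}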
6.2 Log Analysis
Elasticsearch is the core component of the ELK stack (Elasticsearch, Logstash, Kibana) and is widely used for log collection and analysis.
Example scenario: web server log analysis
# Find error logs within a given time range, newest first
GET /logs/_search
{
  "query": {
    "bool": {
      "must": [{ "match": { "level": "ERROR" } }],
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": "2023-01-01T00:00:00",
              "lte": "2023-01-31T23:59:59"
            }
          }
        }
      ]
    }
  },
  "sort": [{ "timestamp": { "order": "desc" } }]
}
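6.3 Data Visualization
Together with Kibana, the data stored in Elasticsearch can be visualized as dashboards, line charts, pie charts, and more.
Example scenario: business monitoring dashboard
# Count API requests per hour
GET /api_logs/_search
{
  "size": 0,
  "aggs": {
    "requests_per_hour": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour"
      }
    }
  }
}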
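Conceptually, an hourly date_histogram truncates each timestamp to the start of its hour and counts the documents in each bucket. The sketch below reproduces that idea locally on a handful of made-up timestamps; it ignores time zones and is not how Elasticsearch computes the aggregation internally.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps):
    """Count ISO-8601 timestamps per hour bucket, keyed by the start of each hour."""
    counts = Counter()
    for ts in timestamps:
        bucket = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        counts[bucket.isoformat()] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data, not taken from a real index.
sample = ["2023-01-01T10:05:00", "2023-01-01T10:59:59", "2023-01-01T11:00:00"]
hourly = requests_per_hour(sample)
print(hourly)
```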
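6.4 Real-Time Analysis
Elasticsearch supports near real-time data analysis and can back real-time monitoring and alerting systems.
Example scenario: anomaly monitoring
# Find failed requests from the last 5 minutes ("now-5m" is date math relative to the current time)
GET /system_logs/_search
{
  "query": {
    "bool": {
      "must": [{ "match": { "status": "error" } }],
      "filter": [
        {
          "range": {
            "timestamp": {"gte": "now-5m", "lte": "now"}
          }
        }
      ]
    }
  }
}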
7. Advanced Features
7.1 Mapping
A mapping defines how a document and its fields are stored and indexed.
# Create an index with an explicit mapping
PUT /users
{
  "mappings": {
    "properties": {
      "username": { "type": "keyword" },
      "email": { "type": "keyword" },
      "bio": { "type": "text" },
      "age": { "type": "integer" },
      "join_date": { "type": "date" },
      "location": { "type": "geo_point" }
    }
  }
}
7.2 Analyzers
An analyzer processes text fields: it splits text into tokens and then applies token filters.
# Create a custom analyzer and apply it to the title field
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
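To get a feel for what this tokenizer and filter chain does to text, the sketch below approximates it locally: split into word tokens, lowercase them, and strip accents. It is only an approximation under simplifying assumptions (a regex stands in for the standard tokenizer, and Unicode decomposition stands in for asciifolding, which covers more characters); on a running cluster the _analyze API shows the real output.

```python
import re
import unicodedata


def approx_custom_analyzer(text):
    """Roughly mimic standard tokenizer + lowercase + asciifolding on simple Latin text."""
    tokens = []
    for raw in re.findall(r"\w+", text):  # crude stand-in for the standard tokenizer
        lowered = raw.lower()  # lowercase filter
        decomposed = unicodedata.normalize("NFKD", lowered)
        # Drop combining marks so accented letters fold to their base letter.
        folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        tokens.append(folded)
    return tokens


tokens = approx_custom_analyzer("Café RÉSUMÉ, Elasticsearch!")
print(tokens)  # ['cafe', 'resume', 'elasticsearch']
```

The practical consequence is that a search for "cafe" matches a title containing "Café", because both the indexed text and the query pass through the same analyzer.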
7.3 Cluster Management
Check the cluster health:
GET /_cluster/health
Response:
{
  "cluster_name": "elasticsearch",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 15,
  "active_shards": 30,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100.0
}
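The status field is the quickest summary: green means every primary and replica shard is allocated, yellow means all primaries are allocated but some replicas are not, and red means at least one primary is unallocated. A small sketch of interpreting a parsed health response (the sample is a trimmed copy of the response above; describe_health is a hypothetical helper):

```python
STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def describe_health(health):
    """Summarize a parsed _cluster/health response."""
    return {
        "status": health["status"],
        "meaning": STATUS_MEANING[health["status"]],
        # active_shards counts primaries and replicas together.
        "active_replica_shards": health["active_shards"] - health["active_primary_shards"],
        "fully_allocated": health["unassigned_shards"] == 0,
    }


# Trimmed copy of the health response shown above.
health = {
    "status": "green",
    "active_primary_shards": 15,
    "active_shards": 30,
    "unassigned_shards": 0,
}
summary = describe_health(health)
print(summary)
```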
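8. Summary
Elasticsearch is a powerful search and analytics engine with the following strengths:
- Powerful search: full-text search, structured search, and complex queries
- Near real-time analysis: indexed data becomes searchable and analyzable within about a second by default
- Distributed architecture: easy to scale horizontally, with support for high availability
- RESTful API: a simple, easy-to-use interface
- Rich ecosystem: integrates with Logstash, Kibana, Beats, and other tools to form a complete solution
This guide covered the basic concepts and operations of Elasticsearch: index management, document CRUD, the main query types, and practical use cases. With these fundamentals you can start using Elasticsearch in your own projects for search and analytics.
As you learn more, you can explore further advanced features, such as more complex aggregations, geo search, and machine learning, to meet more demanding business requirements.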