Python Toolbox Series (32)

王皮皮真皮 · Posted on 2023-5-15 18:22:08

Elasticsearch

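Elasticsearch is a search engine built on Lucene. It provides a distributed, multi-tenant full-text search engine with a RESTful API. Elasticsearch is written in Java and was originally released as open source under the Apache License; since version 7.11 it has been distributed under the Elastic License and SSPL instead. It is a very popular enterprise search engine. Officially supported client languages include Java, .NET (C#), PHP, Python, Apache Groovy, and Ruby. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also built on Lucene.
There are many ways to install Elasticsearch, and the vendor is especially keen on public-cloud deployments. This article takes the simplest route and installs it directly on a host under our own control (IP: 172.29.30.155). The installation steps are as follows: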

# This installation can be very slow, depending on the network.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
sudo apt-get install apt-transport-https
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update && sudo apt-get install -y elasticsearch


Another simple option is to download the installation package directly from the official site:

# In a terminal on the Ubuntu bionic target machine
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.3.2-amd64.deb
sudo dpkg -i elasticsearch-8.3.2-amd64.deb


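The advantage of this approach is that the .deb file can be copied to multiple machines, which saves download time. The more target machines there are, the more worthwhile it becomes.
On Ubuntu bionic, the service can be managed with systemd. The relevant commands are: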

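sudo /bin/systemctl daemon-reload
# Start automatically at boot
sudo /bin/systemctl enable elasticsearch
# Start
sudo systemctl start elasticsearch
# Check status
sudo systemctl status elasticsearch
# If something goes wrong, check the logs:
# follow the whole system journal, or show only the elasticsearch unit.
journalctl -f
journalctl -u elasticsearch
# Stop
sudo systemctl stop elasticsearch
# Reset the password, entering it by hand (the tool needs to read /etc/elasticsearch, hence sudo)
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic -i
# Reset the password, generating one automatically
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
# Test it (curl prompts for the elastic user's password)
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://localhost:9200
curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic https://172.29.30.155:9200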


The response looks similar to the following:

{
  "name" : "dbservers",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "LFs6cpSHTSqLqbx6lRgkvw",
  "version" : {
    "number" : "8.3.2",
    "build_type" : "deb",
    "build_hash" : "8b0b1f23fbebecc3c88e4464319dea8989f374fd",
    "build_date" : "2022-07-06T15:15:15.901688194Z",
    "build_snapshot" : false,
    "lucene_version" : "9.2.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}


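Elasticsearch is a rich and complex product that takes real effort to learn; this article only looks at using it from Python. The officially recommended client module is installed as follows: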

pip install elasticsearch
# To pass TLS verification, copy the server's CA certificate to the local machine
scp root@172.29.30.155:/etc/elasticsearch/certs/http_ca.crt .

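Before inserting any data, it can be worth confirming from Python that the certificate and password actually work. The sketch below is one possible way to do that, assuming the 8.x client's `info()` call, which returns the same JSON as the curl test above. The names `describe_cluster` and `check_connection` are illustrative helpers, not part of the library.

```python
def describe_cluster(info):
    """Summarise the dict returned by Elasticsearch.info() in one line."""
    version = info["version"]
    return "%s (cluster %s) runs Elasticsearch %s on Lucene %s" % (
        info["name"], info["cluster_name"],
        version["number"], version["lucene_version"])


def check_connection(serverip, cafile, password):
    """Connect to the server, fetch the cluster info and summarise it."""
    # Imported here so describe_cluster() works without the client installed.
    from elasticsearch import Elasticsearch

    client = Elasticsearch(
        f"https://{serverip}:9200",
        ca_certs=cafile,                      # CA certificate copied with scp
        basic_auth=("elastic", password),
    )
    return describe_cluster(client.info())
```

Calling `check_connection("172.29.30.155", "http_ca.crt", password)` with the real password should raise an exception if the certificate or credentials are wrong, and return the summary line otherwise.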

With that done, the following code gives a simple example of inserting records:

from datetime import datetime

from elasticsearch import Elasticsearch

serverip = "172.29.30.155"
cafile = r"d:\http_ca.crt"  # CA certificate copied from the server
ELASTIC_PASSWORD = "88488848"
indexname = "poetry"
index = 0  # running document id


def connect():
    # Connect over HTTPS, verifying the server against the copied CA certificate.
    client = Elasticsearch(
        f"https://{serverip}:9200",
        ca_certs=cafile,
        basic_auth=("elastic", ELASTIC_PASSWORD),
    )
    return client


def docgen(author, content):
    # Build one document with the current time as its timestamp.
    return {"author": author, "text": content, "timestamp": datetime.now()}


def insert(con, doc_id, doc):
    # Create or overwrite the document; the result is "created" or "updated".
    resp = con.index(index=indexname, id=doc_id, document=doc)
    return resp["result"]


def getbyindex(con, doc_id):
    # Fetch exactly one document by its id.
    resp = con.get(index=indexname, id=doc_id)
    return resp["_source"]


def print_hits(resp):
    # Print the hit count followed by one line per matching document.
    print("Got %d Hits:" % resp["hits"]["total"]["value"])
    for hit in resp["hits"]["hits"]:
        print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])


def list_all(con):
    # Renamed from list() so the built-in list is not shadowed.
    print_hits(con.search(index=indexname, query={"match_all": {}}))


def search(con, text):
    # Full-text search on the "text" field.
    print_hits(con.search(index=indexname, query={"match": {"text": text}}))


# Connect
con = connect()
# Insert records
index += 1
doc = docgen("李白", "天生我才必有用")
print(insert(con, index, doc))
index += 1
doc = docgen("杜甫", "功盖三分国，名成八阵图，江流石不转，遗恨失吞吴")
print(insert(con, index, doc))
# New documents become searchable only after a refresh, so force one here
con.indices.refresh(index=indexname)
# Fetch one record exactly, by id
print(getbyindex(con, 1))
# List all records
list_all(con)
# Use the search feature to find matching records
search(con, "天生")

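The `match` query used by `search` analyzes the search text before matching. With the default standard analyzer, Chinese text is split into single characters, so searching for "天生" matches any document containing 天 or 生, not only the exact phrase. A minimal sketch of switching between the two behaviours follows; `build_text_query` is a hypothetical helper introduced here for illustration, while `match_phrase` is the standard Elasticsearch query type for in-order phrase matching.

```python
def build_text_query(text, phrase=False, field="text"):
    """Return a query body suitable for Elasticsearch.search(query=...)."""
    # "match" ORs the analyzed terms together; "match_phrase" requires the
    # terms to appear next to each other and in the same order.
    kind = "match_phrase" if phrase else "match"
    return {kind: {field: text}}
```

It could be used as `con.search(index=indexname, query=build_text_query("天生", phrase=True))`. For serious Chinese search, a dedicated Chinese analyzer plugin is usually a better fit than character-level matching, but that is beyond this sketch.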

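The code above only inserts two records. To really bring out the power of a search engine, you need to import a large volume of data and also build a cluster; for those topics, please see the official documentation, which this article does not repeat.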
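As a starting point for importing larger volumes, the Python client ships a `helpers.bulk` function that batches many documents into few requests instead of one `index` call per document. The following is a minimal sketch under the assumption that the input is a list of `(author, text)` pairs; `gen_actions` and `bulk_load` are illustrative names, and the document layout simply mirrors the example above.

```python
from datetime import datetime


def gen_actions(poems, index_name="poetry", start_id=1):
    """Yield one bulk action per (author, text) pair."""
    for offset, (author, text) in enumerate(poems):
        yield {
            "_index": index_name,
            "_id": start_id + offset,
            "_source": {
                "author": author,
                "text": text,
                "timestamp": datetime.now(),
            },
        }


def bulk_load(con, poems, index_name="poetry"):
    """Index all poems through the bulk API; returns the number indexed."""
    # Imported here so gen_actions() can be tested without the client installed.
    from elasticsearch import helpers

    # By default helpers.bulk raises an exception if any document fails.
    ok_count, _errors = helpers.bulk(con, gen_actions(poems, index_name))
    return ok_count
```

Because `gen_actions` is a generator, the full data set never has to sit in memory at once, which matters once the input comes from a large file rather than a short list.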

Source: https://www.cnblogs.com/shanxihualu/p/17402413.html