Getting Started with Elasticsearch

Installation

Download: curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.3.4/elasticsearch-2.3.4.tar.gz
Extract: tar -xvf elasticsearch-2.3.4.tar.gz
cd elasticsearch-2.3.4/bin
Run: ./elasticsearch

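Note that Elasticsearch must be run as a non-root user; starting it as root fails with an error saying that Elasticsearch cannot be run as root. The following steps add an es user and change the owner and group of the installation directory.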

1. adduser es  # add the es user
2. chown -R es elasticsearch-2.3.4
3. chgrp -R es elasticsearch-2.3.4

The cluster and node names can be specified on the command line:

./elasticsearch --cluster.name my_cluster_name --node.name my_node_name

The startup log looks like this:

[INFO ][node                     ] [Bling] version[2.3.4], pid[28352], build[e455fd0/2016-06-30T11:24:31Z]
[node                     ] [Bling] initializing ...
[INFO ][plugins                  ] [Bling] modules [lang-groovy, reindex, lang-expression], plugins [], sites []
[INFO ][env                      ] [Bling] using [1] data paths, mounts [[/ (/dev/vda2)]], net usable_space [14.2gb], net total_space [23.6gb], spins? [possibly], types [ext4]
[INFO ][env                      ] [Bling] heap size [989.8mb], compressed ordinary object pointers [true]
[node                     ] [Bling] initialized
[node                     ] [Bling] starting ...
[transport                ] [Bling] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[discovery                ] [Bling] elasticsearch/2Rrwk_-SSqW0dMIrOYEfIg
[cluster.service          ] [Bling] new_master {Bling}{2Rrwk_-SSqW0dMIrOYEfIg}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[http                     ] [Bling] publish_address {127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[node                     ] [Bling] started
[gateway                  ] [Bling] recovered [0] indices into cluster_state

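Note: change the network.host: 127.0.0.1 setting in elasticsearch.yml (for example, to the machine's own address); otherwise the node can be reached only through the loopback address.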

Cluster

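With the cluster and node concepts in place, the next step is to learn how to interact with them. Elasticsearch provides a powerful, easy-to-understand REST API that can be used to: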

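1. Check the health, status, and statistics of the cluster, nodes, and indices.
2. Administer the cluster, nodes, and indices.
3. Perform CRUD and search operations against an index.
4. Run advanced searches such as paging, sorting, filtering, scripting, and aggregations.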

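To form a cluster across several hosts, list the hosts to ping during discovery with the following setting:
discovery.zen.ping.unicast.hosts: ["IP of host 1 to ping", "IP of host 2 to ping"]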

Check cluster health:

curl 'localhost:9200/_cat/health?v'
Result:
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 
1469435794 16:36:34  elasticsearch green           1         1      0   0    0    0        0             0                  -                100.0%

curl 'http://localhost:9200/_cluster/health?pretty'
This returns the result as JSON.

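Cluster health is one of green, yellow, or red. Green means every primary and replica shard is allocated. Yellow means all primary shards are allocated, so all data is available, but some replica shards are not yet allocated. Red means at least one primary shard is unallocated, so some data is unavailable; the cluster still works partially and keeps serving requests from the shards that are available.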
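
The three statuses can be turned into a small lookup when scripting a health check. The sketch below is only an illustration: describe_health is a hypothetical helper, the explanatory strings are my own wording, and the sample response is a hand-written, abbreviated stand-in for what _cluster/health returns.

```python
import json

# Short explanations for each value of the "status" field (wording is illustrative).
HEALTH_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated; some data is unavailable",
}


def describe_health(response_text):
    """Parse a _cluster/health JSON response and explain its status."""
    health = json.loads(response_text)
    status = health["status"]
    return "{}: {}".format(status, HEALTH_MEANING.get(status, "unknown status"))


# Abbreviated, hand-written sample; a real response carries more fields.
sample = '{"cluster_name": "elasticsearch", "status": "yellow", "number_of_nodes": 1}'
print(describe_health(sample))
```

The text passed in would normally be the output of curl 'http://localhost:9200/_cluster/health'.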

curl 'localhost:9200/_cat/nodes?v'
host      ip        heap.percent ram.percent load node.role master name         
127.0.0.1 127.0.0.1            5          97 0.91 d         *      Doctor Druid

List the indices. The empty result shows that the cluster does not contain any indices yet.

curl 'localhost:9200/_cat/indices?v'
health index pri rep docs.count docs.deleted store.size pri.store.size

Creating an index

curl -XPUT 'localhost:9200/customer?pretty'  # create an index named customer with the PUT method
curl 'localhost:9200/_cat/indices?v'
health status index    pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   customer   5   1          0            0       650b           650b 
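The index is yellow because its replica (rep) has not been allocated: with a single node there is nowhere to place it. Once another node joins the cluster the replica is allocated and the status turns green.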
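
When the _cat output has to be checked from a script rather than by eye, the column layout can be split into records. This is a rough sketch, assuming the request was made with ?v so that a header row is present; parse_cat_table is a hypothetical helper, and it splits on whitespace, so it will misread columns whose values are empty or contain spaces (node names such as "Doctor Druid" in _cat/nodes, for example).

```python
def parse_cat_table(text):
    """Turn whitespace-aligned _cat output (with a header row) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair each header name with the value in the same position of each row.
    return [dict(zip(header, line.split())) for line in lines[1:]]


# The _cat/indices output shown above.
sample = """\
health status index    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   customer   5   1          0            0       650b           650b
"""

rows = parse_cat_table(sample)
not_green = [row["index"] for row in rows if row["health"] != "green"]
print(not_green)  # ['customer']
```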

Indexing and querying a document

curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
  "name": "John Doe"
}'
Response:
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "created" : true
}
curl -XGET 'localhost:9200/customer/external/1?pretty'
{
  "_index" : "customer",
  "_type" : "external",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : { "name": "John Doe" }
}

Deleting an index

curl -XDELETE 'localhost:9200/customer?pretty'
{
  "acknowledged" : true
}
curl 'localhost:9200/_cat/indices?v'
health index pri rep docs.count docs.deleted store.size pri.store.size

The API calls above all follow this general pattern:

curl -X<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>
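
As a rough illustration of that pattern, the helper below assembles the URL from its parts. build_url is a hypothetical name, and the http scheme is an assumption that matches the curl examples in these notes.

```python
from urllib.parse import quote


def build_url(node, port, index, doc_type=None, doc_id=None):
    """Assemble <Node>:<Port>/<Index>/<Type>/<ID>, leaving out trailing parts that are not given."""
    segments = [index]
    if doc_type is not None:
        segments.append(doc_type)
        if doc_id is not None:
            segments.append(str(doc_id))
    # Percent-encode each segment so unusual IDs cannot break the path.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return "http://{}:{}/{}".format(node, port, path)


print(build_url("localhost", 9200, "customer", "external", 1))
# http://localhost:9200/customer/external/1
```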

Updating documents

Update by overwriting the whole document:
curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '
{
  "name": "Jane Doe"
}'
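If a new ID is given, or no ID is given at all, a new document is inserted instead. When the ID is omitted, Elasticsearch generates a random one; in that case the request must use POST rather than PUT.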
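
The choice between the two verbs can be sketched as follows. index_request is a hypothetical helper that only decides which verb and path to use; it does not send anything.

```python
def index_request(index, doc_type, doc_id=None):
    """Return the (HTTP verb, path) pair for indexing one document."""
    if doc_id is None:
        # No ID: POST to the type and let Elasticsearch generate one.
        return ("POST", "/{}/{}".format(index, doc_type))
    # Explicit ID: PUT creates the document or overwrites the existing one.
    return ("PUT", "/{}/{}/{}".format(index, doc_type, doc_id))


print(index_request("customer", "external", 1))  # ('PUT', '/customer/external/1')
print(index_request("customer", "external"))     # ('POST', '/customer/external')
```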

Partial document update

curl -XPOST 'localhost:9200/customer/external/1/_update?pretty' -d '
{
      "doc": { "name": "Jane Doe", "age": 20 }
}'
This changes the name field and adds a new age field at the same time.
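
The effect of the "doc" body can be pictured as a recursive merge into the stored document: nested objects are merged key by key, while plain values and arrays are replaced. The function below is a local simulation of that behaviour for illustration only (merge_partial is a hypothetical name; this is not how Elasticsearch implements it internally).

```python
import copy


def merge_partial(source, doc):
    """Return a copy of source with the partial document doc merged in."""
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge them key by key.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Plain values and arrays simply replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


before = {"name": "John Doe"}
after = merge_partial(before, {"name": "Jane Doe", "age": 20})
print(after)  # {'name': 'Jane Doe', 'age': 20}
```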

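By default, there is a delay of about 1 second (the index refresh interval) between an update and the moment the change becomes visible in search results.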

Batch processing with _bulk

Index two documents:
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'
Update document 1 and delete document 2:
curl -XPOST 'localhost:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'

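If one action in a bulk request fails, the remaining actions are still processed. The response reports a status for each action, which can be used to check whether each one succeeded.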
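
A sketch of both halves, building the newline-delimited request body and scanning the response for failed actions, might look like this. build_bulk_body and failed_items are hypothetical helpers, and the response used in the example is hand-written and heavily abbreviated; the sketch assumes failed items carry an "error" entry.

```python
import json


def build_bulk_body(actions):
    """Build a _bulk body from (action, metadata, source_or_None) tuples.

    Each action takes one JSON line, optionally followed by a source line;
    the body must end with a newline.
    """
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:  # delete actions have no source line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """List (action, id, status) for every item that reports an error."""
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures


body = build_bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")

# Hand-written, abbreviated response for illustration only.
response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"update": {"_id": "9", "status": 404,
                    "error": {"type": "document_missing_exception"}}},
    ],
}
print(failed_items(response))  # [('update', '9', 404)]
```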

Searching

Queries are written in a JSON-style language known as the Query DSL (domain-specific language).

#  REST request URI 
curl 'localhost:9200/bank/_search?q=*&pretty'
#  REST request body
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} }
}'

# Return hits 11-20, sorted by balance in descending order
curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10,
  "sort": { "balance": { "order": "desc" } }
}'
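
The from and size values follow directly from a page number and page size: from is the number of hits to skip. A minimal sketch, with page_query as a hypothetical helper that only builds the request body:

```python
import json


def page_query(page, page_size, sort_field, descending=True):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,
        "sort": {sort_field: {"order": "desc" if descending else "asc"}},
    }


# Page 2 with 10 hits per page corresponds to hits 11-20.
print(json.dumps(page_query(2, 10, "balance"), indent=2))
```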

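Some common query scenarios:
Match addresses containing either term, "mill" or "lane":
"query": { "match": { "address": "mill lane" } }
Match addresses containing the exact phrase "mill lane":
"query": { "match_phrase": { "address": "mill lane" } }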
AND (every clause must match):
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
OR (at least one clause should match):
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
NOR (none of the clauses may match):
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
Combined logic (one condition must match while another must not):
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
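
Since these bool queries differ only in which clause lists they fill in, they can be assembled by a small builder. The sketch below uses two hypothetical helpers, match and bool_query, and reproduces the last query above.

```python
import json


def match(field, text):
    """A single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=()):
    """Combine clauses into a bool query, leaving out empty clause lists."""
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must_not"] = list(must_not)
    return {"query": {"bool": clauses}}


# age must be 40, state must not be ID
body = bool_query(must=[match("age", "40")], must_not=[match("state", "ID")])
print(json.dumps(body, indent=2))
```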

filter

curl -XPOST 'localhost:9200/bank/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'
The filter here is a range on the balance field: greater than or equal to (gte) 20000 and less than or equal to (lte) 30000.
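
Both bounds are inclusive. The sketch below builds the same request body and mirrors the range test locally; range_filter_query and in_range are hypothetical helpers, and in_range only imitates the comparison for illustration.

```python
def range_filter_query(field, low, high):
    """match_all restricted by an inclusive range filter on one field."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {"range": {field: {"gte": low, "lte": high}}},
            }
        }
    }


def in_range(value, low, high):
    """Local equivalent of gte/lte: both ends are included."""
    return low <= value <= high


body = range_filter_query("balance", 20000, 30000)
print(body["query"]["bool"]["filter"])
print(in_range(20000, 20000, 30000), in_range(30001, 20000, 30000))  # True False
```

A filter clause only includes or excludes documents; it does not contribute to the relevance score.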

Aggregations

Aggregations are similar to GROUP BY and the aggregate functions in SQL.

In Elasticsearch, a single request can return both the matching hits and the aggregation results.
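
As an illustration, the sketch below builds a request body that groups documents by a field with a terms aggregation, and then imitates the grouping locally to show the GROUP BY analogy. terms_aggregation_body and local_terms are hypothetical helpers, the aggregation name group_by_<field> is my own choice, and the three documents are made up for the example.

```python
from collections import Counter


def terms_aggregation_body(field, hits=10):
    """Search body returning up to `hits` hits plus a count of documents per field value."""
    return {
        "query": {"match_all": {}},
        "size": hits,  # set to 0 to return only the aggregation
        "aggs": {"group_by_" + field: {"terms": {"field": field}}},
    }


def local_terms(docs, field, size=10):
    """Local imitation of a terms aggregation: top values by document count."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


print(terms_aggregation_body("state", hits=0))

# Made-up documents, only to show the shape of the grouped result.
docs = [{"state": "ID"}, {"state": "TX"}, {"state": "ID"}]
print(local_terms(docs, "state"))
# [{'key': 'ID', 'doc_count': 2}, {'key': 'TX', 'doc_count': 1}]
```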