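Elasticsearch: Getting Started with Query DSL Syntax

Getting started with Query DSL

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html

The search API is the set of APIs for querying data stored in Elasticsearch (ES below), comparable to the SELECT statement in MySQL. Search in ES comes in two forms, URI Search and Query DSL. Query DSL is the main one and the part of ES most worth mastering.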

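Introduction to the DSL

DSL stands for Domain Specific Language.

Elasticsearch provides a full query DSL based on JSON to define queries.

A query is built from two kinds of clauses:

Leaf query clauses: look for a particular value in a particular field, for example the match, term, or range queries. A leaf clause can be used on its own.
Compound query clauses: wrap other leaf or compound queries and combine them logically into a single query.

How a query clause behaves depends on whether it is used in query context or in filter context:

Query context: a clause used in query context answers the question "How well does this document match this query?". Besides deciding whether the document matches, the clause computes a score that expresses how well it matches, and that score contributes to the relevance score (_score). Query context is in effect wherever a clause is passed to a query parameter.
Filter context: a clause used in filter context answers the question "Does this document match this query?" with a plain yes or no and takes no part in relevance scoring. Filter context is in effect for the filter parameter and for the must_not parameter of a bool query. ES automatically caches frequently used filters to speed up queries.

In the following request, the two match clauses run in query context and the term and range clauses run in filter context: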

GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
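
To make the split between the two contexts concrete, here is a minimal Python sketch that assembles the same request body and reports which context each top-level clause of the bool query runs in. The helper names build_search and clause_contexts are hypothetical and exist only for this illustration; nothing here talks to a cluster.

```python
import json

def build_search(must, filters):
    """Build a bool query: `must` clauses are scored, `filters` are not."""
    return {"query": {"bool": {"must": list(must), "filter": list(filters)}}}

def clause_contexts(bool_query):
    """Group the query names of a bool query's clauses by execution context."""
    contexts = {}
    for occur, context in (("must", "query"), ("should", "query"),
                           ("filter", "filter"), ("must_not", "filter")):
        clauses = bool_query.get(occur, [])
        if isinstance(clauses, dict):  # a single clause may be given without a list
            clauses = [clauses]
        for clause in clauses:
            query_name = next(iter(clause))  # e.g. "match", "term", "range"
            contexts.setdefault(context, []).append(query_name)
    return contexts

body = build_search(
    must=[{"match": {"title": "Search"}},
          {"match": {"content": "Elasticsearch"}}],
    filters=[{"term": {"status": "published"}},
             {"range": {"publish_date": {"gte": "2015-01-01"}}}],
)
contexts = clause_contexts(body["query"]["bool"])
print(json.dumps(body, indent=2))
print(contexts)
```

The printed JSON can be pasted as the body of a GET /_search request.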

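DSL

As a query string gains more and more parameters and the search conditions grow more complex, URI search can no longer meet the need:

GET /book/_search?q=name:java&size=10&from=0&sort=price:desc

The DSL (Domain Specific Language) is the search language specific to ES. It carries the search conditions in the request body and is far more expressive.

Query everything, equivalent to GET /book/_search:

GET /book/_search
{
  "query": { "match_all": {} }
}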

Sorting, equivalent to GET /book/_search?q=name:java&sort=price:desc:

GET /book/_search
{
  "query": {
    "match": {
      "name": "java"
    }
  },
  "sort": [
    { "price": "desc" }
  ]
}

Pagination, equivalent to GET /book/_search?size=10&from=0:

GET /book/_search
{
  "query": { "match_all": {} },
  "from": 0,
  "size": 10
}

Selecting the returned fields, equivalent to GET /book/_search?_source=name,studymodel:

GET /book/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "studymodel"]
}

Complex queries are built by combining the kinds of queries shown above.
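
The correspondence between URI search parameters and a request body can be sketched in a few lines of Python. The function uri_to_body below is hypothetical and handles only the parameters used in this section (q, from, size, sort, _source). It maps q to a query_string query, since q is written in query string syntax, and it assumes ascending order when a sort item names no order.

```python
from urllib.parse import parse_qs

def uri_to_body(query_string):
    """Translate a few URI search parameters into a request body."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    body = {}
    if "q" in params:
        # q uses query string syntax, so it maps to a query_string query.
        body["query"] = {"query_string": {"query": params["q"]}}
    else:
        body["query"] = {"match_all": {}}
    if "from" in params:
        body["from"] = int(params["from"])
    if "size" in params:
        body["size"] = int(params["size"])
    if "sort" in params:
        # sort=price:desc,name:asc -> [{"price": "desc"}, {"name": "asc"}]
        body["sort"] = []
        for item in params["sort"].split(","):
            field, _, order = item.partition(":")
            body["sort"].append({field: order or "asc"})  # "asc" is an assumption
    if "_source" in params:
        body["_source"] = params["_source"].split(",")
    return body

body = uri_to_body("q=name:java&size=10&from=0&sort=price:desc")
print(body)
```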

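Query DSL syntax

A query clause generally has this form:

{
    QUERY_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE,...
    }
}

When the clause targets a specific field:

{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}

For example:

GET /test_index/_search
{
  "query": {
    "match": {
      "test_field": "test"
    }
  }
}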

Combining multiple search conditions

Search requirement: title must contain elasticsearch, content may or may not contain elasticsearch, and author_id must not be 111.

In SQL terms this corresponds to a where clause using and, or, and !=.

Initial data:


POST /website/_doc/1
{
  "title": "my hadoop article",
  "content": "hadoop is very bad",
  "author_id": 111
}

POST /website/_doc/2
{
  "title": "my elasticsearch article",
  "content": "es is very bad",
  "author_id": 112
}

POST /website/_doc/3
{
  "title": "my elasticsearch article",
  "content": "es is very good",
  "author_id": 111
}

Search:

GET /website/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "elasticsearch"
          }
        }
      ],
      "should": [
        {
          "match": {
            "content": "elasticsearch"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "author_id": 111
          }
        }
      ]
    }
  }
}

A more complex search requirement, written as SQL:

select * from test_index where name = 'tom' and (hired = true or (personality = 'good' and rude != true))

GET /test_index/_search
{
  "query": {
    "bool": {
      "must": { "match": { "name": "tom" }},
      "should": [
        { "match": { "hired": true }},
        { "bool": {
            "must": { "match": { "personality": "good" }},
            "must_not": { "match": { "rude": true }}
        }}
      ],
      "minimum_should_match": 1
    }
  }
}
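
The logic of this bool query (name must match, and at least one of the two should clauses must match) can be checked with a toy in-memory evaluator. The sketch below is not how Elasticsearch executes queries: the function matches is hypothetical, it replaces match with exact term equality, and it ignores scoring. It does follow the documented default for minimum_should_match, which is 1 when a bool query has should clauses but no must or filter clauses and 0 otherwise.

```python
def matches(query, doc):
    """Toy evaluation of term and bool queries against a dict (no scoring)."""
    (name, body), = query.items()
    if name == "term":
        (field, value), = body.items()
        return doc.get(field) == value
    if name == "bool":
        def as_list(key):
            clauses = body.get(key, [])
            return [clauses] if isinstance(clauses, dict) else clauses
        required = as_list("must") + as_list("filter")
        should = as_list("should")
        # Default: 1 if there are should clauses but no must/filter, else 0.
        default_minimum = 1 if should and not required else 0
        minimum = body.get("minimum_should_match", default_minimum)
        return (all(matches(q, doc) for q in required)
                and not any(matches(q, doc) for q in as_list("must_not"))
                and sum(matches(q, doc) for q in should) >= minimum)
    raise ValueError(f"unsupported query: {name}")

query = {"bool": {
    "must": {"term": {"name": "tom"}},
    "should": [
        {"term": {"hired": True}},
        {"bool": {"must": {"term": {"personality": "good"}},
                  "must_not": {"term": {"rude": True}}}},
    ],
    "minimum_should_match": 1,
}}

docs = [
    {"name": "tom", "hired": True, "personality": "bad", "rude": True},
    {"name": "tom", "hired": False, "personality": "good", "rude": False},
    {"name": "tom", "hired": False, "personality": "good", "rude": True},
    {"name": "bob", "hired": True, "personality": "good", "rude": False},
]
results = [matches(query, doc) for doc in docs]
print(results)
```

The first document passes through the hired clause, the second through the nested bool, the third satisfies neither should clause, and the fourth fails the must clause.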

Match all query

Matches all documents:

GET /_search
{
  "query": {
    "match_all": {}
  }
}

Its inverse, match_none, matches no documents:

GET /book/_search
{
  "query": {
    "match_none": {}
  }
}

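Full-text search

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html

Full-text queries search analyzed (tokenized) fields. The query text is run through the analyzer of the queried field before the query is built. They cover phrase, fuzzy, prefix, and proximity queries, among other scenarios.

Create the book index (the ik_max_word and ik_smart analyzers require the IK analysis plugin):

PUT /book/
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "description": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      },
      "studymodel": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "pic": {
        "type": "text",
        "index": false
      }
    }
  }
}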

Insert data

PUT /book/_doc/1
{
  "name": "Bootstrap开发",
  "description": "Bootstrap是由Twitter推出的一个前台页面开发css框架，是一个非常流行的开发框架，此框架集成了多种页面效果。此开发框架包含了大量的CSS、JS程序代码，可以帮助开发者（尤其是不擅长css页面开发的程序人员）轻松的实现一个css，不受浏览器限制的精美界面css效果。",
  "studymodel": "201002",
  "price": 38.6,
  "timestamp": "2019-08-25 19:11:35",
  "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
  "tags": ["bootstrap", "dev"]
}

PUT /book/_doc/2
{
  "name": "java编程思想",
  "description": "java语言是世界第一编程语言，在软件开发领域使用人数最多。",
  "studymodel": "201001",
  "price": 68.6,
  "timestamp": "2019-08-25 19:11:35",
  "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
  "tags": ["java", "dev"]
}

PUT /book/_doc/3
{
  "name": "spring开发基础",
  "description": "spring 在java领域非常流行，java程序员都在用。",
  "studymodel": "201001",
  "price": 88.6,
  "timestamp": "2019-08-24 19:11:35",
  "pic": "group1/M00/00/00/wKhlQFs6RCeAY0pHAAJx5ZjNDEM428.jpg",
  "tags": ["spring", "java"]
}

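match query

The standard query for full-text search; it also supports fuzzy matching on a field. A match query accepts text, numbers, or dates, analyzes the input, and builds a boolean query from the resulting terms. The operator parameter sets how the terms are combined (or / and, default or), minimum_should_match sets how many of the should (or) clauses must match, and analyzer selects a specific analyzer for the query text.

Run the query. After analysis the terms are combined with or (the default):

GET /book/_search
{
  "query": {
    "match": {
      "description": "java程序员"
    }
  }
}

Set operator to and so that every term must match:

GET /book/_search
{
  "query": {
    "match": {
      "description": {
        "query": "java 程序员",
        "operator": "and"
      }
    }
  }
}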

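For fuzzy matching, the fuzziness parameter sets the maximum edit distance.

A maximum edit distance of 2 means that each term of the analyzed query string may differ by up to two single-character edits (deletion, insertion, or substitution).
fuzziness can also be set to AUTO: terms of 1 to 2 characters get 0, terms of 3 to 5 characters get 1, and terms longer than 5 characters get 2.
An edit distance of 2 is sometimes still too permissive and returns results that look unrelated. Capping fuzziness at 1 can give better results and better performance.

GET /book/_search
{
  "query": {
    "match": {
      "description": {
        "query": "va 程序",
        "fuzziness": 2
      }
    }
  }
}

For more on fuzziness, see: https://www.elastic.co/guide/cn/elasticsearch/guide/current/fuzziness.html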
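
The AUTO thresholds described above amount to a small lookup on the term length. A minimal Python sketch follows; the function name auto_fuzziness is hypothetical, and the thresholds are the ones stated in this section.

```python
def auto_fuzziness(term):
    """Maximum edit distance allowed for `term` under fuzziness AUTO."""
    length = len(term)
    if length <= 2:
        return 0  # 1 to 2 characters: must match exactly
    if length <= 5:
        return 1  # 3 to 5 characters: one edit allowed
    return 2      # more than 5 characters: two edits allowed

allowed = {term: auto_fuzziness(term) for term in ["va", "jave", "spring"]}
print(allowed)
```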

minimum_should_match sets the minimum number of terms that must match:

GET /book/_search
{
  "query": {
    "match": {
      "description": {
        "query": "av 程序员 spring",
        "fuzziness": 2,
        "minimum_should_match": 2
      }
    }
  }
}

max_expansions limits the number of terms a fuzzy match may expand to; the default is 50. For example, if 100 terms in the inverted index fuzzily match ava, only the first 50 are used.

GET /book/_search
{
  "query": {
    "match": {
      "description": {
        "query": "ava 程序员 spring",
        "fuzziness": 2,
        "minimum_should_match": 2,
        "max_expansions": 50
      }
    }
  }
}

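match phrase query

The match_phrase query runs a phrase query on a field and accepts the analyzer and slop parameters. It differs from match in that match_phrase requires the terms to appear in order, whereas match ignores order.

GET /book/_search
{
  "query": {
    "match_phrase": {
      "description": "java 程序员"
    }
  }
}

The slop parameter controls how far apart the terms are allowed to be:

GET /book/_search
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "java 程序员",
        "slop": 2
      }
    }
  }
}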

match phrase prefix query

match_phrase_prefix builds on match_phrase and additionally treats the last term of the phrase as a prefix.

Find phrases whose last term starts with ja:

GET /book/_search
{
  "query": {
    "match_phrase_prefix": {
      "description": "spring 在 ja"
    }
  }
}

Limit the number of terms the prefix may expand to:

GET /book/_search
{
  "query": {
    "match_phrase_prefix": {
      "description": {
        "query": "spring 在 ja",
        "max_expansions": 10
      }
    }
  }
}

multi match query

To run a text search across several fields, use multi_match. It builds on match and adds support for querying multiple fields.

GET /book/_search
{
  "query": {
    "multi_match": {
      "query": "java程序员",
      "fields": ["name", "description"]
    }
  }
}

Field names can also contain * to match several fields:

GET /book/_search
{
  "query": {
    "multi_match": {
      "query": "java程序员",
      "fields": ["name*", "desc*"]
    }
  }
}
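
To see which fields such patterns would select in the book index, the expansion can be imitated locally. This is only a sketch: expand_fields is a hypothetical helper, the field list is copied from the mapping created earlier, and Python's fnmatch also interprets ? and [...], which goes beyond the * wildcard discussed here.

```python
from fnmatch import fnmatchcase

# Field names taken from the book index mapping defined earlier.
MAPPED_FIELDS = ["name", "description", "studymodel", "price", "timestamp", "pic"]

def expand_fields(patterns, mapped_fields=MAPPED_FIELDS):
    """Return the mapped fields selected by the given wildcard patterns."""
    expanded = []
    for pattern in patterns:
        for field in mapped_fields:
            if fnmatchcase(field, pattern) and field not in expanded:
                expanded.append(field)
    return expanded

selected = expand_fields(["name*", "desc*"])
print(selected)
```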

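query string query

The query_string query lets you write the query directly as a string in Lucene query syntax. ES parses the string with a query parser and builds the corresponding query, so using it requires knowing the Lucene query syntax.

GET /book/_search
{
  "query": {
    "query_string": {
      "default_field": "description",
      "query": "java 程序员 spring"
    }
  }
}

query_string also supports matching on multiple fields:

GET /book/_search
{
  "query": {
    "query_string": {
      "fields": ["description", "name"],
      "query": "java 程序员 spring"
    }
  }
}

For the parameters that can accompany query, such as default_field and fields, and for the query string syntax, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html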

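simple query string query

Like query_string, the simple_query_string query takes a query string, with two differences: it supports a smaller syntax, and it ignores invalid parts of the query string instead of raising an error. That makes it better suited to input typed by end users.

In the request below, the field name$ is deliberately wrong; the request raises no error but returns no data:

GET /book/_search
{
  "query": {
    "simple_query_string": {
      "query": "\"fried eggs\" +(eggplant | potato) -frittata",
      "fields": ["description^5", "name$"],
      "default_operator": "and"
    }
  }
}

For the syntax, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html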

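Term-level queries

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html

term query

The term query finds documents whose field contains the given term. The term is not analyzed, so it must equal a token exactly as it is stored in the index.

GET /book/_search
{
  "query": {
    "term": {
      "description": "spring"
    }
  }
}

terms query

The terms query finds documents whose field contains any of the given terms.

GET /book/_search
{
  "query": {
    "terms": {
      "tags": [
        "search",
        "full_text",
        "dev"
      ]
    }
  }
}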

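The terms query can also fetch its terms from a document in another index (terms lookup), similar to in (select term from other) in SQL:

PUT /users/_doc/2
{
  "followers": ["1", "3"]
}

PUT /tweets/_doc/1
{
  "user": "1"
}

GET /tweets/_search
{
  "query": {
    "terms": {
      "user": {
        "index": "users",
        "id": "2",
        "path": "followers"
      }
    }
  }
}

Terms lookup parameters used above:

index: the index to fetch the document from
id: the ID of that document
path: the field of that document that holds the terms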
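
What a terms lookup does can be sketched as a two-step process: fetch the referenced document, then use the values under path as the terms. The Python below is only an illustration of that idea. resolve_terms_lookup and get_document are hypothetical names, and the in-memory STORE stands in for the users index from the example.

```python
def resolve_terms_lookup(field, lookup, get_document):
    """Rewrite a terms lookup as an explicit terms query."""
    value = get_document(lookup["index"], lookup["id"])
    for key in lookup["path"].split("."):  # path may point into nested objects
        value = value[key]
    terms = value if isinstance(value, list) else [value]
    return {"terms": {field: terms}}

# Hypothetical stand-in for the cluster: (index, id) -> document source.
STORE = {("users", "2"): {"followers": ["1", "3"]}}

def get_document(index, doc_id):
    return STORE[(index, doc_id)]

query = resolve_terms_lookup(
    "user", {"index": "users", "id": "2", "path": "followers"}, get_document)
print(query)
```

The resulting query matches the tweet whose user is "1", which is what the lookup request above returns.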

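range query

Range query parameters:

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: the weight of the query

GET /book/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 80,
        "lte": 90,
        "boost": 2.0
      }
    }
  }
}

Date fields accept date math. Below, now-1d/d is the current time minus one day, rounded down to the day, and now/d is the current time rounded down to the day:

GET /book/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  }
}

The format parameter sets the date format of the values in the request (born is an illustrative date field that the book index does not define):

GET /book/_search
{
  "query": {
    "range": {
      "born": {
        "gte": "01/01/2012",
        "lte": "2013",
        "format": "dd/MM/yyyy||yyyy"
      }
    }
  }
}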

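How dates are rounded in date math (||):

gt: rounds up, so 2014-11-18||/M becomes 2014-11-30T23:59:59.999 and the whole month is excluded.
gte: rounds down, so 2014-11-18||/M becomes 2014-11-01 and the whole month is included.
lt: rounds down, so 2014-11-18||/M becomes 2014-11-01 and the whole month is excluded.
lte: rounds up, so 2014-11-18||/M becomes 2014-11-30T23:59:59.999 and the whole month is included.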
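
The four cases can be reproduced with a few lines of Python. This is a sketch of the month rounding (/M) rules listed above only, not of Elasticsearch's date math parser, and round_to_month is a hypothetical name.

```python
import calendar
from datetime import datetime

def round_to_month(moment, operator):
    """Apply /M rounding to `moment` for the given range operator."""
    start = moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if operator in ("gte", "lt"):
        return start  # rounded down to the first millisecond of the month
    if operator in ("gt", "lte"):
        last_day = calendar.monthrange(moment.year, moment.month)[1]
        # rounded up to the last millisecond of the month
        return start.replace(day=last_day, hour=23, minute=59, second=59,
                             microsecond=999000)
    raise ValueError(f"unknown operator: {operator}")

day = datetime(2014, 11, 18)
rounded = {op: round_to_month(day, op).isoformat(timespec="milliseconds")
           for op in ("gt", "gte", "lt", "lte")}
print(rounded)
```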

For the date math rules, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#date-math

exists query

Finds documents in which the given field has a non-null value, comparable to column is not null in SQL.

GET /book/_search
{
  "query": {
    "exists": {
      "field": "description"
    }
  }
}

prefix query: term prefix query

GET /book/_search
{
  "query": {
    "prefix": {
      "name": {
        "value": "spring"
      }
    }
  }
}

GET /book/_search
{
  "query": {
    "prefix": {
      "name": "spring"
    }
  }
}

wildcard query: wildcard query

GET /book/_search
{
  "query": {
    "wildcard": { "name": "spr*g" }
  }
}

GET /book/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "spr*g",
        "boost": 2
      }
    }
  }
}

regexp query: regular expression query

GET /book/_search
{
  "query": {
    "regexp": {
      "name": "sp.*g"
    }
  }
}

GET /book/_search
{
  "query": {
    "regexp": {
      "description": {
        "value": "j.*a",
        "flags": "ALL",
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}

For the regular expression syntax, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax

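fuzzy query

Returns documents that contain terms similar to the search term, where similarity is measured by Levenshtein edit distance.

An edit is one of the following:

Changing a character (box→fox)
Removing a character (sick→sic)
Inserting a character (aple→apple)
Transposing two adjacent characters (act→cat)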
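
The four kinds of edit listed above can be counted with a short dynamic program. The sketch below implements one standard variant, the optimal string alignment distance, in which swapping two adjacent characters counts as a single edit. It illustrates the idea only; the distance Elasticsearch actually uses is computed inside Lucene and is not reproduced here.

```python
def edit_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # remove a character
                          d[i][j - 1] + 1,         # insert a character
                          d[i - 1][j - 1] + cost)  # change a character
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transpose
    return d[-1][-1]

pairs = [("box", "fox"), ("sick", "sic"), ("aple", "apple"), ("act", "cat")]
distances = [edit_distance(a, b) for a, b in pairs]
print(distances)
```

Each of the four example pairs is one edit apart, so a fuzziness of 1 is enough to match them.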

GET /book/_search
{
  "query": {
    "fuzzy": {
      "description": {
        "value": "jave"
      }
    }
  }
}

GET /book/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "sp",
        "boost": 1.0,
        "fuzziness": 2,
        "prefix_length": 0,
        "max_expansions": 100
      }
    }
  }
}

ids: query by document ID

GET /book/_search
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  }
}

Filter

filter versus query: an example

Requirement: find documents whose description contains "java程序员" and whose price is between 80 and 90 (inclusive).

GET /book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "description": "java程序员"
          }
        },
        {
          "range": {
            "price": {
              "gte": 80,
              "lte": 90
            }
          }
        }
      ]
    }
  }
}

Using filter:

GET /book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "description": "java程序员"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 80,
            "lte": 90
          }
        }
      }
    }
  }
}

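filter versus query: comparison

filter: only selects the data that meets the search conditions. It computes no relevance score and has no effect on relevance.
query: computes how relevant each document is to the search conditions and sorts by relevance.

When to use which: in general, if you are searching and want the best-matching documents returned first, use query. If you only need to select a subset of the data by some conditions and do not care about ranking, use filter.

filter versus query: performance

filter: needs neither to compute relevance scores nor to sort by them, and ES has a built-in cache that automatically keeps the results of the most frequently used filters. Range conditions and exact keyword conditions are typical candidates.
query: the opposite. It must compute relevance scores and sort by them, and its results cannot be cached.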

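Locating syntax errors

Validating an invalid request:

GET /book/_validate/query?explain
{
  "query": {
    "mach": {
      "description": "java程序员"
    }
  }
}

{
  "valid": false,
  "error": "org.elasticsearch.common.ParsingException: no [query] registered for [mach]"
}

A valid request:

GET /book/_validate/query?explain
{
  "query": {
    "match": {
      "description": "java程序员"
    }
  }
}

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "book",
      "valid": true,
      "explanation": "description:java description:程序员"
    }
  ]
}

This is mostly useful for very large and complex searches. If you have written a search that runs to hundreds of lines, you can pass it through the validate API first to check that it is valid.

Once the query is valid, the explain output works much like an execution plan in MySQL and shows information such as what the search actually targets.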
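
Typos in query names, like mach above, can also be caught on the client before the request is sent. The following Python sketch is a hypothetical pre-check, not a replacement for the validate API. KNOWN_QUERIES lists only the query types covered in this article, and the "did you mean" hint is produced locally by difflib rather than taken from any Elasticsearch response.

```python
from difflib import get_close_matches

# Only the query types covered in this article; not a complete registry.
KNOWN_QUERIES = [
    "bool", "constant_score", "exists", "fuzzy", "ids", "match", "match_all",
    "match_none", "match_phrase", "match_phrase_prefix", "multi_match",
    "prefix", "query_string", "range", "regexp", "simple_query_string",
    "term", "terms", "wildcard",
]

def check_query_name(query):
    """Return None if the top-level query name is known, else a message."""
    (name, _), = query.items()
    if name in KNOWN_QUERIES:
        return None
    suggestions = get_close_matches(name, KNOWN_QUERIES, n=1)
    hint = f", did you mean [{suggestions[0]}]?" if suggestions else ""
    return f"no [query] registered for [{name}]{hint}"

problem = check_query_name({"mach": {"description": "java程序员"}})
print(problem)
```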

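Custom sort order

Default sort order

By default, results are sorted by _score in descending order.

In some cases, however, _score is not used at all, for example when only filtering.

Writing filter directly under query raises an error; this is where constant_score comes in.

The correct way to write a filter-only search:

GET /book/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "studymodel": "201001"
        }
      }
    }
  }
}

Specifying the sort order

Comparable to order by in SQL; the URI form is ?sort=price:asc

GET /book/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "studymodel": "201001"
        }
      }
    }
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}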

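Sorting on text fields

Sorting on a text field often gives inaccurate results, because the field is analyzed into several terms and sorting on those terms is not what we want.

The usual solution is to index the text field twice: once analyzed, for searching, and once not analyzed, for sorting.

The alternative is to set fielddata: true on the text field, which allows sorting on it directly but can use a lot of heap memory. The mapping below takes the first approach and adds a keyword sub-field to title:

PUT /website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "content": {
        "type": "text"
      },
      "post_date": {
        "type": "date"
      },
      "author_id": {
        "type": "long"
      }
    }
  }
}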

Insert data


PUT /website/_doc/1
{
  "title": "first article",
  "content": "this is my first article",
  "post_date": "2019-01-01",
  "author_id": 110
}

PUT /website/_doc/2
{
  "title": "second article",
  "content": "this is my second article",
  "post_date": "2019-01-01",
  "author_id": 110
}

PUT /website/_doc/3
{
  "title": "third article",
  "content": "this is my third article",
  "post_date": "2019-01-02",
  "author_id": 110
}

Search

GET /website/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "title.keyword": {
        "order": "desc"
      }
    }
  ]
}

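Scroll: querying in batches

Scenario: export 100 million documents from an index to a file or a database.

Fetching everything in one request would exhaust memory, so the scroll search technique retrieves the data batch by batch.

On the first request, a scroll search saves a snapshot of the data as it was at that moment. All later requests are served from that snapshot, so changes made to the data in the meantime are not visible.

Every scroll request also carries a scroll parameter that specifies a time window. The window only needs to be long enough to process one batch, because each request renews it.

Search

GET /book/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 3
}

{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ==",
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [

    ]
  }
}

The response contains a _scroll_id, and the next scroll request must include it:

GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAMOkWTURBNDUtcjZTVUdKMFp5cXloVElOQQ=="
}

Difference from pagination:

Pagination (from/size) is meant for results shown to users, and it suffers from the deep paging problem.
scroll is meant for internal system operations such as bulk downloads, data migration, and changing index mappings with zero downtime.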
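
The request sequence of a scroll export can be sketched as a loop that keeps calling /_search/scroll until a batch comes back empty. In the Python below, scroll_all takes a send(method, path, body) callable, which is an assumption of this sketch and not a real client API. make_fake_send is a hypothetical in-memory stand-in so that the loop can run without a cluster.

```python
def scroll_all(send, index, query, batch_size=3, keep_alive="1m"):
    """Yield every hit of a search, one scroll batch at a time."""
    response = send("GET", f"/{index}/_search?scroll={keep_alive}",
                    {"query": query, "size": batch_size})
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Each follow-up request passes the latest _scroll_id and renews the window.
        response = send("GET", "/_search/scroll",
                        {"scroll": keep_alive,
                         "scroll_id": response["_scroll_id"]})

def make_fake_send(documents):
    """Hypothetical stand-in for an HTTP client, serving slices of a list."""
    state = {"offset": 0, "size": 0}
    def send(method, path, body):
        if "size" in body:  # the initial search request opens the scroll
            state["size"] = body["size"]
            state["offset"] = 0
        start = state["offset"]
        state["offset"] += state["size"]
        batch = documents[start:start + state["size"]]
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}
    return send

docs = [{"_id": str(i)} for i in range(1, 8)]
exported = list(scroll_all(make_fake_send(docs), "book", {"match_all": {}}))
print(len(exported))
```

In a real export it is also worth releasing the search context once the loop finishes, with DELETE /_search/scroll and the last scroll ID, rather than waiting for the time window to expire.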

Compound queries

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/compound-queries.html

constant score query

Wraps another query and gives every matching document the same constant score.

GET /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user": "kimchy" }
      },
      "boost": 1.2
    }
  }
}

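bool query

Compound queries filter on several fields at once, comparable to a multi-condition where clause in MySQL. The compound queries in ES include Constant Score Query, Bool Query, Dis Max Query, Function Score Query, and Boosting Query. This section covers the most commonly used one, the Bool Query, in detail.

A bool query combines multiple query clauses into one query with boolean logic. The available keywords are:

must: returned documents must match the conditions in must, and those conditions contribute to the relevance score.
filter: returned documents must match the conditions in filter; unlike must, filter does not affect the relevance score.
should: acts as or. Returned documents should match the conditions in should, and matching them raises the relevance score.
must_not: returned documents must not match the conditions in must_not. These clauses run in filter context, so they neither take part in nor affect scoring.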

In the request below, minimum_should_match: 1 means that at least one of the should clauses must match:

GET /book/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "name": "spring" }
      },
      "filter": {
        "term": { "name": "spring" }
      },
      "must_not": {
        "range": {
          "price": { "gte": 10, "lte": 20 }
        }
      },
      "should": [
        { "term": { "tags": "spring" } },
        { "term": { "tags": "java" } }
      ],
      "minimum_should_match": 1,
      "boost": 1.0
    }
  }
}

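1. must, must_not, and should accept arrays, as does filter. ES caches filter clauses intelligently, so they run efficiently; when no scoring is needed, consider using filter instead of an ordinary query clause.
2. When a query contains both must and should, documents do not have to satisfy the should conditions, because only the must conditions are required. Documents that also satisfy should get a higher relevance score.
3. The minimum_should_match parameter controls how many should conditions must be satisfied, as a count or as a percentage.
4. Multiple conditions inside must are combined with and; inside must_not, none of the conditions may match; multiple conditions inside should are combined with or.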
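
How a count or a percentage turns into a required number of should clauses can be sketched as follows. The function required_should_count is hypothetical and covers only plain integers and percentages, following the documented rules: a negative value states how many clauses may be missing, and a percentage is rounded down before it is applied. Combined forms such as 2<75% are not handled.

```python
def required_should_count(spec, total):
    """Resolve minimum_should_match against `total` should clauses."""
    if isinstance(spec, str) and spec.endswith("%"):
        percent = int(spec[:-1])
        amount = abs(percent) * total // 100  # percentage is rounded down
        # A negative percentage is the share of clauses that may be missing.
        return total - amount if percent < 0 else amount
    number = int(spec)
    # A negative integer is the number of clauses that may be missing.
    return total + number if number < 0 else number

examples = {
    ("75%", 4): required_should_count("75%", 4),
    ("75%", 5): required_should_count("75%", 5),
    ("-25%", 5): required_should_count("-25%", 5),
    (2, 3): required_should_count(2, 3),
    (-1, 3): required_should_count(-1, 3),
}
print(examples)
```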
