0
点赞
收藏
分享

微信扫一扫

ES基本的聚合查询

E_topia 2021-09-28 阅读 75

ES提供了强大的聚合分析功能,按照操作上细化,可以主要分为四种,如下表所示:

聚合方式 解释
Bucket Aggregation 一些满足特定条件的文档的集合
Metric Aggregation 一些数学计算,可以对文档字段统计分析
Pipeline Aggregation 对其他的聚合结果进行二次聚合
Metrix Aggregation 支持对多个字段的操作并提供一个结果矩阵

     在我个人看来这些只是理论意义上的细化,在实际的应用过程中,我们并没有说针对那种场景使用那种聚合分析。都是为了满足我们的业务,在实现的过程中同时会使用到多种聚合的方式。

一. 四种聚合方式

1.1 Bucket(分桶)

     分桶就是将具有某一类共同特征的数据归为一类,然后求其总数,例如: 男女、公司同一工作岗位的员工、商品高中低档等。在对数据分桶后还可以进一步分桶,例如:0~ 20岁男性、21~50岁男性、50岁以上男性;同一工作岗位男性、女性;高档商品好评、中评、差评的商品。


1.2 Metric(计算)

     计算具有一类特征的数据的统计值,例如平均值、最大值、最小值等。

1.3 Pipeline(管道)

     pipeline与Linux操作系统中的管道操作(将上一步操作的结果作为下一步操作的数据源)类似。即将上一次聚合操作的结果作为下一次聚合操作的数据源。

1.4 Metrix(矩阵)

     矩阵就是同时可以支持多值的输出,例如对分桶的数据同时求平均、最大、最小值;

二. 具体的案例

     在说具体的案例的时候笔者并不会严格的去按照四种聚合方式去讲解。首先在ES中插入一批的测试数据,在插入测试数据之前先定义mapping.

2.1 mapping的定义

PUT employee
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "keyword"
},
"job": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"gender": {
"type": "keyword"
}
}
}
}

2.2 插入数据

PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}

数据说明:插入的数据为员工信息,name是员工的姓名,job是员工的工种,age为员工的年龄,sal为员工的薪水,gender为员工的性别

2.3 聚合查询

GET employee/_search
{
"size": 0,
"aggs": {
"job_category_count": {
"cardinality": {
"field": "job"
}
}
}
}

GET employee/_search
{
"size": 0,
"aggs": {
"job_category_num": {
"cardinality": {
"field": "job"
}
}
}
}

GET employee/_search
{
"size": 0,
"aggs": {
"job_analysis": {
"terms": {
"field": "job"
},
"aggs": {
"age_top_1": {
"top_hits": {
"size": 1,
"sort": [
{
"age": {
"order": "desc"
}
}
]
}
}
}
}
}
}

GET employee/_search
{
"size": 0,
"aggs": {
"sal_range_info": {
"range": {
"field": "sal",
"ranges": [
{
"to": 5000
},
{
"from": 5001,
"to": 8000
},
{
"from": 8001,
"to": 12000
},
{
"from": 12001,
"to": 18000
},
{
"from": 18001
}
]
}
}
}
}

GET employee/_search
{
"size": 0,
"aggs": {
"sal_histogram": {
"histogram": {
"field": "sal",
"interval": 5000,
"extended_bounds": {
"min": 0,
"max": 25000
}
}
}
}
}

GET employee/_search
{
"size": 0,
"aggs": {
"job_and_salary_info": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}

GET employee/_search
{
"size": 0,
"aggs": {
"job_gender_sal_info": {
"terms": {
"field": "job"
},
"aggs": {
"gender_info": {
"terms": {
"field": "gender"
},
"aggs": {
"sal_info": {
"stats": {
"field": "sal"
}
}
}
}
}
}
}
}

GET employee/_search
{
"size": 0,
"aggs": {
"jobs": {
"terms": {
"field": "job"
},
"aggs": {
"sal_info": {
"avg": {
"field": "sal"
}
}
}
},
"min_avg_sal": {
"max_bucket": {
"buckets_path": "jobs>sal_info"
}
}
}
}

三. ES自带航空数据案例

GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
}
}
}
}

GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"max_ticket_price": {
"max": {
"field": "AvgTicketPrice"
}
},
"avg_ticket_price": {
"avg": {
"field": "AvgTicketPrice"
}
}
}
}
}
}

GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs": {
"dest_info": {
"terms": {
"field": "DestCountry"
},
"aggs": {
"ticket_info": {
"stats": {
"field": "AvgTicketPrice"
}
},
"weather_info": {
"terms": {
"field": "DestWeather"
}
}
}
}
}
}

鸣谢

全文几乎没有自己原创的内容,只是对极客时间中阮一鸣老师的 Elasticsearch核心技术与实战 自己稍稍的做了下总结。

举报

相关推荐

0 条评论