
Elasticsearch built-in analyzers (using stop words)

Published: 2021-09-09 21:19:25 | Editor: 雪饮

The previous post covered how to create a custom Elasticsearch analyzer in an index.

Besides custom analyzers, Elasticsearch also ships built-in analyzers that work out of the box.

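Built-in analyzers can be used directly, without any configuration. Some of them, however, accept configuration options that change their behavior. For example, the standard analyzer can be configured with a list of stop words.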

So what is a stop word?

[Image: 停用词.png]

In other words, stop words are words such as "the" and "is" that carry little meaning on their own.
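To make the idea concrete, here is a minimal Python sketch of stop word removal. It is only an illustration and not how Elasticsearch implements it: the regex tokenizer, the function names `tokenize` and `remove_stop_words`, and the tiny `STOP_WORDS` set are all assumptions made for this example.

```python
import re

# Small illustrative stop word set (an assumption for this sketch,
# not Elasticsearch's predefined English list).
STOP_WORDS = {"a", "an", "and", "is", "of", "the"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens with offsets."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words; the remaining tokens keep their original positions."""
    return [t for t in tokens if t["token"] not in stop_words]
```

Calling `remove_stop_words(tokenize("The old brown cow"))` drops "the" and leaves "old", "brown" and "cow" with their offsets and positions unchanged.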

Next, we create an index configured with the following built-in analyzer:

PUT http://localhost:9200/my-index-000001

Request body:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_english": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "std_english"
          }
        }
      }
    }
  }
}

Response body:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "my-index-000001"
}

Here the std_english analyzer is defined as based on the standard analyzer, but configured to remove the predefined list of English stop words.
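The same index creation request can be sent from a script. The following is a rough sketch using only the Python standard library; the helper names `build_index_body` and `create_index` are hypothetical, and it assumes an unsecured Elasticsearch node reachable at the URL used in this post.

```python
import json
import urllib.request


def build_index_body(analyzer_name="std_english", stopwords="_english_"):
    """Build the settings and mappings used in this post as a Python dict."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # A standard analyzer configured with a stop word list.
                    analyzer_name: {"type": "standard", "stopwords": stopwords}
                }
            }
        },
        "mappings": {
            "properties": {
                "my_text": {
                    "type": "text",
                    "analyzer": "standard",
                    # Sub-field that indexes the same text with stop words removed.
                    "fields": {
                        "english": {"type": "text", "analyzer": analyzer_name}
                    },
                }
            }
        },
    }


def create_index(base_url, index, body):
    """Send PUT {base_url}/{index} with a JSON body and return the parsed response."""
    request = urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running node, `create_index("http://localhost:9200", "my-index-000001", build_index_body())` should return an acknowledgement like the response body shown earlier. Keeping the body in a separate function makes it easy to inspect or reuse without touching the network.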

Next, we analyze some text through the my_text field, which uses the plain standard analyzer:

POST http://localhost:9200/my-index-000001/_analyze

Request body:

{
  "field": "my_text",
  "text": "The old brown cow"
}

Response body:

{
  "tokens": [
    {
      "token": "the",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "old",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 8,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "cow",
      "start_offset": 14,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}

As the response shows, the stop word is kept rather than removed: "the" is still in the token list.

Next, we run the same text through the my_text.english field, which uses the std_english analyzer:

POST http://localhost:9200/my-index-000001/_analyze

Request body:

{
  "field": "my_text.english",
  "text": "The old brown cow"
}

Response body:

{
  "tokens": [
    {
      "token": "old",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "brown",
      "start_offset": 8,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "cow",
      "start_offset": 14,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}

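This time the stop word "the" has been removed. The remaining tokens keep their original positions (1 to 3), so the gap left by the removed word is still visible.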
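The two _analyze calls can also be compared programmatically. The sketch below is one possible way to do it with the Python standard library; `analyze`, `token_texts` and `removed_tokens` are hypothetical helper names, and the code assumes the index created earlier exists on an unsecured local node.

```python
import json
import urllib.request


def analyze(base_url, index, field, text):
    """POST to the index's _analyze endpoint using the analyzer mapped to `field`."""
    payload = json.dumps({"field": field, "text": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def token_texts(analyze_response):
    """Extract just the token strings from an _analyze response body."""
    return [t["token"] for t in analyze_response["tokens"]]


def removed_tokens(plain_response, filtered_response):
    """Return tokens present in the first response but missing from the second."""
    kept = set(token_texts(filtered_response))
    return [tok for tok in token_texts(plain_response) if tok not in kept]
```

Passing the responses for my_text and my_text.english to `removed_tokens` lists the tokens that the stop word configuration filtered out, which for the example text above is just "the".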

Keywords: Elasticsearch, built-in analyzers, stop words
