
Elasticsearch: creating a custom analyzer with your own char_filter, tokenizer, and filter

Published: 2021-09-09 23:18:42 | Editor: 雪饮

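The earlier examples used tokenizers, token filters, and character filters with their default configuration, but you can also create a configured version of each and use it in a custom analyzer.

Here is a more complex example of creating a custom analyzer in Elasticsearch.

PUT http://localhost:9200/my-index-000001

Request body: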

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": {
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}

Response body:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "my-index-000001"
}
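If you would rather script the request than send it from an HTTP client, the following is a minimal Python sketch that builds the same PUT request with the standard library only. The helper names `build_create_index_request` and `send` are hypothetical, and the sketch assumes an unsecured node at http://localhost:9200, as in the request above.

```python
import json
import urllib.request

# The same settings as the request body above, written as a Python dict.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "char_filter": ["emoticons"],
                    "tokenizer": "punctuation",
                    "filter": ["lowercase", "english_stop"],
                }
            },
            "tokenizer": {
                "punctuation": {"type": "pattern", "pattern": "[ .,!?]"}
            },
            "char_filter": {
                "emoticons": {
                    "type": "mapping",
                    "mappings": [":) => _happy_", ":( => _sad_"],
                }
            },
            "filter": {
                "english_stop": {"type": "stop", "stopwords": "_english_"}
            },
        }
    }
}


def build_create_index_request(base_url, index_name, body):
    """Build (but do not send) the PUT request that creates the index."""
    return urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send the request and return the decoded JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running node, `send(build_create_index_request("http://localhost:9200", "my-index-000001", INDEX_SETTINGS))` should return an acknowledgement like the response body above.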

Here is a quick walkthrough.

Start with this part:

"my_custom_analyzer": {
  "char_filter": [
    "emoticons"
  ],
  "tokenizer": "punctuation",
  "filter": [
    "lowercase",
    "english_stop"
  ]
}

This defines the custom analyzer:

its character filter is one named emoticons,

its tokenizer is one named punctuation,

and its token filters are lowercase plus one named english_stop.

The emoticons character filter used here is defined as:

"char_filter": {
  "emoticons": {
    "type": "mapping",
    "mappings": [
      ":) => _happy_",
      ":( => _sad_"
    ]
  }
}

This definition means that every occurrence of the string ":)" is replaced with "_happy_", and every occurrence of ":(" is replaced with "_sad_".
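To make the replacement rule concrete, here is a small pure-Python sketch of what a mapping character filter does with these two rules. It is only an illustration, not Elasticsearch's implementation; the function name `apply_mapping_char_filter` is hypothetical, and preferring the longest matching key is an assumption about how overlapping keys are resolved (it makes no difference for these two rules).

```python
# Rules copied from the "emoticons" char_filter definition.
EMOTICON_RULES = [":) => _happy_", ":( => _sad_"]


def apply_mapping_char_filter(text, rules):
    """Scan left to right and replace each key with its value."""
    # Parse "key => value" rules into (key, value) pairs.
    pairs = [tuple(part.strip() for part in rule.split("=>", 1)) for rule in rules]
    # Try longer keys first so the longest match wins (assumption, see lead-in).
    pairs.sort(key=lambda kv: len(kv[0]), reverse=True)

    out = []
    i = 0
    while i < len(text):
        for key, value in pairs:
            if text.startswith(key, i):
                out.append(value)
                i += len(key)
                break
        else:
            # No rule matched at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(apply_mapping_char_filter("I'm a :) person, and you?", EMOTICON_RULES))
```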

The punctuation tokenizer is defined as:

"tokenizer": {
  "punctuation": {
    "type": "pattern",
    "pattern": "[ .,!?]"
  }
}

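This splits the text on the characters in " .,!?": whenever a space, period, comma, exclamation mark, or question mark is encountered, it is treated as a separator.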
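The splitting behaviour can be approximated in Python with `re.split` and the same pattern. This is a rough sketch rather than the actual tokenizer: the function name `pattern_tokenize` is hypothetical, and dropping the empty strings produced by adjacent separators (such as ", ") is an assumption that agrees with the analysis output shown later in this post.

```python
import re


def pattern_tokenize(text, pattern=r"[ .,!?]"):
    """Split text on every match of the pattern and drop empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


print(pattern_tokenize("I'm a _happy_ person, and you?"))
```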

Next come the token filters. lowercase is built in, while english_stop is defined as:

"filter": {
  "english_stop": {
    "type": "stop",
    "stopwords": "_english_"
  }
}

This definition uses the stop words covered in this earlier post:

https://www.gaojiupan.cn/manshenghuo/chengxurensheng/3967.html
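As a rough illustration of the two token filters, the sketch below lowercases a token list and then drops stop words while keeping the original positions of the surviving tokens. The function names are hypothetical, and the word list is what I understand the `_english_` set to contain; treat it as an assumption and check the Elasticsearch documentation for your version.

```python
# Assumed contents of the _english_ stop word list (see lead-in).
ENGLISH_STOP_WORDS = frozenset([
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
])


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens, stop_words=ENGLISH_STOP_WORDS):
    """Return (position, token) pairs for tokens that are not stop words."""
    return [
        (position, token)
        for position, token in enumerate(tokens)
        if token not in stop_words
    ]


tokens = ["I'm", "a", "_happy_", "person", "and", "you"]
print(stop_filter(lowercase_filter(tokens)))
```

Lowercasing runs before the stop filter here, in the same order as the `filter` array of the analyzer, so a capitalised "And" would also be removed.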

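To sum up: the strings ":)" and ":(" are first replaced with "_happy_" and "_sad_" respectively; the characters " .,!?" are then treated as separators for splitting the text into tokens; finally, the tokens are lowercased and any English stop words are dropped.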
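The three stages can be chained in one small Python function to see the whole pipeline at once. This is a simplified simulation under stated assumptions, not how Elasticsearch works internally: `analyze` is a hypothetical name, the stop word set is my understanding of `_english_`, plain `str.replace` stands in for the mapping filter (adequate for these two non-overlapping rules), and character offsets are not tracked.

```python
import re

CHAR_MAPPINGS = {":)": "_happy_", ":(": "_sad_"}   # emoticons char_filter
SPLIT_PATTERN = re.compile(r"[ .,!?]")              # punctuation tokenizer
STOP_WORDS = {                                       # assumed _english_ list
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}


def analyze(text):
    """Simulate char filter -> tokenizer -> token filters, in that order."""
    # 1. Character filter: rewrite emoticons before any splitting happens.
    for source, target in CHAR_MAPPINGS.items():
        text = text.replace(source, target)

    # 2. Tokenizer: split on the separator characters, dropping empty pieces.
    raw_tokens = [piece for piece in SPLIT_PATTERN.split(text) if piece]

    # 3. Token filters: lowercase, then drop stop words but keep positions.
    result = []
    for position, token in enumerate(raw_tokens):
        token = token.lower()
        if token not in STOP_WORDS:
            result.append({"token": token, "position": position})
    return result


for entry in analyze("I'm a :) person, and you?"):
    print(entry)
```

Because positions are assigned before the stop filter runs, removed stop words leave gaps in the position sequence.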

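Now we can take a piece of text that exercises all three components (the character filter, the tokenizer, and the token filters) and run it through the analyzer.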

POST http://localhost:9200/my-index-000001/_analyze

Request body:

{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person, and you?"
}

Response body:

{
  "tokens": [
    {
      "token": "i'm",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "_happy_",
      "start_offset": 6,
      "end_offset": 8,
      "type": "word",
      "position": 2
    },
    {
      "token": "person",
      "start_offset": 9,
      "end_offset": 15,
      "type": "word",
      "position": 3
    },
    {
      "token": "you",
      "start_offset": 21,
      "end_offset": 24,
      "type": "word",
      "position": 5
    }
  ]
}

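The result matches our custom analyzer nicely: ":)" became "_happy_", every token is lowercased, and the stop words "a" and "and" were dropped, which is why positions 1 and 4 are missing.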

Keywords: elasticSearch, custom, char_filter, tokenizer, filter
