Tuning BM25
One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:
k1
This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is 1.2. Lower values result in quicker saturation, and higher values in slower saturation.
b
This parameter controls how much effect field-length normalization should have. A value of 0.0 disables normalization completely, and a value of 1.0 normalizes fully. The default is 0.75.
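To make the effect of the two parameters concrete, here is a minimal sketch of the term-frequency part of the textbook BM25 formula, with the IDF factor omitted. The function name `bm25_tf` and the field lengths are illustrative assumptions, and Lucene's actual implementation differs in some details, so treat the numbers as a guide to the shape of the curve rather than to real scores.

```python
def bm25_tf(tf, k1=1.2, b=0.75, field_len=100, avg_field_len=100):
    """Term-frequency component of textbook BM25 (IDF omitted).

    Hypothetical helper for illustration; not Lucene's exact code.
    """
    # Length normalization: b=0 ignores field length, b=1 applies it fully.
    norm = 1.0 - b + b * (field_len / avg_field_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


# Saturation: with a lower k1 the score approaches its ceiling (k1 + 1) sooner.
for k1 in (0.5, 1.2, 3.0):
    row = [round(bm25_tf(tf, k1=k1) / (k1 + 1.0), 2) for tf in (1, 2, 5, 10)]
    print(f"k1={k1}: fraction of ceiling at tf=1,2,5,10 -> {row}")

# Normalization: a field twice the average length is penalized only when b > 0.
for b in (0.0, 0.75, 1.0):
    print(f"b={b}: long field scores {bm25_tf(3, b=b, field_len=200):.3f}")
```

With `b=0.0` the field length drops out of the formula entirely, which is why a custom similarity with `b` set to 0 disables field-length normalization.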
The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values really depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.
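The adjust-and-check loop can be sketched as a small grid search. Everything here is an assumption made for illustration: the toy corpus, the relevance judgments, the candidate values, and the choice of mean reciprocal rank as the metric. In practice you would run judged queries against your own index rather than score documents in Python.

```python
import math
from itertools import product

# Toy corpus (hypothetical): doc id -> tokens.
DOCS = {
    "short": "bm25 tuning".split(),
    "long": ("bm25 " * 4 + "filler " * 20).split(),
    "other": "tf idf scoring basics".split(),
}
# Hypothetical judgments: query -> the doc id a user would want ranked first.
JUDGMENTS = {"bm25": "short", "tuning": "short"}


def score(query, doc_id, k1, b):
    """Textbook BM25 score of one document for a whitespace-split query."""
    tokens = DOCS[doc_id]
    avg_len = sum(len(t) for t in DOCS.values()) / len(DOCS)
    total = 0.0
    for term in query.split():
        tf = tokens.count(term)
        df = sum(1 for t in DOCS.values() if term in t)
        idf = math.log(1 + (len(DOCS) - df + 0.5) / (df + 0.5))
        norm = 1 - b + b * len(tokens) / avg_len
        total += idf * tf * (k1 + 1) / (tf + k1 * norm)
    return total


def mean_reciprocal_rank(k1, b):
    """Average of 1/rank of the wanted document over all judged queries."""
    total = 0.0
    for query, wanted in JUDGMENTS.items():
        ranked = sorted(DOCS, key=lambda d: score(query, d, k1, b), reverse=True)
        total += 1.0 / (ranked.index(wanted) + 1)
    return total / len(JUDGMENTS)


# Try each candidate pair and keep the first one with the highest metric.
grid = product((0.5, 1.2, 2.0), (0.0, 0.5, 0.75, 1.0))
best = max(grid, key=lambda kb: mean_reciprocal_rank(*kb))
print("best (k1, b):", best)
```

On a corpus this small many settings tie, so the winning pair means little. The point is the loop: choose candidates, measure against judgments, and repeat with a narrower range.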
The similarity algorithm can be set on a per-field basis. It’s just a matter of specifying the chosen algorithm in the field’s mapping:
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "BM25"
        },
        "body": {
          "type":       "string",
          "similarity": "default"
        }
      }
    }
  }
}
The title field uses BM25 similarity.
The body field uses the default similarity.
Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.
Configuring BM25
Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:
PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b":    0
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "my_bm25"
        },
        "body": {
          "type":       "string",
          "similarity": "BM25"
        }
      }
    }
  }
}
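When trying several parameter values, it can help to generate the request body rather than edit it by hand. The sketch below builds the same structure as the request above and also sets `k1`, which the BM25 similarity accepts alongside `b`. The helper name `bm25_index_body` is hypothetical, and sending the body to the cluster is left to whichever HTTP client you use.

```python
import json


def bm25_index_body(k1=1.2, b=0.75, similarity_name="my_bm25"):
    """Build a create-index body with a custom BM25 similarity on `title`.

    Hypothetical helper; the mapping mirrors the example request above.
    """
    return {
        "settings": {
            "similarity": {
                similarity_name: {"type": "BM25", "k1": k1, "b": b}
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    # title uses the custom similarity defined in settings
                    "title": {"type": "string", "similarity": similarity_name},
                    # body uses the built-in BM25 with default parameters
                    "body": {"type": "string", "similarity": "BM25"},
                }
            }
        },
    }


# Example: slower saturation and no field-length normalization on title.
body = bm25_index_body(k1=1.6, b=0.0)
print(json.dumps(body, indent=2))
```

Because the similarity of an existing field cannot be changed, each candidate setting needs its own index, created with a body like this and then populated by reindexing.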
Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html









