Search relevancy in content and place search

Getting the relevant results is critical for the success of the community. Here you can what parameters impact the relevancy score for content items and places and the rank a result will get when you search for a specific search phrase.

Jive searches object fields. Stop words are excluded. The parameters, described in this article, impact the rank of a content item and can provide a boost to get it to the top of the search results.

The relevancy rank is calculated as follows:
Rank = (SimilarityScore + ProximityScore) * OutcomeType * ObjectType * Recency

The resulting rank determines what will be displayed in the user's search results and in what order, with the objective to surface the most relevant content first.

Fields that are searched

Cloud Search uses information from the following fields:

  • subject: Title field of Jive objects, such as place or doc title.
  • body: Content of objects, such as blog post text.
  • tag: Tags added to objects.
Attention: Stop words are removed before searching content from the object fields, as well as from the Analyzed and Edgengram fields.

Stop words

The Cloud Search service uses a predefined set of stop words for the specified languages. Stop words are common words often occurring in any text. For example, the stop words for the English language include of, the, this, a, and and.

When a search request is processed, these words are skipped from the searched object fields and the Analyzed and Edgengram sub-fields.

Similarity score

When searching for a phrase the system looks at each word in the phrase and checks the match type and place of match for this work. Each match type and place has its own boost score. The boost score is normalized with the number of times the searched term appears in the given content (the more it appears the better), as well as with the number of times this term appears in the search index (the more common the term is, the less impact it has on the rank). The default settings are listed in Table 1.

Match types reflect how well your search query matches the results:

  • Raw: Exact matches of the search term.
  • Analyzed: Matches that are created by language analyzer. In this case, stemming is used, that is, looking for the root of the word. Stop words are ignored. For example, focusing will also find focus, focused, and other related words with the same stem.
  • Edgengram: Partial match, used for wildcard search matches and matches in search-as-you-type queries. Stop words are ignored.
Table 1. Similarity boosts
Match place Match type
Raw Analyzed Edgengram
subject 1.0 1.0 1.0
body 0.1 0.1 0.1
tag 0.5   0.5
Attention: Stop words are removed before searching content from the object fields, as well as from the Analyzed and Edgengram fields.

Proximity score

The proximity score checks how close is the term the user searches for to what appears in the content – to boost more relevant results. When a user searches for a phrase built from several words, this phrase may appear exactly the same way in the content or it may appear in the content in a slightly different way. For example, content with the term product one-pager brochure is an approximate match when searching for product brochure.

Types of proximity boosts:

  • Exact match: When all the search terms appear in the content next to each other
  • Proximity match: When all the search terms appear less than three words apart from each other

The default settings are listed in Table 2.

Table 2. Proximity and exact match boosts
Place Proximity boost Exact match boost
Subject 0.5 1.6
Body 0.5 1.0
Tags** 0.1 1.0

** Having proximity score on tags is unlikely to happen.

Note: Exact and proximity matches are boosted using only raw sub-field for tags and only analyzed sub-field for other search fields.

A higher the boost value, more matches in the given field, and the match method get ranked higher. Stop words are ignored.

Besides the matches, we also look for frequency. The score has a lot to do with how many occurrences of the word user is searching for exists in the field. For example, if a 20,000-word essay makes a single reference to the movie Finding Nemo somewhere in the document and another document in the system has only 50 words and includes Finding Nemo, the latter is counted more relevant to a query for nemo.

Outcome type

Content in Jive can be marked with structured outcomes. These outcomes impact the score of that content in the search results, results are boosted based on outcome type.

The boosts given to content according to outcome type are listed in Table 3.

Table 3. Outcome boosts
Outcome Boost Outcome Boost
Finalized 1.4 Official 2.0
Outdated 0.1 Default 1.0

This score is being multiplied by the boosts above. A higher boost results in that content being ranked higher in the search results, so the 0.1 score for outdated documents significantly reduces their rank.

Object type

Similarly to outcome boost, there is a boost for ranks based on the type of content used. Documents and blogs are ranked higher in the search results as these are usually used for more comprehensive content that may be more relevant for the searching user. The settings are listed in Table 4.

Table 4. Object boosts
Object Boost Object Boost
Document 1.4 Poll 1.0
Blog 1.4 Idea 1.0
Blog post 1.4 Video 1.0
Discussion 1.0 Status Update 1.0
Question 1.0    

Recency

Recency (or time decay) lowers the score for older content. The impact of content can be seen this way:

Figure: Default recency boost



The recency score calculation is based on the following parameters:

Drop speed
Determines how fast the algorithm reduces the content score by age. The default setting in 50.
Max value
Determines the latest period the content from which has the same score without decay. The default setting is 4 weeks.
Minimum score
Determines the score difference of a very old document and a just created one as 2 times as maximum. It is set so that even the oldest relevant content can be found but allows preference for fresh content. The default setting is 0.9.