Introduction to the SMITH algorithm
In October 2020, Google published a research paper on a new algorithm called SMITH, which stands for “Siamese Multi-depth Transformer-based Hierarchical” encoder. The algorithm is designed to understand long-form documents and web page content.
The SMITH algorithm analyzes larger units of the content on a page, such as sentences, paragraphs, and even the entire document.
You may be thinking, isn’t this what Google does when it comes to ranking pages in search engine results? What then distinguishes this algorithm from other similar algorithms?
Let’s take a deeper look at the algorithm below.
Why was it needed at all? Why is BERT often mentioned along with the SMITH algorithm? What’s the difference between the two? And what can we do to stay on the safe side?
Before answering these questions, it is important to understand the role of natural language processing in search engines.
Using natural language processing (NLP) for page ranking
Natural language processing uses artificial intelligence to understand the context and tone of content. Search engines use this interpretation of a document to return the results that are most relevant to a search query.
The use of NLP has become even more important as voice search becomes increasingly popular around the world. Search engines rely heavily on NLP to understand the intent of the words used in the content.
What is the BERT algorithm?
Bidirectional Encoder Representations from Transformers, widely known as BERT, was introduced by Google researchers in 2018 and applied to Google Search in 2019. The goal of using the BERT algorithm is to understand users’ search intent, especially for long queries.
The term “bidirectional” describes how the words in a document are evaluated: the algorithm interprets the words that come both before and after a given keyword. Let us understand this using the following two sentences.
- “The human body is made up of millions of cells”
- “The prisoners were locked in cells”
To understand the users’ search intent, the BERT algorithm interprets the keyword “cells” in the context of each sentence.
For sentence 1, the algorithm understands the context of the keyword “cells” from surrounding words such as “human body”.
For sentence 2, the algorithm understands the meaning of “cells” from other words in the line such as “prisoners”.
Hence, the BERT algorithm understands users’ search intent by interpreting each keyword in light of the words around it.
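To make the idea of reading context on both sides of a keyword more concrete, here is a minimal Python sketch. It is not BERT and not how Google implements it: it is a toy word-overlap lookup, and the cue-word lists, the `guess_sense` function, and the window size are all assumptions made up for this illustration.

```python
import re

# Hypothetical, hand-picked cue words for two senses of "cells".
# A real model learns such associations from data; this is only a toy.
SENSE_CUES = {
    "biology": {"human", "body", "blood", "tissue", "organism"},
    "prison": {"prisoners", "locked", "jail", "guards", "bars"},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def guess_sense(sentence, keyword="cells", window=8):
    """Guess the sense of `keyword` from the words before AND after it."""
    tokens = tokenize(sentence)
    if keyword not in tokens:
        return None
    i = tokens.index(keyword)
    before = tokens[max(0, i - window):i]   # context to the left
    after = tokens[i + 1:i + 1 + window]    # context to the right
    context = set(before) | set(after)
    # Score each sense by how many of its cue words appear in the context.
    scores = {sense: len(cues & context) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_sense("The human body is made up of millions of cells"))
print(guess_sense("The prisoners were locked in cells"))
print(guess_sense("Cells in the human body divide constantly"))
```

In the third sentence the helpful words come after the keyword, which is why looking in only one direction would not be enough.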
BERT and SMITH algorithm
The BERT algorithm has limitations when it comes to long-form content on a page.
It is best suited to short passages and sentences.
To solve this problem, the SMITH algorithm is designed to evaluate long-form content. It can interpret content sentence by sentence, paragraph by paragraph, and across the entire document.
Although the BERT and SMITH algorithms share the same goal and differ in capability, SMITH still builds on BERT and is not an entirely separate algorithm.
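The “hierarchical” and “Siamese” parts of the name can be illustrated with a small Python sketch. This is only a rough picture under stated assumptions, not the model from the paper: simple word counts stand in for the transformer encoders, averaging stands in for the document-level encoder, and every function name here (`split_into_blocks`, `encode_block`, `encode_document`, `match_documents`) is hypothetical.

```python
import math
import re
from collections import Counter


def split_into_blocks(document):
    """Naively split a document into sentence blocks."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    return [p.strip() for p in parts if p.strip()]


def encode_block(block):
    """Level 1: turn one block into a unit-length word-count vector.
    (Assumption: a stand-in for a sentence-level transformer.)"""
    counts = Counter(re.findall(r"[a-z]+", block.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def encode_document(document):
    """Level 2: combine block vectors into one document vector by averaging.
    (Assumption: a stand-in for a document-level transformer.)"""
    blocks = split_into_blocks(document)
    doc_vector = {}
    for block in blocks:
        for term, weight in encode_block(block).items():
            doc_vector[term] = doc_vector.get(term, 0.0) + weight / len(blocks)
    return doc_vector


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_documents(doc_a, doc_b):
    """Siamese setup: the SAME encoder is applied to both documents."""
    return cosine_similarity(encode_document(doc_a), encode_document(doc_b))


doc_a = "Cells are the building blocks of the human body. Blood cells carry oxygen."
doc_b = "The human body contains many cells. Some cells carry oxygen in the blood."
doc_c = "The prisoners were locked in cells. Guards patrol the prison at night."

print(match_documents(doc_a, doc_b))  # two documents about biology
print(match_documents(doc_a, doc_c))  # biology versus prison
```

The point of the sketch is the shape of the pipeline: each document is broken into blocks, each block is encoded on its own, the block results are combined into one document representation, and two documents are compared through the same encoder.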
Speculation about the implementation of the SMITH update
Google published a research paper on the SMITH algorithm in 2020. The paper describes the benefits of the SMITH algorithm and how it outperforms BERT on long-form document matching. Here is the relevant excerpt from the paper.
“Our experimental results with multiple benchmark data sets for long-form document matching show that our proposed SMITH model outperforms previous state-of-the-art models, including hierarchical attention, multi-depth attention-based hierarchical recurrent neural network, and BERT. Compared to BERT-based baselines, our model can increase the maximum length of the input text from 512 to 2048. We will provide a Wikipedia-based benchmark dataset, code, and pre-trained checkpoint as an open source version to accelerate future research on long-form document matching.”
There has been no official announcement about the SMITH algorithm being implemented in search. Some observers speculate that it is at an early stage, or that it is partially implemented and being tested together with the BERT algorithm.
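To get a feel for what the jump from 512 to 2048 means in practice, the short Python sketch below estimates how much of a document fits within each input limit. The `coverage` helper and the example document lengths are assumptions for illustration only, and real models count subword tokens rather than words, so treat the numbers as rough.

```python
def coverage(doc_tokens, max_tokens):
    """Fraction of a document's tokens that fit within an input limit."""
    if doc_tokens <= 0:
        return 1.0
    return min(doc_tokens, max_tokens) / doc_tokens


# Hypothetical document lengths, measured in tokens.
for length in (300, 1000, 2000, 5000):
    short_limit = coverage(length, 512)
    long_limit = coverage(length, 2048)
    print(f"{length} tokens: {short_limit:.0%} fits in 512, {long_limit:.0%} fits in 2048")
```

For example, only about half of a 1,000-token page fits within a 512-token limit, while all of it fits within 2048.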
What’s in it for you
To rank a page in search results, it is always important to provide your users with valuable information, build brand awareness, and educate your audience.
The key to ranking well is to create a good overall user experience online while following the guidelines set by search engines. The overall user experience improves when the content is relevant to your users and answers their questions, concerns, and expectations of your brand, and when it is easy to navigate between pages within the website. Although search engines consider several other parameters for ranking, your website has a good chance of ranking at the top as long as you provide and maintain a user-friendly environment for your users.