Google’s Latent Semantic Indexing
Billions of websites saturated the internet, and the endless onslaught of spam, link farms, irrelevant ads and so forth surely didn’t help the situation. The result was information overload, followed by what seemed like a search-engine event horizon.
Around that time, Google said “enough is enough” and started relying more on the latent semantic indexing (LSI) method and less on the legacy way of doing things, that is, matching keywords and keyword strings to dish out relevant search results. LSI, as most insiders know, generally returns more relevant search results.
What Exactly is LSI?
LSI is a word-relationship method of generating relevant search results, based on natural language processing. While it’s not the only technology that giants like Google use, it’s becoming immensely popular. LSI is more sophisticated than matching keywords and key phrases alone: it uses a set of algorithms that compute statistical relationships from how often words and phrases occur together (their semantic distance), relative to the context of the page they appear on and to other, similar pages.
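To make the idea concrete, here is a minimal sketch of textbook LSI on a toy corpus; it is not Google’s implementation. It builds a term-document count matrix, keeps the top two latent dimensions (found here by power iteration on the term-by-term Gram matrix, a stand-in for the usual singular value decomposition), and compares words by cosine similarity. The corpus, the function names and the choice of k = 2 are all assumptions made for illustration.

```python
import math

# Toy corpus (made up for illustration): "car" and "automobile" never share a document.
DOCS = [
    "car engine luxury",
    "automobile engine luxury",
    "recipe flour oven",
    "recipe oven sugar",
    "recipe flour sugar",
]

def term_document_matrix(docs):
    """Return (sorted vocabulary, one row of raw counts per term, one column per document)."""
    vocab = sorted({w for d in docs for w in d.split()})
    rows = [[d.split().count(t) for d in docs] for t in vocab]
    return vocab, rows

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_eigenpairs(m, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix (power iteration + deflation)."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def lsi_term_vectors(rows, k=2):
    """Coordinates of each term in a k-dimensional latent space (U_k scaled by singular values)."""
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs] for i in range(n)]

vocab, rows = term_document_matrix(DOCS)
vectors = lsi_term_vectors(rows, k=2)
car, auto, recipe = (vocab.index(t) for t in ("car", "automobile", "recipe"))

raw_sim = cosine(rows[car], rows[auto])        # 0.0: the two words never co-occur
lsi_sim = cosine(vectors[car], vectors[auto])  # close to 1: both sit on the same latent theme
off_theme = cosine(vectors[car], vectors[recipe])  # close to 0: different themes
```

In this toy example, “car” and “automobile” never appear on the same page, so plain keyword counts treat them as unrelated; after the reduction to two latent dimensions they land on the same theme because both keep company with “engine” and “luxury”.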
Is it technical? Yes. To break it down a little, consider this: the LSI search method uses certain parameters to gauge the “distance” between words on a site, then compares that distance to the overall theme of the website and to that of similar websites that attract the most clicks. The ultimate goal is to prune as many useless or irrelevant websites as possible from the top results of a search and show only the most reliable. Consider this about Google:
• Google’s engineers continually refine their search parameters to weed out spam or rank it very low, which includes, yes, sites that are rife with stuffed keywords and phrases (a condition otherwise known as ‘over-optimized’).
• Pages with a variety of related keywords tend to rank higher with Google.
• LSI takes its “findings” about the page and then proceeds to analyze, or scan, the remainder of the site, looking for additional relevant and related content.
• It’s good to keep in mind that while latent semantic indexing analyzes and computes statistical relationships among words, it understands nothing about the individual words or the context in which they’re used.
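As a rough illustration of the “over-optimized” condition in the first point above, the sketch below measures keyword density, meaning the share of a page’s words taken up by its most frequent term. The stop-word list, the 20% threshold and the function names are hypothetical choices for this example; the thresholds Google actually applies are not public.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "our", "to", "in"}

def keyword_density(text, top_n=3):
    """Return the top_n most frequent non-stop-words with their share of all counted words."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]
    total = len(words)
    if total == 0:
        return []
    return [(word, count / total) for word, count in Counter(words).most_common(top_n)]

def looks_stuffed(text, threshold=0.2):
    """Flag text whose most frequent word exceeds a hypothetical density threshold."""
    density = keyword_density(text, top_n=1)
    return bool(density) and density[0][1] > threshold

STUFFED = "cheap cars cheap cars buy cheap cars best cheap cars"
NATURAL = "our review compares luxury sedans on comfort, engine performance and price"

stuffed_flag = looks_stuffed(STUFFED)   # True: "cheap" is 4 of 10 counted words
natural_flag = looks_stuffed(NATURAL)   # False: no word repeats
```

A single-word density check is deliberately crude; it only shows why a page that repeats one phrase stands out statistically from a page that covers a theme with varied, related vocabulary.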
It can be a little difficult to visualize the process, but once you understand, everything basically comes full circle.
Relevant websites are then rounded up and grouped by their overall theme. For instance, typing “2010 Mercedes-Benz S-Class” into Google ranks millions of pages nearly instantly on parameters such as the ones set out above. Using the overall themes of those pages, the program groups the most relevant websites containing the words “Mercedes-Benz”, “2010”, “S” and “Class”, lists the sites that most closely relate to themes (as determined by LSI) such as “cars”, “car reviews” and “luxury”, and displays the pages with the highest number of clicks relative to both the theme and the anchor text (keywords and phrases).
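The flow just described (match the query words, keep the pages that fit the theme, order by clicks) can be sketched as follows. This is a toy model of the description in this article only, not of how Google actually ranks pages; the Page class, the example.com URLs, the click counts and the 0.5 theme cut-off are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str      # hypothetical URL
    text: str     # page text, already reduced to keywords
    clicks: int   # hypothetical click count

def theme_score(page, theme_terms):
    """Fraction of the theme terms that appear on the page."""
    words = set(page.text.lower().split())
    return len(words & theme_terms) / len(theme_terms)

def rank(pages, query, theme_terms, min_theme=0.5):
    """Keep pages that contain every query word and fit the theme, then sort by clicks."""
    query_words = set(query.lower().split())
    matched = [p for p in pages if query_words <= set(p.text.lower().split())]
    on_theme = [p for p in matched if theme_score(p, theme_terms) >= min_theme]
    return sorted(on_theme, key=lambda p: p.clicks, reverse=True)

PAGES = [
    Page("https://example.com/review", "2010 mercedes-benz s-class luxury car reviews", 900),
    Page("https://example.com/dealer", "2010 mercedes-benz s-class luxury car dealer", 1500),
    Page("https://example.com/ringtone", "2010 mercedes-benz s-class ringtone download", 5000),
    Page("https://example.com/other", "2009 bmw 7-series luxury car reviews", 2000),
]

results = rank(PAGES, "2010 Mercedes-Benz S-Class", {"car", "reviews", "luxury"})
ranked_urls = [p.url for p in results]
```

Here the ringtone page has the most clicks but is dropped for being off-theme, and the BMW page is dropped for not matching the query, leaving the dealer page ahead of the review page on clicks.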
Implementing LSI and Google’s Semantically Related Words
There are myriad proven methods for making your website LSI-friendly:
• Try the Google AdSense sandbox approach on a block of sample text. This, basically, tells you how strong or weak the anchor text and related keywords are.
• Learn to alter key phrases and words without changing the meaning; there are a ton of good keyword suggestion applications out there.
• Never center your website around one or two keywords; instead, base it on a main theme and expand into related sub-themes.
• Keywords are still important; use words similar to them, as well as their plural forms, and vary (where possible) the tense of related verbs across the site.
• Use inbound links wisely. While targeting the keyword(s) is important, use similar and relevant variations of them in the anchor text of inbound links as well.