Technology: The case of the stealth Google-bomb

Earlier this year, I was asked to work on a legal case involving a couple of finance sector companies where one was accused by the other of trademark infringement where the defendant’s website abruptly began ranking for the plaintiff’s service mark.

The only trouble was, the unique phrase was in no way associated with the defendant’s website. In this article, I’ll discuss a unique functionality of Google’s ranking algorithm.

What I expected to find in this case was quite different from what I ultimately found. Most search ranking factors are very straightforward.

SEO is an arena where the logical principle of Occam’s razor (“the simplest explanation is usually the right one”) is typically present in formulating explanations.

For instance, if a page ranks for a unique keyword, that keyword will be found within a page’s HTML code such as the title, body text, image alt text – or, failing that, the keyword might be present in links pointing to the page.

But, this case was to prove to be a rare instance wherein media linking the unique, branded phrase, was simply… absent.

The case overview

The scenario was quite unique in a few ways.

Some years ago, the plaintiff had dreamed up a sort of catchphrase that they used in traditional media to promote their business.

The catchphrase combined a word closely related to their business and products with a word that was not traditionally related to their industry.

They neglected to promote the catchphrase online, and as it was very unique, there were fewer webpages that appeared in Google search results for the precise query.

When searching for the catchphrase, 1.7 million pages were shown in the results, but many of these were only partial matches – once you searched with quotes around the phrase, the results dropped to only a little over three thousand.

The phrase had been used generically in a completely different industry, so most of the exact-match results related to that other use.

The plaintiff’s website used the catchphrase on some of their pages, and had the expectation that only their website would appear in the search results when it was searched upon.

The plaintiff advertised in offline media channels, and on the radio, they ended their promotion with a phrase like “For more information, search in Google for _____ _____!”

At some point, they themselves searched for “_____ _____” and thought it was a little odd that the defendant’s website appeared on page one, near the lower half of the first page of search results.

As their own website was at the top of the results, they did not think more of it at that time.

However, they eventually thought they might wish to register a .com domain name with their catchphrase. They were unhappy to discover the domain was already registered.

When they typed in the domain, the URL redirected to one of their top competitors, located in their area, not many miles from them.

That is when they consulted with a lawyer and then decided to sue.

When the attorney approached me to look into how the defendant’s website began ranking for the plaintiff’s catchphrase, I was pretty confident I would find them a veritable “smoking gun” explaining the undeserved ranking.

In my experience, keyword text must be associated with a webpage in some way for it to appear prominently in the rankings for that keyword query.

I thought it likely that I would find keywords, hidden or otherwise, in the code of the defendant’s website, or links with anchor text pointing to the webpage. (There is also the possibility of a page ranking for a keyword’s synonym searches, too.)

The keyword was nowhere in the website’s code

I first spot-checked the page that was appearing in the rankings – and did not see the keyword on the page, nor visible in its code.

I glanced at copies of the page in the Internet Archive’s repository of historic copies of webpages, and I did not find the keyword present in the page’s code.

To be thorough, I developed a list of all the webpages’ instances in the Internet Archive, and I set up Screaming Frog SEO spider to crawl all of them and to check for the keyword in the pages’ code.

I could not find the keyword in any of the historic copies of the page. I was a bit surprised at that since in some cases where I have been called in on trademark infringement, the offender can be fairly naive or brazen.

Then, I figure out where they wilfully infringed on domain name registrations. They might very well have used hidden keywords in drawing more connection with their competitor’s marks.

But, the keyword was completely absent from the thousands of copies of pages stored in the Internet Archive.

The keyword was also not in the backlinks

I next figured it must be present in the backlinks’ anchor text. It is possible to not have any keyword whatsoever on a webpage, and yet have it rank prominently for the keyword if external links have been developed containing the term. Using Majestic, I checked the backlinks and found no links containing the keyword.

At this point, I was surprised. There are generally few conditions where a webpage can rank for terms not present upon them, and not present in their backlinks’ anchor text.

One condition is if the keyword phrase is found within text on a page that is linking to the webpage in question. For this to happen, typically the overall topic of the page must be dominated by the keyword, or else one would expect the keyword to be very close to the link.

Having the keyword on the page but not in the anchor text is a pretty tenuous connection, and one would expect the relevancy conveyed to be pretty weak. This is harder to detect as well, as this is such a tenuous connection that industry link research tools do not detect it.

Another scenario I have encountered a few times involved the keyword being present on the page (and/or on pages linking to the webpage in question) for some period of time and then subsequently deleted.

I get to see this in online reputation management cases where a defamatory reference has been present, and we later persuade (or sue and force) a webpage to remove defamatory content or links.

You might be surprised to know that keyword associations with webpages linger in Google’s algorithm for a period of time in some instances, subsequent to deletion – this factor could be called “historical keyword association.”

But, the association typically fades pretty fast in most cases – the page begins subsiding in search results pretty rapidly in most cases after the keyword association has been removed. (In some cases, the keyword association seems to linger far longer.)

If you have read this far, you probably see where I’m leading: the cybersquatting of the domain name that started off this lawsuit.

Indeed this is where I am leading. In fact, it was four domain names that were set up using letter sequences that were confusingly similar to the plaintiff’s trademark. The client’s trademark followed this form: “The [Unique] [Industry]”.

Thus the first word in the phrase was “The”, followed by a “unique” word completely unrelated to the industry, followed by the final “industry” word which is very common throughout that niche of the finance sector. Altogether, the phase was distinctive.

The four domain names were formed in the following ways (spaces inserted for ease of reading):

The [Unique] [Industry] . com
[Unique] [Industry] . com
[Unique] [Industrys] . com
[Unique] . [Industry]

The first domain was the entire trademark word phrase with the spaces removed, followed by .com.

The second domain was the same as the first domain with the initial “The” term removed.

The third domain was the same as the second, only the industry term was made plural with the addition of an “s” at the end of the word preceding .com.

The fourth domain used one of the new generic top-level domains (New gTLD) where the Industry term is the TLD – as in Example.Finance.

I privately approached a few well-regarded SEO industry colleagues, and to a man, they all opined that such a linkage between keywords in the domain names was highly unlikely, or at best a very weak relevancy signal.

One of them sleuthed to find a business directory page where both the plaintiff and the defendant were listed, some distance apart on the page, along with addresses and links to their respective websites – descriptive text for each business listed on the pages included mention of the plaintiff’s catchphrase.

But, was this the explanation?

The catchphrase on the page was closer to the plaintiff’s website link by far, and other businesses were likewise listed on the page as well. Their websites did not rank for the phrase searches in Google.

This answer was highly unsatisfactory because co-mentions on a common directory page do not typically result in one’s competitors’ websites ranking for one’s trademark name search, but perhaps this very weak connection could be an explanation.

I began to think it was possible that the four infringing domain names that were redirected to the defendant’s website were responsible for the defendant’s website appearing on page one in Google’s rankings for “[Unique] [Industry]” searches.

The main thing linking the keyword term to the defendant’s website was the four domain names. This theory obviously depends in part on the question of whether keywords in domain names are used by Google.

Are keywords in domain names used by Google?

Keywords in URLs have long been known to convey keyword relevancy to their pages if constructed well.

As far back as 2009, former Googler Matt Cutts had stated that keywords in URLs help “a little bit”, as long as there was no keyword stuffing. Google’s SEO Starter Guide even now recommends to “Use words in URLs.”

However, domain names are very specific parts of URLs, and there has been an uneasy shift over time as many years back it was observed by many in the industry that keyworded domain names seemed to have been given too much weight.

In 2012, Google took pains to reduce the influence of exact match domain (“EMD”) names in the rankings for their keyword query equivalents. Whereas once upon a time, HumptyDumpty.com could be expected to rank unusually easily for “Humpty Dumpty” searches, the EMD update reduced the effect while not wiping it out altogether.

In a more recent timeframe, Google has emphasized that keywords in URLs have fairly minimal influence, once the page associated with them is indexed.

Google has downplayed the keyword URL influence because many websites have been known to expend resources beyond a realistic ROI in converting websites with abstract URLs into keyworded URLs, and lost some rankings and traffic in the process. (Because conversion to keyworded URLs changes requires time to re-establish rankings with the all-new resultant URLs, losing keyword ranking history advantages.)

Despite reducing the weighting of the EMD keyword relevancy factor and downplaying the influence of keywords in URLs and domain names, those keywords in domain names continue to be an influential factor.

Just aside from the keyword’s presence in a domain name, the domain name itself often gets used as the anchor text for links pointing to it, and this may cause or increase the keyword relevancy for the domain name.

It often becomes quite impossible to differentiate the effect of the keywords within the domain name with the anchor text of external links pointing to the domain.

Regardless, it may be taken as given that keyworded domain names carry a definite advantage in SEO.

Keywords in domain names are either used by Google in ranking determinations, or else the closely associated anchor text associated with links to the domain causes the domain to have ranking advantages for its own keywords.

The difference between those two things may be a moot point, as the result is that keywords in domain names are certainly a factor that enables webpages to rank for searches for those keyword queries.

As the keywords associated with the infringing domains are influential in search, the next question is whether the redirection of those domains could transfer the keyword relevancy to the URL they were pointed to.

Why is redirection used?

While keywords in domain names and links unequivocally have some ranking advantage for their equivalent/similar search queries, the next question one must ask is if a domain’s keyword relevancy is transferred through redirection.

A redirect is a method by which one internet address (a webpage URL) is made to forward a website visitor requesting it to instead view another, different page.

For example, if an internet user clicked upon a link on a webpage that pointed to “http://example.com”, that page could be set to automatically redirect the user to a different URL such as one at “http://destination.com”.

Redirection of internet URLs is similar in concept to setting up a phone number to forward calls to a different telephone number.

As you may know, redirects are set up to aid the website visitor in getting to the content they seek when they have visited a legacy URL for content that has since been moved.

Companies will often set up a redirect when they rebrand, necessitating a change in their domain names.

For instance, Overstock.com rebranded itself as “O.CO” in 2011, believing the shorter brand name and URL would be advantageous – and they redirected their domain name “overstock.com” and its URLs to “o.co” equivalent URLs.

This change allowed their customers who had bookmarked the original, legacy URLs or who were more familiar with the original website homepage address to be automatically taken to the new URLs if they typed in or clicked upon legacy URL links.

Domain names can also be redirected for the purpose of enabling consumers to navigate to a company’s website when they have mistyped a URL, or when a brand may have alternate spellings when typed by consumers.

Google has set up a number of these – for instance, typing “gogle.com”, “gooogle.com”, or “googel.com” into a browser window’s address box will result in redirecting to the canonical “google.com” domain name.

Domain names can also be redirected to combine or sustain the goodwill associated with a brand name.

For instance, in the early 2000s, FedEx acquired Kinko’s and combined the company’s services under its FedEx branding. Even now, almost 20 years later, “kinkos.com” is still set to redirect to the “fedex.com” domain name.

Kinko’s was a well-known brand name, so the FedEx company still maintains its intellectual property asset through the redirection of the domain name. (Trademarks generally must be kept “in use” in order to protect their registration status. Redirection could be one method for establishing that a trademarked name remains “in use”.)

Get the daily newsletter search marketers rely on.

See terms.

Can a domain name’s keywords in a redirected domain cause a separate webpage to rank for them?

The reason why some of my colleagues did not believe this could influence rankings was that it does not have a lot of specific documentation from Google or anyone else.

However, there are some intriguing hints that indicate there may indeed be a dynamic wherein keywords in a domain name will influence rankings for those keywords.

First of all, there is the closely similar dynamic we know quite a bit about – “Google bombing.”

Google bombing most famously came to be known circa 2004 as SEO pranksters caused the White House’s biography page for President George W. Bush to rank for the query, “miserable failure.”

This was accomplished by many people cooperating to create external links with the anchor text, “miserable failure”, and all pointing at the White House biography page.

The result of this effort caused the page to rank for words that were not found in the code of the webpage.

Google bombing works because Google’s algorithm considers the link’s text to be more about the page that the link points to than the page where the link is found. So, the keyword relevancy signals from the link’s anchor text are transferred to the page that is linked to.

Our question here is, “does Google transfer keyword relevancy from a redirected URL to the destination URL?”

There is a basis to believe they do. In 2009, Cutts stated:

“Typically, anchor text does flow through a 301 redirect.”

Also, Google’s contemporary documentation states:

“Ranking signals (such as PageRank or incoming links) will be passed appropriately across 301 redirects.”

This is perhaps yet unsatisfyingly vague. Colleagues I queried thought this was generally confined to ranking weight, classically referred to as PageRank, and not necessarily any other signals.

Or, perhaps this was confined to only the anchor text of links redirected to Google and no other ways that keywords could be connected to a page URL that gets redirected. And, what does “appropriately” mean in this context?

In the case I was working on, there was no other content particularly associated with the domains that were redirecting, so there was no anchor text to be transmitted.

The keywords themselves, present in the domain name, likely would have to be the source of the keyword relevancy that was applied to the defendant’s URL.

So, the question is also whether keywords in domain names are used by Google as a discreet ranking signal for keyword relevancy.

Twelve years back, Cutts said that “keywords in a URL” are influential, and the domain is part of the URL, but this is a bit indistinct.

A few years later, Cutts more clearly acknowledged that Google did pay some attention to keywords in domains and that they were adjusting the weighting of this as a factor.

Up until that point, having an exact match domain (“EMD”) for a keyword phrase was a major advantage, because it seemed to convey a significant ability to rank for searches involving the phrase.

But, Google published an update that revoked a lot of this innate advantage, particularly for low-quality websites.

It seems likely that Google did not revoke all influence of that signal, but those of us in the industry for a long time have known that this factor was much reduced from what it once was.

Of course, SEO industry expert books and guides have long advised that search engines use keywords in domains as ranking factors.

And, industry studies have shown a high correlation between having keywords included in domain names with ranking well for those keyword queries.

At least based on past history of the search engines, keywords in a domain name can confer relevancy in search for those related keyword queries.

And, there is cause to think that this ranking factor could indeed transfer through a redirect to a destination URL, as this functionality occurs for the anchor text in redirected links.

What happened in the cybersquatting case?

We should generally take it as given that a webpage does not rank for a keyword search if there is not some connection between the keyword and that page.

Google rankings do not occur in a keyword void. The website appeared to never contain the catchphrase.

Sure, a page can rank for synonyms – but, trust me, this keyword bears no semantic relation to any words found on the defendant’s website.

It could have made a partial match based on the industry term included in the catchphrase, but then many other local or nationwide companies should also have appeared on page one.

It is perhaps possible that the keyword could have been added to the webpage, and removed without a copy getting recorded in the Internet Archive.

But, this seems unlikely – the catchphrase should have been there for a discernible period if that was the case. Majestic’s historical backlinks likewise show no evidence of the keyword in the backlinks.

I shared info on the case with one colleague, and they found only one item connecting the defendant’s website to the catchphrase aside from the redirecting domain names.

A Yelp page happened to list both companies on the same “top 10” provider page, but the listings were not close in proximity.

Could keywords present on the same page with a link to the defendant’s site cause the site to be deemed relevant in search? Possibly.

However, without closer proximity to the link, this seemed unlikely.

Semrush showed that the defendant’s website began ranking abruptly for the catchphrase searches early in the year, the same month the domain names were registered:

The defendant’s website organic research report from Semrush. — The defendant’s website organic research from Semrush, showing when rankings began for the plaintiff’s trademarked catchphrase. While this report is for desktop, the website showed similar rankings on mobile as well.

I determined that the greatest likelihood was that the redirected domain names were the cause for the website to achieve prominence for a period of time when the trademark was searched.

The spike upward in Semrush for the catchphrase occurring directly in close tandem with the registration and redirection of the domain names seemed to be the root cause for why the website was able to appear for a phrase that was not present.

Even with the media advertising conducted by the plaintiff, the catchphrase is something of a niche search phrase.

Google Trends shows searches for the phrase began when the plaintiff began using it, but the volume is pretty low:

You might expect that keywords in a domain name might not lend much ranking weight, especially if redirected. I believe that is the case.

There was not a lot of online media involving the catchphrase, so the other items ranking on the page were:

Coincidental matches where the phrase was used a little in a casual/generic way in another state or two.
Or coincidental in that the industry term and unique term just happened to be on the same page in close proximity.

I think it is highly likely that if there had been more web content using the catchphrase, almost anything else would have outranked the defendant’s website.

That site showed it only appeared near the bottom half of the search results page, so it did not rise all that high.

Still, while the facts create a circumstantial case that the trademark phrase ranking was achieved via the cybersquatting and redirection of the domains, this is perhaps not certain proof that this is a thing.

If we use the scientific method, we can only say that the condition seems to support the theory, but there are still tantalizing alternate explanations when we are dealing with the black box that is Google.

So, I decided to test the theory.

Experiment: Can keywords in a redirected domain name cause a page to rank for them?

I wanted to imitate the main conditions from the legal case, so I would try to create a phrase that had very few pages that would be relevant to the phrase in search results.

In the legal case, the important parts of the phrase were a real word unrelated to their industry, coupled with a word commonly used in their industry.

For my test, I came up with the phrase “supercalifragilistic seo.” There are a few SEO industry sites out there that have the word “supercalifragilistic” on one of their webpages, but not particularly in direct combo with “SEO”.

I registered the following domains using this phrase:

supercalifragilisticseo.com
supercalifragilistic-seo.com
supercalifragilisticseo.agency
supercalifragilisticseo.media
supercalifragilisticseo.xyz

I took these domains and set them up with 301 redirects to point to my agency website’s homepage at argentmedia.com.

In order to move the experiment forward quickly, I submitted each of the domain names through Google Search Console to get them spidered.

While there was no evidence that the defendant in my legal case had done this, it was perhaps unnecessary because of the nature of the internet. There are numbers of websites that auto-generate content based on the domain name system (DNS).

These websites automatically generate profile pages listing out data about registered domain names, such as the WHOIS data showing registration data. They often also mix additional data into the pages, such as:

IP address location for the website.
Other domains on the same server.
Information scraped from the homepage at the domain.
Links to third-party statistics about the domain/website.
And more.

These autogenerated domain profile pages that appear everywhere could be one source by which Google discovers the domains.

That said, Google itself is a registrar that could visit all new domain name registrations. Alternatively, they have general access to the DNS of the internet so they could send Googlebot out to visit any domain names registered.

Some of the domain information pages that are autogenerated do not have direct links to the domain names they document. They just list the domain names out in unlinked text.

But even that is not necessarily an impediment to Google, since they have the ability to detect hyperlinks in text that are not linked.

Some have referred to such non-linked hyperlink mentions as “inferred links,” but Google spokespersons have stated that they do not use such for ranking purposes.

However, the door may still be open to the possibility that Google might use unlinked hyperlinks in text for URL discovery purposes, while not conferring any PageRank on the links.

Thus, these pages might be part of why the infringing domain names in my case were discovered by Google, leading to them potentially influencing the defendant’s website rankings for the trademarked term.

What I found was that after setting up the experimental domain names and redirecting them to my website, within a few weeks, ArgentMedia.com began ranking for the search term, but only really when the term was searched for in quotes: “supercalifragilistic SEO”.

Can keywords in a redirected domain name cause a page to rank for them?

Also, some of the automated domain profile sites appeared in the results as well, such as a page from “com.all-url.info” as shown in the above screenshot.

As of writing this piece, my website only ranks for the exact-match phrase when the search is conducted in quotes.

I expect this means that Google deems my site to be of only very tenuous potential relevancy for the search term. It may only be showing that result in the listing because there are very few webpages that match that precise query.

When conducting the search without quotes, there are many other webpages that match the query more closely due to having both words present in the visible text of the webpages.

Conclusion

It seems likely that my theory that keyword signals are passed through redirection is valid.

There remains some degree of fuzziness about the precise mechanisms involved, because it is not possible to isolate the influence of keywords found present in the domain name from keyword anchor text that could be appearing out there in relation to automatically generated domain profile websites or scraper websites.

Interestingly, there is one other historical event that further establishes how keyword signals are passed through redirection. Remember the President Bush “miserable failure” Google bomb? Google got around to suppressing the White House page’s rankings for the “miserable failure” searches in order to diffuse the bomb.

However, once Obama was elected into office, the White House IT personnel redirected the old profile page for Bush, pointing it to Obama’s new profile page.

As Google’s ranking suppression and content removals for search are based on using the URL as a unique identifier, redirection of the URL caused the “miserable failure” Google bomb suppression to become undone.

As many of the links pointing to Bush’s former biography page contained the “miserable failure” anchor text, the keywords then became redirected to Obama’s new page, eventually making it rank for “miserable failure” as well. This is yet another proof that keyword data is passed through redirection – not merely rank weight devoid of other signals.

Demonstrating this particular dynamic in Google’s algorithm does not appear to convey any worthwhile advantage, other than exposing a potential vulnerability that could be exploited by the next evolutionary level of a Google bombing prank.

If used as a Google bombing, be aware that this practice would then likely be deemed to be a black hat SEO technique.

Any ranking advantage this might convey seems altogether very, very weak – perhaps demonstrating how Google has discounted most or all keyword ranking advantage that was once innate in exact match domain names.

The automated nature of the advantage in terms of it resulting in links on various domain profile websites seems negligible, in part because such websites appear to be assessed by Google to be very low-quality or even spammy.

While I did not test it by using much larger quantities of keyworded domain names, I suspect that the introduction of hundreds of domain redirects linked from many low-quality domain profile websites might even incur a penalty.

(Note that I am not an attorney, and this article is not intended to be used for legal advice.)

The post The case of the stealth Google-bomb appeared first on Search Engine Land.

via Search Engine Land https://ift.tt/0CqRdIG

Tuesday, 18 October 2022

The case of the stealth Google-bomb

The case overview

The keyword was nowhere in the website’s code

The keyword was also not in the backlinks

Are keywords in domain names used by Google?

Why is redirection used?

Can a domain name’s keywords in a redirected domain cause a separate webpage to rank for them?

What happened in the cybersquatting case?

Experiment: Can keywords in a redirected domain name cause a page to rank for them?

Conclusion

No comments:

Post a Comment

Tuesday, 18 October 2022

The case of the stealth Google-bomb

The case overview

The keyword was nowhere in the website’s code

The keyword was also not in the backlinks

How cybersquatting related to the ranking of the webpage

Are keywords in domain names used by Google?

Why is redirection used?

Can a domain name’s keywords in a redirected domain cause a separate webpage to rank for them?

What happened in the cybersquatting case?

Experiment: Can keywords in a redirected domain name cause a page to rank for them?

Conclusion

No comments:

Post a Comment