Key Takeaways From the Google Document Leak: What Matters Most For SEO
It’s been a couple of weeks since the news of Google’s search API document leak went viral—sending shock waves throughout the SEO world. This is undoubtedly the biggest SEO story of the year, and it’s already been covered extensively by all of the biggest players in this space.
With nearly 2,600 modules and roughly 14,000 attributes in the API documents, saying there’s a lot to unpack would be a drastic understatement.
Instead of going through everything, I’ve decided to share my two cents on what matters the most and what you should be doing right now to adjust your organic search strategy accordingly.
Google’s Search Algorithm Did Not Leak
I’ve seen a ton of misinformation spreading on the web about this. To clarify, Google’s search algorithm did not leak.
Nothing in the documentation lists specific ranking factors or explains how certain criteria are weighted to determine search results. In fact, it's not even clear which parts of the documents are live in production and which are used only for testing.
This is really important for everyone to understand, because you shouldn't abandon your content marketing strategy and SEO plan based on guesses about what might help your website. Personally, I think it would be a big mistake to treat this leak as the new north star or single source of truth for your SEO strategy.
That said, we can definitely read between the lines and draw our own conclusions from the information. All of it is still incredibly valuable for SEOs, and it’s arguably in the top five biggest SEO news stories that I’ve seen in roughly three decades in the industry.
Google Does Have a Site Authority Score Used in its Ranking System
Google has long denied having a measurement similar to Moz’s Domain Authority (DA) or Domain Rating (DR) from Ahrefs.
But the leaked documents say otherwise.
Debate over.
There it is—siteAuthority, straight from the document leak.
That said, it’s unclear exactly how Google’s “site authority” is calculated. And it may be very different from how Moz determines a site’s DA or how Ahrefs measures DR.
It’s also worth noting that Google’s siteAuthority attribute was found in the CompressedQualitySignals module of documentation. To me, this suggests that Google’s site authority isn’t as link-specific as DA or DR. Instead, it’s probably based on things like page quality scores, clicks, and other aspects of Navboost.
Links Still Matter (If You’re Getting the Right Links)
My biggest takeaway from the leak is that links still matter for SEO. There’s been some skepticism in recent years on whether or not backlinks still hold value and if they’re worth pursuing.
I've always firmly believed that building a solid backlink profile will help your website climb the SERPs, and it's nice to see that stance validated by the document leak.
That said, links only matter if you’re getting the right links—more specifically:
- Links should come from a relevant source
- Links from the same country hold more value than foreign links
- Continuing to get new links is crucial
- Links from seed sites might be the most important of all
Let’s dive deeper into each of these below.
Relevant Links
There are several components of Google’s API documentation that suggest relevancy is important when it comes to backlinks and how they’re weighted in the eyes of Google.
For one, the CompressedQualitySignals module contains an anchorMismatchDemotion attribute.
I think it's safe to assume that Google is "demoting" links where there's a mismatch between the source page and the target.
The same module also contains a topicEmbeddingVersionedData attribute. This is likely Google's way of using NLP to understand the different topics and context associated with a web page, so it can figure out whether a page linking to another source is actually related to it.
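To make the idea of comparing topics concrete, here is a minimal sketch of one common way to compare two topic embeddings: cosine similarity. The leak does not say that Google uses cosine similarity, and the three-dimensional "topic vectors" below (pets, 3D printing, cooking) are hypothetical numbers invented for illustration. Real embeddings have hundreds of dimensions and are produced by a model, not written by hand.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two topic vectors: 1.0 = same direction, 0.0 = no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no topic to compare
    return dot / (norm_a * norm_b)


# Hypothetical topic weights, in the order [pets, 3D printing, cooking]
pet_blog = [0.9, 0.0, 0.1]
printer_review = [0.0, 1.0, 0.0]
printer_forum = [0.1, 0.9, 0.0]

print(round(cosine_similarity(pet_blog, printer_review), 3))       # 0.0
print(round(cosine_similarity(printer_forum, printer_review), 3))  # 0.994
```

In this toy setup, a link from the pet blog to the printer review scores as unrelated, while a link from a 3D printing forum scores as highly related. That is the kind of distinction an attribute like topicEmbeddingVersionedData could plausibly support.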
Furthermore, it appears that Google is also using content surrounding the anchor text of a link for additional context when determining relevance.
Google has been telling us for years that it’s important to write good anchor text. This is something that’s clearly stated in Google’s link best practices within its Search Central documents.
But the leaked documents take this one step further, suggesting that Google is using more than just anchor text to determine the relevancy of a link. We found references in the leak containing:
- context2 — Hash of terms near the anchor
- fullLeftContext — Full context of text to the left of an anchor, not written in linklogs
- fullRightContext — Full context of text to right of an anchor, not written in linklogs
Logically, this makes sense. We already know intuitively that a blog post about cats and dogs containing a link to a 3D printer isn't relevant, even if the anchor text of that link is. However, this is the first time we've somewhat "confirmed" this theory.
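Here is a minimal sketch of how the text on either side of an anchor can separate a relevant link from an irrelevant one, even when the anchor text is identical. The window size, the pages, the target page's terms, and the simple overlap score are all hypothetical. The leak only tells us that fields like context2, fullLeftContext, and fullRightContext exist, not how Google scores them.

```python
import re


def anchor_context(page_text: str, anchor: str, window: int = 5) -> tuple[list[str], list[str]]:
    """Return up to `window` words on each side of the first occurrence of `anchor`."""
    start = page_text.find(anchor)
    if start == -1:
        return [], []  # anchor text not found on the page
    left = re.findall(r"\w+", page_text[:start].lower())[-window:]
    right = re.findall(r"\w+", page_text[start + len(anchor):].lower())[:window]
    return left, right


def context_overlap(context_words: list[str], target_terms: set[str]) -> float:
    """Toy relevance score: share of context words that also describe the target page."""
    if not context_words:
        return 0.0
    return sum(word in target_terms for word in context_words) / len(context_words)


# Hypothetical pages that both link to a 3D printer with the same anchor text
pet_page = "Our cats and dogs nap all day, so check out this printer while they sleep on the couch."
tech_page = "For layer height and PLA filament tests, check out this printer with a heated bed and fast nozzle."
target_terms = {"filament", "nozzle", "layer", "pla", "heated", "bed", "printing"}

for name, page in [("pet", pet_page), ("tech", tech_page)]:
    left, right = anchor_context(page, "this printer")
    print(name, context_overlap(left + right, target_terms))
# pet 0.0
# tech 0.4
```

The anchor text "this printer" is the same on both pages, so anchor text alone can't tell the two links apart. The surrounding words can.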
Local Links
The leak also suggests that local links (aka links from the same country) probably hold more weight than links from foreign countries.
Why?
Well, the document leak contained a localCountryCodes attribute within the AnchorsAnchorSource module.
The local country codes attribute specifically refers to “countries to which the source page is local/most relevant.”
This isn’t to say that links from other countries are harmful to your website (unless they’re junk links, spam, or totally irrelevant). But it’s likely that backlinks from your own country are better for SEO purposes.
Newer Links
One part of the leak that I haven't seen discussed much relates to the importance of getting new links.
If we look at the sourceType, we can see that Google is using TYPE_HIGH_QUALITY, TYPE_MEDIUM_QUALITY, and TYPE_LOW_QUALITY as a metric related to the anchor’s source page and the page index tier.
But what really stands out is TYPE_FRESHDOCS, which refers to newly published content (highlighted at the bottom of the screenshot above).
It looks like Google has a special case where links to new content can be marked as “high quality.”
I think that this shows how important an ongoing link-building strategy is for SEO purposes. Don’t abandon your process. Continue to produce high-quality content and do whatever you can to earn quality backlinks on your new content as well.
Links From Seed Sites
One component of the leak says that PageRank (Google's algorithm to rank web pages) is "long deprecated, no longer maintained, and can break at any moment."
It also says that PageRank has been replaced by PageRankNS (PageRank Nearest Seeds), which calculates a score using the “nearest seeds method.”
This is not a new concept, and the earliest mention of seed sites dates back to 2010. Google even filed a patent for how it’s used.
In simple terms, seed sites are essentially whitelisted by Google. While we don't know exactly which sites these are, we can assume they're sites like CNN, BBC, or any website that Google trusts not to produce spam or use black hat SEO.
Google says that these sites are reliable, diverse enough to cover a wide range of topics, and well-connected to other websites. They have useful outgoing links that identify other useful and high-quality pages, essentially acting as hubs for the internet.
But I think the presence of PageRankNS in the leak tells us that links closer to a seed site hold more value than links farther away.
Here’s an ultra-simple diagram so you can see what I mean:
In this example, Site A and Site X are both seed sites. Sites B, Y, and Z are one hop away from those seeds and, therefore, pass along more valuable links. The link from Site B to Site C is more valuable than the link from Site C to Site D, and the link from Site Y to Site D is also more valuable than the link from Site C to Site D.
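The same idea can be sketched in code as a multi-source breadth-first search that counts how many link hops separate each site from the nearest seed. This is a deliberately simplified toy under several assumptions: the link graph (A→B, X→Y, X→Z, B→C, C→D, Y→D) is my reconstruction of the diagram example, every link counts as one hop, and the halving `decay` weight is hypothetical. The seed-site patent describes weighted link lengths, and we don't know how Google actually implements PageRankNS.

```python
from collections import deque


def seed_distances(links: dict[str, list[str]], seeds: set[str]) -> dict[str, int]:
    """Multi-source BFS: fewest link hops from any seed site to each reachable site."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        for target in links.get(site, []):
            if target not in dist:  # first visit is the shortest hop count in BFS
                dist[target] = dist[site] + 1
                queue.append(target)
    return dist


def link_value(source: str, dist: dict[str, int], decay: float = 0.5) -> float:
    """Hypothetical weighting: a link is worth decay ** (source's hop distance from a seed)."""
    if source not in dist:
        return 0.0  # source site is not reachable from any seed
    return decay ** dist[source]


# Assumed reconstruction of the diagram: who links to whom
links = {
    "A": ["B"],
    "X": ["Y", "Z"],
    "B": ["C"],
    "C": ["D"],
    "Y": ["D"],
}
seeds = {"A", "X"}

dist = seed_distances(links, seeds)
print(dist["B"], dist["C"], dist["D"])  # 1 2 2
print(link_value("B", dist), link_value("C", dist), link_value("Y", dist))  # 0.5 0.25 0.5
```

Under these assumptions, the links from B to C and from Y to D (both one hop from a seed) come out worth twice as much as the link from C to D, which matches the ordering described in the example. Site D is two hops from a seed because the path through Y is shorter than the path through B and C.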
Links Coming From High-Quality News Sources Have a Special Tag
We’ve already established that links still matter—especially certain types of links. But another key takeaway I found from this leak is that Google uses a special tag for high-quality news websites.
There's an attribute in the documentation called encodedNewsAnchorData, which is populated "only if the anchor is classified as coming from a newsy, high quality site."
Thought PR was dead? Think again.
While it’s unclear exactly how Google uses this information to determine rankings, it’s clear that they’re storing data associated with links from news sites.
If you’re getting links from big, trustworthy publications (New York Times, CNN, etc.), then Google is taking notes.
Bad Links Can Hurt You
We know that good links can help you (something most of us knew before the leak). However, the leak also suggests that bad backlinks can actually hurt your rankings.
This is another scenario where Google has said one thing in the past, and the documents say otherwise.
Just last month, Google's John Mueller dismissed the idea of "toxic links" in a Reddit thread. Here's some context from Search Engine Roundtable, which includes the question posed and Mueller's response. I've highlighted the important parts:
Despite this claim, we found a badbacklinksPenalized attribute in the documentation, which, to me, directly contradicts what Mueller said in May and what Google has continued to say in recent years.
We just don't know exactly how Google classifies "bad backlinks" or whether its approach resembles how SEO tools identify "toxic" links.
The leaked documents also contained a module for IndexingjoinerAnchorSpamInfo, which I assume relates to how they handle spammy links. Some of those attributes include:
- spamProbability — Predicted probability of spam.
- trustedDemoted — Trusted anchors used for spam probability.
- trustedExamples — Examples of trusted sources.
- trustedMatching — Number of trusted anchors with anchor text that matches spam terms.
- trustedTarget — Related to field record details about trusted anchors (true if the URL is a trusted source).
- trustedTotal — Number of total trusted sources for the URL.
My guess is that Google uses this information to help determine whether a link is spammy, and potentially to penalize bad backlinks.
PR is Still Really Important For Building and Maintaining Brands
It takes some reading between the lines, but I can't help but conclude that a solid public relations strategy is essential for your brand moving forward. PR has consistently been one of the best ways to build authority and boost brand recognition for decades.
From an SEO standpoint, much of the Google leak backs this up. Just think about what we’ve covered so far:
- Links from high-quality and trusted news websites have a special tag
- Google rewards links that are closest to or directly coming from “seed sites”
- You need to have links that are relevant to your brand, website, and niche
- Getting new links is important, and links to new content can be tagged as "high quality"
All of this stuff can be accomplished through a mix of PR and content marketing.
Final Thoughts
I don’t think anyone should completely abandon their current SEO plans based on the document leak alone. While the information is insightful, we don’t actually know how everything is weighted in Google’s algorithm.
Keep doing what’s been working for you, and you can make some slight adjustments based on the information provided above.
The two most important conclusions I've drawn are that link building is still important and PR still matters.
What’s your take on the Google document leak? I’d love to hear your thoughts. Drop a comment below or book a consultation to discuss the specific implications for your website.