How To Prune The Enterprise Link Tree

This process lets me sift through 30,000+ links in less than 3 hours, which means more Skyrim time: a win-win.

  1. Create a "whitelist". That's a list of domain names that are 100% (cough, OK, 90%) legitimate link sources.
  2. Grab the basic link data from Open Site Explorer and Majestic. Import both into Excel.
  3. Combine the two URL lists, including SEOMOZ Domain Authority and/or Majestic ACRank so that you have a single list of all linking URLs. Filter out any duplicates.
  4. Pull a list of unique domain names from that list. I use Python to do this. You can use Excel's Text to Columns feature, too: Split the text up at each "/", remove any folders and queries, and you should have a list of domain names.
  5. Remove any whitelisted domains.
  6. Run a WHOIS query on each domain name. Be sure to get the hostname, registrant name and status, at a minimum. Store that in Excel, too. I use Python to perform the bulk lookup. You can also send a list of domains to a paid service and they'll do it for you.
  7. Grab the IP address of each domain. You can use NSLOOKUP to do this, if you want to get all geeky about it. There are a few tools you can add to Excel, or you can script it in Google Spreadsheets. None of this is trivial, I know. It's the price of success: you wanted your terrifying in-house SEO job for a Fortune 100. Time to pay up!
  8. Use VLOOKUP to combine the domains, WHOIS results and Majestic/SEOMOZ/ahrefs data. It's important that you have all of this in one place.
  9. Now, look for sites that share common registrants. Ignore the private domain registration companies. Yes, that's a lot of them. But you'll be amazed how many link networks still operate "in the clear".
  10. If you find groups of sites owned by a single person or company, flag them. Why? Because multiple sites under a single owner may be part of a link network.
  11. Compare IP addresses, the same way you did registrants. If you have collections of sites under the same IP address, flag those, too.
  12. Now you should have a list of flagged domains.
  13. Grab those domains and run your Web crawler, fetching the home page of each domain. I use Python for this, saving the HTML for each page for the next few steps.
  14. Check the results for phrases that are a dead giveaway for spam: "High pagerank," "Link building," "Upgrade your link" and "Free link" are some of my favorites.
  15. Get a word and link count for each page. Compute the ratio of words to links. I use Python and BeautifulSoup (an HTML parser for Python) to do this.
  16. Pull all this data into your domains list.
  17. Score your domains. I use a holistic 1-10 scale: The more "spam factors" in evidence, the higher the score. So a page that's part of a 10-domain portfolio, has spammy-sounding phrases on it and has a low ratio of words to links will get a really high score.
  18. Sort your spreadsheet by score. Then do a quick check of the worst offenders. If they're spam, get those links removed.
  19. Repeat this process as necessary.
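
Steps 4, 5 and 9 through 11 could be sketched in Python roughly like this. It is only an illustration, not the script the author actually uses: the function names, the shape of the `records` dictionary and the `ignore` parameter are all assumptions. It also treats the hostname (minus a leading "www.") as the domain, which is a simplification. Subdomains such as blog.example.com will not collapse to example.com without a public-suffix list.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    """Return the lowercased hostname of a URL, without a leading 'www.'."""
    # urlparse only fills in the hostname when the URL starts with a
    # scheme or '//', so bare entries like 'example.com/page' get '//' added.
    if not re.match(r"^(?:[a-z][a-z0-9+.-]*:)?//", url, re.I):
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def unique_domains(urls, whitelist):
    """Steps 4-5: unique hostnames from the link list, minus the whitelist."""
    allowed = {w.lower() for w in whitelist}
    hosts = {host_of(u) for u in urls}
    return sorted(h for h in hosts if h and h not in allowed)


def flag_shared(records, field, ignore=()):
    """Steps 9-11: group domains by a shared field, keep groups of two or more.

    `records` maps domain -> dict of looked-up data (a hypothetical shape).
    Values listed in `ignore` (e.g. privacy-proxy registrants) are skipped.
    """
    groups = defaultdict(list)
    for domain, info in records.items():
        value = info.get(field)
        if value and value not in ignore:
            groups[value].append(domain)
    return {v: sorted(d) for v, d in groups.items() if len(d) > 1}
```

Calling `flag_shared(records, "registrant", ignore={...})` and then `flag_shared(records, "ip")` gives the two sets of flagged groups that step 12 merges into one list. The WHOIS and IP lookups themselves are left out here, since they depend on whichever tool or paid service you pick.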

To clean up your link structure, create a whitelist of domains, score the remaining domains by spam factors, and try to get the bad links removed. If that doesn't work, file a Disavow request.
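
The page checks and scoring in steps 14 through 17 might look something like the sketch below. The post uses BeautifulSoup. This version swaps in Python's built-in `html.parser` so it runs without extra installs. The weights, the words-per-link threshold of 10 and the names `page_signals` and `spam_score` are made up for illustration. The post only says the scale runs from 1 to 10 and goes up as more spam factors appear.

```python
from html.parser import HTMLParser

# Giveaway phrases from step 14, lowercased for matching.
SPAM_PHRASES = ("high pagerank", "link building", "upgrade your link", "free link")


class PageStats(HTMLParser):
    """Collect visible text and count <a href=...> links on a page."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def page_signals(html):
    """Steps 14-15: spam phrases found, word count, link count, words per link."""
    parser = PageStats()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.text).split()).lower()
    words = len(text.split())
    ratio = words / parser.links if parser.links else None
    phrases = [p for p in SPAM_PHRASES if p in text]
    return {"words": words, "links": parser.links, "ratio": ratio, "phrases": phrases}


def spam_score(signals, portfolio_size=1):
    """Step 17: a 1-10 score. All weights here are hypothetical."""
    score = 1
    score += min(len(signals["phrases"]), 3)  # up to +3 for giveaway phrases
    if signals["ratio"] is not None and signals["ratio"] < 10:
        score += 3  # few words per link (assumed threshold)
    if portfolio_size >= 10:
        score += 3  # part of a large single-owner or single-IP group
    elif portfolio_size >= 2:
        score += 1
    return min(score, 10)
```

Feed each saved home page through `page_signals`, pass the result to `spam_score` along with the size of the registrant or IP group the domain landed in, and sort by the result. The numbers are only there to rank what you eyeball first. Step 18's manual check still decides what is actually spam.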
