Thwarting the Search Engines
Sometimes you want to post information on your site that is private in nature. You want it available for certain people to access, but you don’t want search engines to make it public for the rest of the world to see. How do you control what gets indexed and what doesn’t?
When you don’t want something searched by spiders, there are several ways you can tell them to scram.
Check your site out
If you have content on your web site that you don’t want made public, the first thing you might want to do is see if the search engines have found it already. You can do this by using the information in the tutorial “How to Raid Your Competitor’s Web Site for Secret Information” and, instead of entering your competitor’s web site address, enter your own.
Things that will stop search engines
Here are the things that will keep your information safe from search engines. There are also a few things that won’t (covered further down), and you should be aware of those as well.
Password protected pages
Any site where your visitors have to enter a username and a password to access the information will not be available to search engines. Search engines would have to enter a username and password just like any regular visitor.
If you haven’t entered a username and password and you go to any one of the protected pages on my site, you get booted back to the login screen. A spider gets booted the same way, and that’s how you know your content is protected from search engines.
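Here is a minimal sketch of why this works: the check happens on the server, so a spider that never logs in only ever receives the redirect. The names `LOGIN_URL`, `PROTECTED_PREFIX`, and the `session` dictionary are illustrative assumptions, not the code behind any particular site.

```python
# Hypothetical settings, chosen only for this illustration.
LOGIN_URL = "/login"
PROTECTED_PREFIX = "/members/"


def respond(path, session):
    """Return (status, location_or_body) for a request, checking login on the server."""
    if path.startswith(PROTECTED_PREFIX) and not session.get("user"):
        # No login on record: send the visitor (or spider) back to the login screen.
        return 302, LOGIN_URL
    return 200, f"content of {path}"


spider_session = {}                  # a spider never logs in
member_session = {"user": "member"}  # a visitor who entered a username and password

print(respond("/members/secret.html", spider_session))  # (302, '/login')
print(respond("/members/secret.html", member_session))  # (200, 'content of /members/secret.html')
```

The important design choice is that the protected content is never sent unless the server has seen a valid login.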
Robots.txt
You can tell spiders what they’re allowed to search and what they aren’t by creating a “robots.txt” file and placing it in your root directory (the main folder where your home page, or index file, is located).
Some search engines ignore robots.txt files, but the major search engines will follow them.
Here’s an example robots.txt file (you could copy everything in this box below, call it “robots.txt”, put it in the main folder of your web site, and spiders would behave as described in the translation provided):
User-agent: *
Disallow: /*.pdf
Disallow: /really_bad_poetry/
Disallow: /gradeschool_stories/getting_peed_on_by_a_5th_grader.html
- Translation: For all spiders, don’t index any pdf files, don’t search files in the “really_bad_poetry” folder, and don’t index “getting_peed_on_by_a_5th_grader.html” in the “gradeschool_stories” folder (true story, and not one I care to share with the world).
User-agent: Googlebot
Disallow: /
- Translation: Google’s spider only: don’t index anything on this site. Yahoo, MSN, etc… you’re welcome to search this site (I’m not sure why you would do this, but you can).
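If you want to check how a well-behaved spider would read rules like these, Python’s standard `urllib.robotparser` module can do it for you. This is a sketch under a few assumptions: the spider names “Googlebot” and “Slurp” and the page paths are just examples, and the pdf wildcard rule is left out because, as far as I know, this parser only does simple prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Rules for every spider (the wildcard pdf rule is omitted on purpose, see above).
EVERYONE = """\
User-agent: *
Disallow: /really_bad_poetry/
"""

# Rules that apply to Google's spider only.
GOOGLE_ONLY = """\
User-agent: Googlebot
Disallow: /
"""


def allowed(robots_txt, agent, path):
    """Return True if a polite spider named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


print(allowed(EVERYONE, "Slurp", "/really_bad_poetry/ode.html"))  # False
print(allowed(EVERYONE, "Slurp", "/index.html"))                  # True
print(allowed(GOOGLE_ONLY, "Googlebot", "/index.html"))           # False
print(allowed(GOOGLE_ONLY, "Slurp", "/index.html"))               # True
```

Keep in mind this only tells you what a spider that obeys robots.txt would do. It says nothing about spiders that ignore the file.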
Meta tags
Here’s some code you can put on any single page on your web site to tell spiders what to do with that particular page (this goes in between the <head></head> code at the beginning of the page):
<meta name="robots" content="noindex, nofollow">
- Translation: “noindex” = don’t index this page, and “nofollow” = don’t follow any of the links on this page.
You can use these meta commands in different combinations as well: “noindex, follow”, “index, nofollow” etc…
For more on meta tags, visit the meta tag tutorial.
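To double-check that a page really carries the robots meta tag you intended, you can read it back with Python’s standard `html.parser` module. This is only a sketch: the class name `RobotsMetaFinder`, the helper `robots_directives`, and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the given HTML text."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Private stuff</body></html>"
)
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```

A page with no robots meta tag returns an empty set, which spiders treat as permission to index the page and follow its links.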
Things that won’t stop search engines
There are a few things that might stop humans from finding certain information, but search engines still seem to find a way around them.
Simple password pages
It’s easy to create a very simple page that requires that someone give you a password to continue. For example, if someone goes to www.yoursite.com/first_page.html and they enter the password, they go to www.yoursite.com/content.html.
If people can bypass the www.yoursite.com/first_page.html page by typing in www.yoursite.com/content.html and see the content on www.yoursite.com/content.html just fine, then search engines can (and eventually will) do the same thing and list your “password protected” content in their search results.
These simple password pages are often built with JavaScript, and there’s probably more than one other way to create them. I don’t know what other programming languages can do this (I’m sure most can), but however they’re built, these simple password pages don’t afford true protection from search engines.
Capture/squeeze pages
It’s common to create a capture page that requires someone to give you their email address to get access to special information. People give you their email address because they want the information.
Like the simple password pages, if people can bypass the www.yoursite.com/first_page.html page by typing in www.yoursite.com/content.html and see the content on www.yoursite.com/content.html just fine, then search engines will eventually find your “email required” content and make it public.
These capture/squeeze pages that require an email address to continue might stop a normal human from continuing, but don’t afford true protection from search engines.
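Simple password pages and capture pages fail for the same reason, and you can test for it the same way: ask for the content page directly, with no password or email, and see what comes back. The sketch below fakes two sites with plain functions so it runs without a network. `weak_site`, `strong_site`, and `is_exposed` are hypothetical names, and a real check would make an actual web request instead.

```python
def weak_site(url):
    """A site where the gate lives only on first_page.html; every page is served on request."""
    pages = {"/first_page.html": "password or email form", "/content.html": "secret content"}
    return (200, pages[url]) if url in pages else (404, "")


def strong_site(url):
    """A site where the server itself refuses anonymous requests for the content page."""
    if url == "/content.html":
        return 302, "/first_page.html"
    return weak_site(url)


def is_exposed(fetch, content_url):
    """True if a direct, anonymous request for the content page succeeds."""
    status, _body = fetch(content_url)
    return status == 200


print(is_exposed(weak_site, "/content.html"))    # True
print(is_exposed(strong_site, "/content.html"))  # False
```

If the check comes back True for your own content page, a spider that learns the address can read the page as easily as you just did.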
Not linking to the page
Search engines usually find your web site from other sites that link to you. Once they get to your web site, they simply go from page to page on your site and look at everything you’ve got available.
You might think that if you put up a page and you don’t link to it from your main site, search engines won’t find it. For example, if you put www.yoursite.com/content.html on your site, but none of the pages on your existing site link to it, you’d think that search engines wouldn’t find it.
Just like JavaScript passwords, if a human could somehow get to the content, then so can a search engine.
And search engines will eventually find that content. Either someone will link to that content from their web site, or you’ll send the link www.yoursite.com/content.html out to someone in an email and it will end up on a page somewhere online, or in a conversation between two people on a forum, where search engines find it.
How will they find it? It’s impossible to tell. But they will.
Guess what?
Here’s a sneaky trick: if you encounter a site that you suspect uses any one of the things that won’t stop search engines (it requires a password or an email, or you know there’s more content on the site than you’re seeing up front), use the information described in the “How to Raid Your Competitor’s Web Site for Secret Information” tutorial to see if the search engines have found that information and indexed it. They usually have, and you can bypass whatever feeble protection the site has put in place.
I do this all the time. Bwa-ha-ha-ha! (*Jarom rubs his hands together and laughs like an evil villain*)