Just So You Know

I'm not an owner or a partner in any of the companies reviewed on this site. However, some of the links on this site are affiliate links and I do get a commission if you purchase a product using these links.

Here's how to bypass my affiliate links for anybody who would like to.

Jarom Adair
IMFBO.com




Thwarting the Search Engines


Sometimes you want to post information on your site that is private in nature. You want certain people to be able to access it, but you don’t want search engines to make it public for the rest of the world to see. How do you control what gets indexed and what doesn’t?

When you don’t want something searched by spiders, there are several ways you can tell them to scram.

Check your site out


If you have content on your web site that you don’t want made public, the first thing you might want to do is see if the search engines have found it already. You can do this by using the information in the tutorial “How to Raid Your Competitor’s Web Site for Secret Information” and, instead of entering your competitor’s web site address, enter your own.

Things that will stop search engines


Here are the things that will keep your information away from search engines. There are also a few things that won’t (covered further down) that you should be aware of.

Password protected pages

Any content that your visitors have to enter a username and password to see will not be available to search engines. A search engine would have to enter a username and password just like any regular visitor, and it doesn’t have one.

If you go to any of the protected pages on my site without having entered a username and password, you get booted back to the login screen. That’s how you know your content is protected from search engines: the check happens on every protected page, not just the first one.
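Here’s a minimal sketch of what that “booted back to the login screen” check looks like on the server. It assumes everything under a hypothetical /members/ folder is protected. The names PROTECTED_PREFIX, LOGIN_PAGE and handle_request are made up for illustration; real membership software handles sessions and passwords itself, but the idea is the same.

```python
# Hypothetical names: PROTECTED_PREFIX, LOGIN_PAGE and handle_request stand in
# for whatever your membership software actually uses.
PROTECTED_PREFIX = "/members/"
LOGIN_PAGE = "/login.html"


def handle_request(path, logged_in):
    """Return (status, body_or_location) for a request to `path`."""
    # The check runs on the server for EVERY protected page, so typing a
    # deep URL directly does not get around it.
    if path.startswith(PROTECTED_PREFIX) and not logged_in:
        return (302, LOGIN_PAGE)  # redirect back to the login screen
    return (200, f"contents of {path}")


# A spider never logs in, so it is always treated like a logged-out visitor.
print(handle_request("/members/secret_report.html", logged_in=False))  # (302, '/login.html')
print(handle_request("/members/secret_report.html", logged_in=True))
```

Public pages such as /index.html still come back with a 200. Only the protected folder is gated.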

Robots.txt

You can tell spiders what they’re allowed to search and what they aren’t by creating a “robots.txt” file and placing this file in your root directory (root directory = the main folder where your home page (index file) is located).

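Some search engines ignore robots.txt files, but the major search engines will follow them. Keep in mind that robots.txt is a request, not a lock. The file itself is public, so anyone can open www.yoursite.com/robots.txt and read the list of folders you’ve asked spiders to skip.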
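Spiders only look for robots.txt at one address: the root of the site. A copy placed in a subfolder, like /really_bad_poetry/robots.txt, is simply never requested. This small sketch shows where a spider will look for the file, given any page on your site. The function name robots_txt_url is hypothetical; urljoin is Python’s standard URL helper.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the robots.txt address a spider checks for the site hosting page_url."""
    # Joining with an absolute path ("/robots.txt") keeps the scheme and host
    # but throws away the folder, which is why the file must sit in the root.
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://www.yoursite.com/really_bad_poetry/ode.html"))
# http://www.yoursite.com/robots.txt
```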

Here’s an example robots.txt file (you could copy everything below, save it as “robots.txt”, put it in the main folder of your web site, and spiders would behave as described in the translations provided):

User-agent: *
Disallow: /*.pdf
Disallow: /really_bad_poetry/
Disallow: /gradeschool_stories/getting_peed_on_by_a_5th_grader.html

  • Translation: For all spiders, don’t index any PDF files, don’t search files in the “really_bad_poetry” folder, and don’t index “getting_peed_on_by_a_5th_grader.html” in the “gradeschool_stories” folder (true story, and not one I care to share with the world).

User-agent: googlebot
Disallow: /

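  • Translation: Only Google’s spider: don’t index anything on this site. Yahoo, MSN, etc., you’re welcome to search this site (I’m not sure why you would do this, but you can). A spider follows only the most specific group that names it. Google obeys just this block, while every other spider still follows the rules in the “User-agent: *” block above.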
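To check how a well-behaved spider reads a file like the one above, you can feed it to Python’s standard urllib.robotparser module. This is a sketch, not how any particular search engine works internally. Bingbot is used here simply as an example of a non-Google spider name.

One caveat: this parser only does plain prefix matching. It doesn’t understand the * wildcard in “/*.pdf” (Google and Bing do), so the sketch doesn’t test that line.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt file from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*.pdf
Disallow: /really_bad_poetry/
Disallow: /gradeschool_stories/getting_peed_on_by_a_5th_grader.html

User-agent: googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts a list of lines

for agent, url in [
    ("Googlebot", "http://www.yoursite.com/index.html"),
    ("Bingbot", "http://www.yoursite.com/index.html"),
    ("Bingbot", "http://www.yoursite.com/really_bad_poetry/ode.html"),
]:
    # can_fetch() answers: may this spider crawl this URL?
    print(agent, url, parser.can_fetch(agent, url))
```

With this file, Googlebot gets False for every page. Bingbot is allowed /index.html but not anything in the poetry folder.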

Meta tags

Here’s some code you can put on any single page on your web site to tell spiders what to do with that particular page (this goes between the <head> and </head> tags at the beginning of the page):

<meta name="robots" content="noindex, nofollow">

  • Translation: “noindex” = don’t index this page (keep it out of search results), and “nofollow” = don’t follow any of the links on this page.

You can use these meta commands in different combinations as well: “noindex, follow”, “index, nofollow” etc…
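As a rough illustration of how a spider might read this tag, here’s a sketch using Python’s built-in html.parser. The class RobotsMetaReader and the function robots_directives are hypothetical names. The sketch assumes, as spiders generally do, that a page is indexed and its links followed unless the tag says otherwise.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <Meta NAME=...> works too.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return (may_index, may_follow) for a page's HTML."""
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    # With no robots meta tag, the default is "index, follow".
    return ("noindex" not in reader.directives, "nofollow" not in reader.directives)


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(robots_directives(page))  # (False, True)
```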

For more on meta tags, visit the meta tag tutorial.

Things that won’t stop search engines

There are a couple of things that might stop humans from finding certain information, but search engines still find a way around them.

Simple password pages


It’s easy to create a very simple page that asks for a password before letting someone continue. For example, if someone goes to www.yoursite.com/first_page.html and enters the password, they are sent on to www.yoursite.com/content.html.

If people can bypass the www.yoursite.com/first_page.html page by typing in www.yoursite.com/content.html and see the content on www.yoursite.com/content.html just fine, then search engines can (and eventually will) do the same thing and list your “password protected” content in their search results.

JavaScript is one common way to build a simple password page, and there are probably others (I’m sure most programming languages could do it). However it’s built, a simple password page that doesn’t stop people from going straight to the content page doesn’t afford true protection from search engines.
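Here’s a small model of why this fails. It assumes a hypothetical site where the password check lives only on first_page.html, while the server hands out content.html to anyone who asks for it. SECRET_PASSWORD, PAGES, first_page and serve are all made-up names for illustration.

```python
SECRET_PASSWORD = "open sesame"  # hypothetical password

# What the web server has on disk. Nothing here is actually locked.
PAGES = {
    "/first_page.html": "Enter the password to continue...",
    "/content.html": "The private stuff",
}


def first_page(entered_password):
    """The 'password page': only decides where to send the visitor next."""
    if entered_password == SECRET_PASSWORD:
        return "/content.html"
    return "/first_page.html"


def serve(path):
    """The web server: returns any page requested, with no login check."""
    return PAGES.get(path, "404 Not Found")


# A visitor who knows the password is sent on to the content...
print(serve(first_page("open sesame")))  # The private stuff
# ...but a spider that already knows the URL skips the password page entirely.
print(serve("/content.html"))            # The private stuff
```

The password only controls which link a visitor is handed. The fix is a check inside serve() itself, which is what a real login system does.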

Capture/squeeze pages


It’s common to create a capture page that requires someone to give you their email address to get access to special information. People give you their email address because they want the information.

As with simple password pages, if people can bypass the www.yoursite.com/first_page.html page by typing in www.yoursite.com/content.html and see the content on www.yoursite.com/content.html just fine, then search engines will eventually find your “email required” content and make it public.

These capture/squeeze pages that require an email address to continue might stop a normal human from continuing, but don’t afford true protection from search engines.

Not linking to the page


Search engines usually find your web site from other sites that link to you. Once they get to your web site, they simply go from page to page on your site and look at everything you’ve got available.

You might think that if you put up a page and don’t link to it from your main site, search engines won’t find it. For example, if you put www.yoursite.com/content.html on your site, but none of the pages on your existing site link to it, you’d think that search engines wouldn’t find it.

Just like JavaScript password pages, if a human could somehow get to the content, then so can a search engine.

And search engines will eventually find that content. Someone might link to it from their web site, or you might email the link www.yoursite.com/content.html to someone and it ends up on a web page or in a forum conversation where search engines find it.

How will they find it? It’s impossible to tell. But they will.
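The “page to page” crawling described above can be sketched as a breadth-first walk over links. Everything here is hypothetical: the link graph, the forum.example.com page, and the crawl function. The point is that content.html stays invisible only until one public page anywhere links to it.

```python
from collections import deque


def crawl(web, start):
    """Breadth-first walk over a link graph, the way a spider hops from page to page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical link graph: each page maps to the pages it links to.
WEB_BEFORE = {
    "forum.example.com/thread": ["www.yoursite.com/index.html"],
    "www.yoursite.com/index.html": ["www.yoursite.com/about.html"],
    "www.yoursite.com/about.html": ["www.yoursite.com/index.html"],
    "www.yoursite.com/content.html": [],  # the unlinked "hidden" page
}

# Same web after someone pastes your emailed link into the forum thread.
WEB_AFTER = dict(WEB_BEFORE)
WEB_AFTER["forum.example.com/thread"] = [
    "www.yoursite.com/index.html",
    "www.yoursite.com/content.html",
]

print("www.yoursite.com/content.html" in crawl(WEB_BEFORE, "forum.example.com/thread"))  # False
print("www.yoursite.com/content.html" in crawl(WEB_AFTER, "forum.example.com/thread"))   # True
```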

Guess what?


Here’s a sneaky trick: if you come across a site that you suspect uses one of the things that won’t stop search engines (it asks for a password or an email address, or you know there’s more content on the site than you’re seeing up front), use the information in the “How to Raid Your Competitor’s Web Site for Secret Information” tutorial to see whether the search engines have found that information and indexed it. They usually have, and you can bypass whatever feeble protection the site has put in place.

I do this all the time. Bwa-ha-ha-ha! (*Jarom rubs his hands together and laughs like an evil villain*)

Yours in success,
-Jarom Adair