Cayce Cookbook - Page Template

htaccess File Snippets

A collection of htaccess declarations.

Basic Image Indexing Prevention

In bot the robots.txt and .htaccess files, place:
User-agent: *
Disallow: /images/
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.png$

How to stop Search Engines from crawling your Website

Resources

.htaccess tips and tricks

The Ultimate Guide to .htaccess Files

How To Use the .htaccess File

How to stop Search Engines from crawling your Website

Hiding all images from search engines:

HTML Help: Prevent Image Indexing

Prevent images from appearing in Image Search on pages that you own.

Playing with the X-Robots-Tag HTTP header.

Prevent images from appearing in Image Search on pages that you own

10 useful .htaccess snippets to have in your toolbox.

Ultimate Guide to htaccess and mod_rewrite

Preventing your site from being indexed, the right way

Snippets:

This next area is from the source linked to this text.

I didn't do a lot to clean this up, presentation-wise. Maybe someday I'll have that kind of time.

In the Moz Q&A, there are often questions that are directly asked about, or answered with, a reference to the all-powerful .htaccess file. I've put together a few useful .htaccess snippets which are often helpful. For those who aren't aware, the .htaccess file is a type of config file for the Apache server, which allows you to manipulate and redirect URLs amongst other things.

Everyone will be familiar with tip number four, which is the classic 301 redirect that SEOs have come to know and love. However, the other tips in this list are less common, but are quite useful to know when you need them. After you've read this post, bookmark it, and hopefully it will save you some time in the future.

1) Make URLs SEO-friendly and future-proof

Back when I was more of a developer than an SEO, I built an e-commerce site selling vacations, with a product URL structure:

/vacations.php?country=italy

A nicer URL would probably be:

/vacations/italy/

The second version will allow me to move away from PHP later, it is probably better for SEO, and allows me to even put further sub-folders later if I want. However, it isn't realistic to create a new folder for every product or category. Besides, it all lives in a database normally.

Apache identifies files and how to handle them by their extensions, which we can override on a file by file basis:

<Files magic>
ForceType application/x-httpd-php5
</Files>

This will allow the 'magic' file, which is a PHP file without an extension, to then look like a folder and handle the 'inner' folders as parameters. You can test it out here (try changing the folder names inside the magic 'folder'):

http://www.tomanthony.co.uk/httest/magic/foo/bar/donk

2) Apply rel="canonical" to PDFs and images

The SEO community has adopted rel="canonical" quickly, and it is usually kicked around in discussions about IA and canonicalization issues, where before we only had redirects and blocking to solve a problem. It is a handy little tag that goes in the head section of an HTML page.

However, many people still don't know that you can apply rel="canonical" in an alternative way, using HTTP, for cases where there is no HTML to insert a tag into. An often cited example that can be used for applying rel="canonical" to PDFs is to point to an HTML version or to the download page for a PDF document.

An alternative use would be for applying rel="canonical" to image files. This suggestion came from a client of mine recently, and is something a couple of us had kicked about once before in the Distilled office. My first reaction to the client was that this practice sounded a little bit 'dodgy,' but the more I think about it, the more it seems reasonable.

They had a product range that attracts people to link to their images, but that isn't very helpful to them in terms of SEO (any traffic coming from image search is unlikely to convert), but rel="canonical" those links to images to the product page, and suddenly they are helpful links, and the rel="canonical" seems pretty reasonable.

Here is an example of applying HTTP rel="canonical" to a PDF and a JPG file:

<Files download.pdf>
Header add Link '<http://www.tomanthony.co.uk/httest/pdf-download.html>; rel="canonical"'
</Files>
 
<Files product.jpg>
Header add Link '<http://www.tomanthony.co.uk/httest/product-page.html>; rel="canonical"'
</Files>

We could also use some variables magic (you didn't know .htaccess could do variables!?) to apply this to all PDFs in a folder, linking back the HTML page with the same name (be careful with this if you are unsure):

RewriteRule ([^/]+)\.pdf$ - [E=FILENAME:$1]
<FilesMatch "\.pdf$">
Header add Link '<http://www.tomanthony.co.uk/httest/%{FILENAME}e.html>; rel="canonical"'
</FilesMatch>

You can read more about it here:

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394

3) Robots directives

You can't instruct all search engines not to index a page, unless you allow them to access the page. If you block a page with robots.txt, then Google might still index it if it has a lot of links pointing to it. You need to put the noindex Meta Robots tag on every page you want to issue that instruction on. If you aren't using a CMS or are using one that is limited in its ease, this could be a lot of work. .htaccess to the rescue!

You can apply directives to all files in a directory by creating an .htaccess file in that directory and adding this command:

Header set X-Robots-Tag "noindex, noarchive, nosnippet"

If you want to read a bit more about this, I suggest this excellent post from Yoast: 

http://yoast.com/x-robots-tag-play/

4) Various types of redirect

The common SEO redirect is ensuring that a canonical domain is used, normally www vs. non-www. There are also a couple of other redirects you might find useful. I have kept them simple here, but often times you will want to combine these to ensure you avoid chaining redirects:

# Ensure www on all URLs.
RewriteCond %{HTTP_HOST} ^example.com [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
 
# Ensure we are using HTTPS version of the site.
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
 
# Ensure all URLs have a trailing slash.
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://www.example.com/$1/ [L,R=301]

5) Custom 404 error page

None of your visitors should be seeing a white error page with black techno-babble when they end up on at a broken URL. You should always be serving a nice 404 page which also gives the visitor links to get back on track.

You can also end up getting lots of links and traffic if you but your time and effort into a cool 404 page, like Distilled's:

This is very easy to setup with .htaccess:

ErrorDocument 404 /cool404.html
 
# Can also do the same for other errors...
ErrorDocument 500 /cool500.html

6) Send the Vary header to help crawl mobile content

If you are serving a mobile site on the same URLs as your main site, but rather than using responsive design you are altering the HTML, then you should be using the 'Vary' header to let Google know that the HTML changes for mobile users. This helps them to crawl and index your pages more appropriately:

https://developers.google.com/webmasters/smartphone-sites/details

Again, this is pretty simple to achieve with your .htaccess file, independent of your CMS or however your are implementing the HTML variations:

Header append Vary User-Agent

7) Improve caching for better site speed

There is an increasing focus on site speed, both from SEOs (because Google cares) and also from developers who know that more and more visitors are coming to sites over mobile connections.

You should be careful with this tip to ensure there aren't already caching systems in place, and that you choose appropriate caching length. However, if you want a quick and easy solution to set the number of seconds, you can use the below. Here I set static files to cache for 24 hours:

<FilesMatch ".(flv|gif|jpg|jpeg|png|ico|swf|js|css|pdf)$">
Header set Cache-Control "max-age=28800"
</FilesMatch>

8) An Apple-style 'Back Soon' maintenance page

Apple famously shows a 'Back Soon' note when they take their store down temporarily during product announcements, before it comes back with shiny new products to love or hate. When you are making significant changes to redirect users to such a page, a message such as this can be quite useful. However, it can also make it tough to check the changes you've made.

With this bit of .htaccess goodness, you can redirect people based on their IP address, so you can redirect everyone but your IP address and 127.0.0.1 (this is a special 'home' IP address):

RewriteCond %{REMOTE_ADDR}  !your_ip_address
RewriteCond %{REMOTE_ADDR}  !127.0.0.1
RewriteRule !offline.php$ http://www.example.com/back_soon.html [L,R=307]

9) Smarten up your URLs even when your CMS says "No!"

One of the biggest complaints I hear amongst SEOs is about how much this or that CMS "sucks." It can be intensely frustrating for an SEO when they are hampered by the restraints of a certain CMS, and one of those constraints is often that you are stuck with appaling URLs.

You can overcome this, turning product.php?id=3123 into /ray-guns/ in no time at all:

# Rewrite a specific product...
RewriteRule ray-guns/ product.php?id=3123
 
# ... or groups of them
RewriteRule product/([0-9]+)/ product.php?id=$1

This won't prevent people from visiting the crappy versions of the URLs, but combined with other redirects (based on IP) or with judicious use of rel="canonical," you improve the situation tremendously. Don't forget to update your internal links to the new ones. :)

10)  Recruit via your HTTP headers

Ever looked closely at SEOmoz's HTTP headers? You might have missed the opportunity to get a job...

If you would like to add a custom header to your site, you can make up whatever headers and values you'd like:

Header set Hiring-Now "Looking for a job? Email us!"

It can be fun to leave messages for people poking around - I'll leave it to your imaginations! :)