Apache Rewrite Rules | Zero Byte, LLC

The Apache module mod_rewrite provides a powerful mechanism for hiding, redirecting, and reformatting request URLs. I just finished implementing a mod_rewrite scheme for timfanelli.com to accomplish 3 things:

Redirect old URLs with a 301 redirect code.
Hide certain parts of the URL from my readers.
Optimize my Google pagerank.

My first goal was to redirect old URLs using 301 redirect codes. I migrated to pyBlosxom a long time ago, and it recently came to my attention that not only were there links to my old URLs on other people’s blogs, Google was also turning up search results pointing to my old URLs! All of these references resulted in 404s, driving my pagerank down towards 0.

Using a simple rewrite rule, I was able to redirect my previous URLs, which all started with http://www.timfanelli.com/index.cgi, to a static page, old.html, which provides links to my new URL, http://www.timfanelli.com/cgi-bin/blog.cgi:

RewriteEngine on
RewriteRule ^/index\.cgi(.*) /old.html [R=301]

Now, any link followed to my page that starts with “/index.cgi” is redirected, and a 301 is issued to the requesting client indicating that the resource has been permanently relocated.
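
One way to sanity-check which paths that pattern catches, before reloading Apache, is to try the regular expression offline. The sketch below is not mod_rewrite itself. It uses Python’s re module, which should behave the same way for a pattern this simple, and the function name and sample paths are made up for illustration.

```python
import re

# Same pattern as the RewriteRule above, with the dot escaped so it only
# matches a literal "." (an assumption about the intent of the rule).
OLD_URL = re.compile(r"^/index\.cgi(.*)")

def redirect_old_url(path):
    """Return (status, location) for an old-style URL path,
    or None if the rule does not apply to it."""
    if OLD_URL.search(path):
        return (301, "/old.html")
    return None

# Hypothetical sample paths: two old-style URLs and one new-style URL.
for path in ["/index.cgi", "/index.cgi/2004/05/some-post", "/blog/"]:
    print(path, "->", redirect_old_url(path))
```

Note that mod_rewrite matches the pattern against the URL path only, not the query string, so the sample inputs here are plain paths.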

My second goal was to hide the /cgi-bin/blog.cgi portion of my URL. It’s ugly and it’s hard to remember. I wanted any request sent to http://www.timfanelli.com/blog/ to go directly to that CGI script. Using a passthrough rule and a 301 redirect accomplished this nicely:

RewriteRule ^/blog/(.*)$ /cgi-bin/blog.cgi/$1 [PT]
RewriteRule ^/$ /blog/ [R=301]
RewriteRule ^/blog$ /blog/ [R=301]

The first rule internally rewrites any request sent to /blog/ to /cgi-bin/blog.cgi/; because it is a passthrough and not a redirect, the visitor’s browser never sees the CGI path. Any extra characters in the URL string are copied into the rewritten URL using regular expression groupings. The second rule causes a 301 redirect from my base URL to the blog, and the third causes a 301 redirect if the URL is missing the trailing / character. We use a 301 redirect here instead of another passthrough rule to prevent having multiple “valid” URLs with the same content.
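
To see how the grouping carries the rest of the path across, here is a rough offline model of those three rules in Python. It is only a sketch: it assumes the first rule appends the captured group to the CGI path, it stops at the first matching rule (a simplification of how mod_rewrite walks a rule set), and the names RULES and apply_rules are invented for this example.

```python
import re

# (pattern, substitution, action) modeled on the three rules above.
# "PT" stands for an internal rewrite, "R=301" for an external redirect.
RULES = [
    (re.compile(r"^/blog/(.*)$"), r"/cgi-bin/blog.cgi/\1", "PT"),
    (re.compile(r"^/$"), "/blog/", "R=301"),
    (re.compile(r"^/blog$"), "/blog/", "R=301"),
]

def apply_rules(path):
    """Return (action, new_path) for the first rule that matches,
    or (None, path) if no rule applies."""
    for pattern, substitution, action in RULES:
        match = pattern.search(path)
        if match:
            # expand() fills \1 with whatever the group captured.
            return action, match.expand(substitution)
    return None, path

# Hypothetical request paths.
for path in ["/blog/2005/03/entry.html", "/blog/", "/", "/blog"]:
    print(path, "->", apply_rules(path))
```

A request for a post under /blog/ comes back as a passthrough to the same path under /cgi-bin/blog.cgi/, while / and /blog both come back as 301 redirects to /blog/.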

Having multiple “valid” URLs with the same content isn’t in and of itself a problem. Your website would work just fine, but I also wanted to optimize my site for Google pagerank. To this end, the astute reader will have noticed that there are now two ways to access my site: http://www.timfanelli.com/blog and http://www.timfanelli.com/cgi-bin/blog.cgi. We need to hide the /cgi-bin/blog.cgi URL from the outside world. This gets a little tricky, because we can’t just redirect /cgi-bin/blog.cgi to /blog/. That would cause an infinitely recursive rewrite, because /blog/ rewrites to /cgi-bin/blog.cgi! We’ll still use this rewrite rule, but we’ll protect it with a RewriteCond clause so it’s only evaluated when it matches the original request URL:

RewriteCond %{IS_SUBREQ} false
RewriteRule ^/cgi-bin/blog\.cgi(.*)$ /blog/ [R=301]

IS_SUBREQ is “true” if the rule is being processed as a sub request of the original; “false” otherwise. So when it’s matching the user-entered URL, it is not a sub request, and the rewrite rule replaces /cgi-bin/blog.cgi with /blog/. This is done with a 301 redirect, so Google won’t see it as a valid URL. Later, when the rewrite engine replaces /blog/ with /cgi-bin/blog.cgi, IS_SUBREQ is going to be “true”, and this rule won’t be executed again.
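
The guard can be pictured with a small Python model. This is only a sketch of the explanation above, not of Apache’s internals: the is_subreq argument is passed in by hand to stand in for the IS_SUBREQ variable, and the function name and sample paths are hypothetical.

```python
import re

CGI_URL = re.compile(r"^/cgi-bin/blog\.cgi(.*)$")
BLOG_URL = re.compile(r"^/blog/(.*)$")

def rewrite(path, is_subreq):
    """Return (action, new_path) for one pass through the rules."""
    # Guarded rule: only fires for the request the client actually sent.
    if not is_subreq and CGI_URL.search(path):
        return "R=301", "/blog/"
    # Passthrough rule: map the public URL onto the CGI script.
    match = BLOG_URL.search(path)
    if match:
        return "PT", "/cgi-bin/blog.cgi/" + match.group(1)
    return None, path

# A client asking for the CGI URL directly is redirected to /blog/.
print(rewrite("/cgi-bin/blog.cgi/2005/03/entry.html", is_subreq=False))

# A client asking for the public URL is rewritten internally...
action, internal = rewrite("/blog/2005/03/entry.html", is_subreq=False)
print(action, internal)

# ...and the second pass, flagged as a sub request, is left alone,
# so the redirect never fires again and there is no loop.
print(rewrite(internal, is_subreq=True))
```

Without the is_subreq check, the last call would return a redirect back to /blog/, which is exactly the loop the RewriteCond is there to prevent.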

So now the only valid way to access my site from the “outside” is via the URL http://www.timfanelli.com/blog/, even though all of the following URLs will appear to work as well:

http://www.timfanelli.com/ (no “/blog/”)
http://www.timfanelli.com/blog (no trailing slash)
http://www.timfanelli.com/cgi-bin/blog.cgi
http://www.timfanelli.com/cgi-bin/blog.cgi/

Many thanks to Pete for all his help!
