Apache mod_rewrite tweaks
Due to popular demand, I’m writing a straightforward mod_rewrite guide that collects the solutions to the most common problems. Again, this article is NOT a detailed reference for mod_rewrite: if you need one, go to Apache’s website, which has complete documentation on the subject. I will update this tutorial as I learn more about this “Apache 2 Swiss Army knife”. But first, let’s introduce the topic by seeing what mod_rewrite is…
From Wikipedia:
A rewrite engine is a piece of web server software used to modify URLs, for a variety of purposes. Some benefits derived from a rewrite engine are:
- Making website URLs more user friendly
- Making website URLs more search-engine friendly
- Preventing undesired ‘hot linking’
- Not exposing the (web address related) inner workings of a website to users
The Apache HTTP server has a rewrite engine called mod_rewrite, which has been described as “the Swiss Army knife of URL manipulation”.
Before going on, you must know that mod_rewrite works by using regular expressions. From Wikipedia:
A regular expression (abbreviated as regexp or regex, with plural forms regexps, regexes, or regexen) is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are used by many text editors and utilities to search and manipulate bodies of text based on certain patterns. Many programming languages support regular expressions for string manipulation. For example, Perl and Tcl have a powerful regular expression engine built directly into their syntax. The set of utilities (including the editor sed and the filter grep) provided by Unix distributions were the first to popularize the concept of regular expressions.
This article is not meant to explain what regular expressions are or how to work with them. I may cover them in another tutorial, but doing so here would take this article out of scope! So I will just recall the most important regexp operators used by mod_rewrite (as shown in the Apache guide):
Text:
- . Any single character
- [chars] Character class: Any character of the class “chars”
- [^chars] Character class: Not a character of the class “chars”
- text1|text2 Alternative: text1 or text2
Quantifiers:
- ? 0 or 1 occurrences of the preceding text
- * 0 or N occurrences of the preceding text (N > 0)
- + 1 or N occurrences of the preceding text (N > 1)
Grouping:
- (text) Grouping of text (used either to set the borders of an alternative as above, or to make backreferences, here the Nth group can be referred to on the RHS of a RewriteRule as $N)
Anchors:
- ^ Start-of-line anchor
- $ End-of-line anchor
Escaping:
- \char escape the given char (for instance, to specify the chars “.[]()” etc.)
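To see several of these operators working together, here is a small sketch. It assumes the rule lives in an .htaccess file, and the names article-N.html and article.php are made up for illustration only.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# ^ and $ anchor the pattern to the whole path
# [0-9]+ is a character class followed by the "one or more" quantifier
# ( ) captures the digits so they can be reused as $1
# \. escapes the dot so it matches a literal "." only
RewriteRule ^article-([0-9]+)\.html$ article.php?id=$1 [L]
</IfModule>
```

With a rule like this, a request for article-42.html would be handled internally by article.php?id=42, while article-abc.html would not match at all.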
Let’s now deal with the most common issues!
Tweaks Index
(Last Updated 28th Feb 2006)
- Basic URL redirection
- Add Trailing Slashes
- Blocking bad referers - No hotlinking
- Blocking Bad Bots | Fetchers
- Do not show ‘www’
- Guessable URL
- Flags Explained
- Time-Dependent Rewriting
- Browser Dependent Content
- On-the-fly Content-Regeneration
- External Rewriting Engine
- Sources
1) Basic URL redirection
You have moved a file and you want to redirect all visitors to the new location:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^old\.html$ new.html [R=301,L]
</IfModule>
This way, every request for old.html is sent to new.html with a permanent redirect.
If the redirect is temporary, use R=302.
Be careful with redirects: they can hurt your SEO and search engine rankings!
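The same idea extends to a whole directory that has moved. This is a rough sketch, assuming the rule sits in an .htaccess file in the document root; olddir and newdir are placeholder names.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# (.*) captures everything after "olddir/"; $1 re-inserts it in the new URL
RewriteRule ^olddir/(.*)$ /newdir/$1 [R=301,L]
</IfModule>
```

A request for olddir/manual.pdf would then be redirected to /newdir/manual.pdf.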
2) Add Trailing Slashes
Use this to add a trailing slash to recurring URLs on your site:
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^myphotos/([0-9]+)$ myphotos/$1/ [R,L]
</IfModule>
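If you would rather not list each section by hand, a more general variant is possible. This is only a sketch, assuming an .htaccess file in the document root. Note that Apache's mod_dir normally adds the slash for real directories on its own, so you may not need it.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# Only act when the request maps to an existing directory
RewriteCond %{REQUEST_FILENAME} -d
# Match any path that does not already end in a slash
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
</IfModule>
```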
3) Blocking bad referers - No hotlinking
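A common way to stop other sites from embedding your images is to check the Referer header. Here is a minimal sketch, assuming your site is example.com (a placeholder) and that you only want to protect image files. Keep in mind that the Referer header can be missing or forged, so this is a deterrent rather than real protection.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# Let requests with an empty Referer through (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let requests coming from our own pages through
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Everything else asking for an image gets a 403 Forbidden
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
</IfModule>
```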
4) Blocking Bad Bots | Fetchers
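Unwanted robots and fetchers can be turned away by matching the User-Agent header. The sketch below uses the made-up names BadBot and EvilFetcher; replace them with the agents you actually see in your logs. Since clients can send any User-Agent they like, this only stops the ones that identify themselves honestly.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# [OR] joins the two conditions; without it they would both have to match
RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilFetcher [NC]
# "-" means no substitution; F sends a 403 Forbidden
RewriteRule .* - [F]
</IfModule>
```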
5) Do not show ‘www’
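To send visitors from the www host name to the bare domain, you can test the Host header and redirect. This is a sketch under two assumptions: your domain is example.com (a placeholder), and the rule lives in an .htaccess file in the document root, where the matched path has no leading slash.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# Only act when the request was made to the www host name
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# Redirect permanently to the same path on the bare domain
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
</IfModule>
```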
6) Guessable URL
Suppose your logs show that many visitors look for a page at an address where they expect to find it, while the content really lives in another section of your website. For example, visitors request http://www.example.com/tweaks and keep getting a 404 error, when what they want is at http://www.example.com/tutorials/. You can route them to the right location!
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^tweaks(/)?$ /tutorials/ [R,L]
</IfModule>
7) Flags Explained
You can add flags to each rewrite rule to tweak its behaviour. The flags go at the end of the RewriteRule, in [ ], separated by commas.
For a complete reference, see the RewriteRule flags section of the Apache mod_rewrite documentation.
- C (chain): chains the current rule with the next rule
- CO=NAME:VAL:domain[:lifetime[:path]] (cookie): This sets a cookie in the client’s browser
- E=VAR:VAL (env): This forces an environment variable named VAR to be set to the value VAL, where VAL can contain regexp backreferences ($N and %N) which will be expanded. You can use this flag more than once, to set more than one variable
- F (forbidden): This forces the current URL to be forbidden - it immediately sends back an HTTP response of 403 (FORBIDDEN)
- G (gone): This forces the current URL to be gone - it immediately sends back an HTTP response of 410 (GONE)
- L (last): Stop the rewriting process here and don’t apply any more rewrite rules
- N (next): Re-run the rewriting process (starting again with the first rewriting rule). This time, the URL to match is no longer the original URL, but rather the URL returned by the last rewriting rule. Be careful not to create an infinite loop!
- NE (noescape): This flag prevents mod_rewrite from applying the usual URI escaping rules to the result of a rewrite
- NS (nosubreq): This flag forces the rewrite engine to skip a rewrite rule if the current request is an internal sub-request.
- P (proxy): This flag forces the substitution part to be internally sent as a proxy request and immediately (rewrite processing stops here) put through the proxy module.
- PT (passthrough): This flag forces the rewrite engine to set the uri field of the internal request_rec structure to the value of the filename field
- QSA (querystringappend): This flag forces the rewrite engine to append a query string part of the substitution string to the existing string, instead of replacing it
- R[=code] (redirect): Prefix Substitution with http://thishost[:thisport]/ (which makes the new URL a URI) to force an external redirection. If no code is given, an HTTP response of 302 (MOVED TEMPORARILY) will be returned
- S=num (skipnext): This flag forces the rewriting engine to skip the next num rules in sequence, if the current rule matches
- T=MIME-type (mimetype): Force the MIME-type of the target file to be MIME-type. This can be used to set up the content-type based on some conditions
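Here is a short sketch of a few of these flags in use. It assumes an .htaccess context, and the file names retired.html and product.php are invented for the example.

```apache
<IfModule mod_rewrite.c>
RewriteEngine on
# G: answer with 410 Gone; "-" means "no substitution"
RewriteRule ^retired\.html$ - [G]
# QSA + L: keep the visitor's own query string and stop processing here
# e.g. products/42?sort=price is handled by product.php?id=42&sort=price
RewriteRule ^products/([0-9]+)$ product.php?id=$1 [QSA,L]
</IfModule>
```

Without QSA, the query string written in the substitution (id=42) would replace the one the visitor sent instead of being combined with it.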
8) Time-Dependent Rewrite
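Sometimes you may want to apply different rules depending on the time of day. Here foo.html is served from foo.day.html between 07:00 and 19:00, and from foo.night.html the rest of the time:
<IfModule mod_rewrite.c>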
RewriteEngine on
RewriteCond %{TIME_HOUR}%{TIME_MIN} >0700
RewriteCond %{TIME_HOUR}%{TIME_MIN} <1900
RewriteRule ^foo\.html$ foo.day.html
RewriteRule ^foo\.html$ foo.night.html
</IfModule>
9) Browser Dependent Content
At least for important top-level pages it is sometimes necessary to provide the optimum of browser dependent content, i.e. one has to provide a maximum version for the latest Netscape variants, a minimum version for the Lynx browsers and an average feature version for all others.
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.*
RewriteRule ^foo\.html$ foo.NS.html [L]
RewriteCond %{HTTP_USER_AGENT} ^Lynx/.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[12].*
RewriteRule ^foo\.html$ foo.20.html [L]
RewriteRule ^foo\.html$ foo.32.html [L]
</IfModule>
10) On-the-fly Content-Regeneration
Here comes a really esoteric feature: dynamically generated but statically served pages, i.e. pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the webserver if missing. This way you can have CGI-generated pages which are statically served unless someone (or a cronjob) removes the static contents. Then the contents get refreshed.
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^page\.html$ page.cgi [T=application/x-httpd-cgi,L]
</IfModule>
11) External Rewriting Engine
Question: How can we solve the FOO/BAR/QUUX/etc. problem? There seems to be no solution using mod_rewrite alone.
Response: Use an external RewriteMap, i.e. a program which acts like a RewriteMap. It is started once when Apache starts up, receives the requested URLs on STDIN, and has to write the resulting (usually rewritten) URL to STDOUT (in the same order!). Note that RewriteMap can only be declared in the server or virtual host configuration, not in an .htaccess file.
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteMap quux-map prg:/path/to/map.quux.pl
RewriteRule ^/~quux/(.*)$ /~quux/${quux-map:$1}
</IfModule>
Then you code your personal mapping
#!/path/to/perl
# disable buffered I/O which would lead
# to deadloops for the Apache server
$| = 1;
# read URLs one per line from stdin and
# generate substitution URL on stdout
while (<>) {
s|^foo/|bar/|;
print $_;
}





Hi! My name is Chris and I work at Help.com. One of our members posted a question and after reviewing your blog I thought you might be able to provide some advice. See below for their forwarded question:
“everytime I access my site I get an index page and the words Index on it and a link to all of the files that I have in my folder. Why? and what can I do to fix this problem?”
~Larry
If you have the know-how and time to offer any insight, please contact Larry. Thanks!
Hi. I assume Larry is using Apache Webserver. This happens because he turned Directory Indexing on. From the Apache FAQ:
Hope this helps
Hey!
My question is about the order of rewrite rules as they’re processed. Most of what I’ve read so far elsewhere has been a bit ambiguous. My question is: if you have four RewriteRule’s in a row, does Apache automatically apply the first relevant rule, or does it require that you add in the [L] flag? I understand that the RewriteRule is read before the RewriteCondition, to be applied when the condition is met, but do these Condition/Rule pairings constitute a block which is executed together? Does the [L] flag work like a “break” in PHP?
Really confusing the heck out of myself, here. Any enlightenment possible would be very helpful.
Hi there
I have a question related to this post.
If I have a domain that says http://www.warren.com/file/example.pdf
and I have a new system where I dont use the directory “file” just http://www.warren.com/example.pdf
but people still go to http://www.warren.com/file/example.pdf to get that document how do I use the mod rewrite to make it so if they put in http://www.warren.com/file/example.pdf they will be automatically sent to http://www.warren.com/example.pdf.
where would I go to change this and what would the code be.
sorry if not explained well.
thanks
After reading many .htaccess redirect tips and mod_rewrite tips I am left with a question. In my case I am using Apache and have .htaccess running fine. Tutorials give examples of both ways to implement redirects.
I want to redirect using the root-level .htaccess file, not by editing the httpd.conf file every time. In the root .htaccess file I currently list redirects like this (
RewriteEngine on
RewriteRule ^/products/foo(/)?*$ /products/newsletter/index.html [R=302,L]
)
This is working. My question is: should I really be using this instead?
redirect 302 /products/foo /products/newsletter/index.html
It is important that it works with or without a trailing /.
PS. What does the ,L mean or do?
how do you get the mod_rewrite to allow same page anchors as the setup i have wont allow it?