How much Bandwidth and Disk Space do I Really Need?

Bandwidth usage is proportional to the number of visitors (traffic) your site receives. The vast majority of websites currently on the web use no more than 5 GB of bandwidth a month. Only sites expected to generate a large amount of traffic, such as adult sites or large search engines like Google and Yahoo, will require a bigger monthly bandwidth allowance.

Unlimited Bandwidth?

So don’t fall for those “unlimited bandwidth” offers; for starters, truly unlimited bandwidth is a myth. All web hosting providers have to pay for their bandwidth, so offering a genuinely unlimited service is just crazy talk. By “unlimited”, they mean that the average user will never reach the limit. And don’t be wowed by providers offering huge bandwidth allowances either; more doesn’t necessarily mean better. Unless you know your site is going to attract a lot of traffic, stick to the basics.

Unlimited Disk Space?

As for disk space, that really depends on the content of your website. Most websites average around 150 MB, with a single web page taking up around 40–50 KB. Many web hosting providers will offer unlimited disk space, but in my opinion 5 GB is more than enough for most individuals and small to medium businesses.

However, it is worth considering what the content of your website will be. Will there be banner ads? Will you have a lot of graphics, video media and/or photos? Will your website be database driven, or offer downloadable software? Even for the average advanced user, 50 GB is usually plenty.

Basically, the more advanced and media-rich your website is, the more disk space (and probably bandwidth) you’ll need. But again, a reminder: you don’t have to go crazy.

I might put together a calculator in the future that can estimate your traffic needs. Check back if you are interested!
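In the meantime, here is a rough sketch in Python of what such an estimate looks like. It simply multiplies daily pageviews by page size and days per month; the 50 KB page size is the average quoted above, and every number is an assumption you should replace with your own figures.

```python
# Rough monthly bandwidth estimate: pageviews/day * avg page size * days.
# The 50 KB page size is the article's quoted average; treat all of these
# numbers as assumptions, not measurements.

AVG_PAGE_KB = 50        # average size of a single web page, in KB
DAYS_PER_MONTH = 30

def monthly_bandwidth_gb(pageviews_per_day, page_kb=AVG_PAGE_KB):
    """Estimated gigabytes transferred per month."""
    kb_per_month = pageviews_per_day * page_kb * DAYS_PER_MONTH
    return kb_per_month / (1024 * 1024)  # KB -> GB

# A site with 2,000 pageviews a day at ~50 KB per page:
print(round(monthly_bandwidth_gb(2000), 2))  # ~2.86 GB/month
```

At that rate, even a couple of thousand pageviews a day stays comfortably under the 5 GB a month mentioned above.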

For everyday usage of up to a couple of thousand pageviews a day, I usually recommend eHost. They are very good in every respect.

Robots.txt File Explained: Allow or Disallow All or Part of Your Website

The sad reality is that most webmasters have no idea what a robots.txt file is. A robot in this sense is a “spider”: the program search engines use to crawl and index sites on the internet.

Basically, a spider will crawl a site and index all the pages of that site that it is allowed to. Once that’s complete, the robot will move on to external links and continue its indexing. This is how search engines find other sites and build such a large index: they depend on sites linking to other relevant websites, which link to others, and so on.

When a search engine (or robot, or spider) hits a site, the first thing it looks for is a robots.txt file. Remember to keep this file in the root directory; this ensures that the robot will be able to find the file and use it correctly. This file tells a robot what it may crawl. The system is called “The Robots Exclusion Standard”.

Pages that are disallowed in your robots.txt file won’t just go unindexed; they won’t be crawled at all.

Robots.txt Format

The format of a robots.txt file is very simple. It consists of a “User-agent:” line and a “Disallow:” line.

The “User-agent:” line names the robot a rule applies to; it can also refer to all robots at once.

Example: to disallow all robots from indexing a certain folder on a site, we’ll use this:

User-agent: *
Disallow: /cgi-bin/

On the User-agent line we used the wildcard “*”, which tells all robots to obey this rule. So once a spider reads this, it knows that /cgi-bin/ should not be crawled or indexed at all, including every folder contained in it.
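You can check how a crawler interprets a rule like this with Python’s standard `urllib.robotparser` module (the example.com URLs here are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Feed the rules from the example above to the standard-library parser.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
]
rp = RobotFileParser()
rp.parse(rules)

# /cgi-bin/ and everything inside it is off limits to every robot...
print(rp.can_fetch("*", "http://example.com/cgi-bin/script.pl"))  # False
# ...while the rest of the site can still be crawled.
print(rp.can_fetch("*", "http://example.com/index.html"))         # True
```

This is the same logic a well-behaved spider applies before requesting a page.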

Specifying certain bots is also allowed, and in many cases very useful for site owners who use doorway pages or other search engine optimization techniques. Naming a specific bot lets a site owner tell that spider exactly what to index and what to leave alone.

Here is an example of restricting access to the /cgi-bin/ from Google:

User-agent: Googlebot
Disallow: /cgi-bin/

This time the User-agent line names Googlebot instead of the wildcard “*”. This lets Google’s spider know we’re talking to it specifically, and that it should not crawl this folder.
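Using the same standard-library parser, you can confirm that this rule binds only Googlebot; a robot not named in the file (and with no “*” record present) is unaffected. The bot names and URLs below are placeholders.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Googlebot",
    "Disallow: /cgi-bin/",
]
rp = RobotFileParser()
rp.parse(rules)

# Googlebot is blocked from /cgi-bin/...
print(rp.can_fetch("Googlebot", "http://example.com/cgi-bin/script.pl"))     # False
# ...but a robot not named in the file, with no "*" rule, is left alone.
print(rp.can_fetch("SomeOtherBot", "http://example.com/cgi-bin/script.pl"))  # True
```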

White Space & Comments

White space and comment lines can be used, but not every robot handles them correctly. When using a comment, it is always best to put it on its own line.

Not recommended:

User-agent: googlebot #Google Robot

Recommended:

User-agent: googlebot
#Google Robot

Notice that in the first example the comment sits on the same line, introduced by a “#”. While this is acceptable and will work in most cases, some robots may not handle it, so be sure to use the second form when adding comments.

If the first form is used and a robot does not support it, the robot may interpret the whole line as “googlebot #Google Robot” instead of “googlebot” as originally intended.

White space here refers to a blank space at the beginning of a line. It is allowed by the standard, but since some robots mishandle it, it is not recommended.

Common Robot Names

Here are a few of the top robot names:

  • Googlebot – Google’s crawler
  • YandexBot – Yandex’s crawler
  • Bingbot – Microsoft Bing’s crawler

These are just a few common robots that will hit a site at any given time.

Robots.txt Examples

The following examples are commonly used commands for robots.txt files.

The following allows all robots to index an entire site. Notice the “Disallow:” line is left blank; this tells robots that nothing is off limits.

User-agent: *
Disallow:

The following tells all robots not to crawl or index anything on a site. We used “/” on the “Disallow:” line to cover the entire contents of the root folder.

User-agent: *
Disallow: /

The following tells all robots (specified by the wildcard on the “User-agent:” line) not to index the cgi-bin, images, and downloads folders. It also disallows the admin.php file, which is located in the root directory. Paths to subdirectories and individual files can be listed the same way.

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /downloads/
Disallow: /admin.php
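A quick sanity check of rules like these with `urllib.robotparser`. Note that robots.txt paths should start with a leading slash, so a file in the root directory is written as /admin.php; the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /images/",
    "Disallow: /downloads/",
    "Disallow: /admin.php",  # paths in robots.txt need a leading slash
]
rp = RobotFileParser()
rp.parse(rules)

# Every listed folder and file is blocked for all robots...
for path in ["/cgi-bin/test.pl", "/images/logo.png",
             "/downloads/app.zip", "/admin.php"]:
    print(path, rp.can_fetch("*", "http://example.com" + path))  # all False

# ...and everything else remains crawlable.
print(rp.can_fetch("*", "http://example.com/about.html"))  # True
```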

This tells Googlebot not to index the wp-admin folder.

User-agent: googlebot
Disallow: /wp-admin/


Remember that all the major sites use a robots.txt file. Just take a URL and add /robots.txt to the end to find out whether a site uses one; the file will be displayed in plain text, so anyone can read it.
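For instance, a couple of lines of Python will turn any page URL into its site’s robots.txt address (the example URL below is a placeholder):

```python
from urllib.parse import urljoin

# robots.txt always lives at the site root, so join "/robots.txt"
# onto any page URL from that site.
page = "https://www.example.com/blog/some-post.html"
print(urljoin(page, "/robots.txt"))  # https://www.example.com/robots.txt
```

Paste the resulting address into a browser to read the site’s rules.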

Remember that a robots.txt file isn’t mandatory. It’s mainly used to tell spiders what to crawl and what not to crawl; if everything on a site is to be indexed, a robots.txt file isn’t needed.