IBM has posted a great primer article on building your own web spider. It’s a good starting point for anyone interested in writing one. Why would you build your own spider? Well, if you’re responsible for even a medium-sized website like the Dalhousie Computer Science website, a spider can take care of a lot of legwork for you, in the same way that search engine robots do the legwork of gathering information for the search engines.
Website maintenance becomes a huge task as your website gets larger and larger, and automating it is a huge step in improving your website. The crawler doesn’t necessarily need to fix things, only report back with a list of problems. Some examples of things that we check for are page validation, spelling and link validation. Unfortunately, we haven’t gotten to grammar checking yet.
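To give a feel for the "report, don't fix" idea, here is a minimal sketch of link validation in Python. It is not the code from the IBM article or the checker we actually run. The names `extract_links`, `http_status` and `find_broken_links` are made up for this example, and it assumes that a link counts as broken when the server can't be reached or answers with a status of 400 or higher.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return the unique absolute http(s) links found in html, in page order."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = []
    for href in parser.links:
        # Resolve relative links against the page URL and drop any #fragment.
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url.startswith(("http://", "https://")) and url not in absolute:
            absolute.append(url)
    return absolute


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it can't be reached."""
    # Hypothetical User-Agent string, used only for this example.
    request = Request(url, method="HEAD",
                      headers={"User-Agent": "example-link-checker"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except (URLError, OSError):
        return None


def find_broken_links(html, base_url, status_of=http_status):
    """Report (url, status) for each link that looks broken; fix nothing."""
    problems = []
    for url in extract_links(html, base_url):
        status = status_of(url)
        # Assumption: unreachable or status >= 400 means "broken".
        if status is None or status >= 400:
            problems.append((url, status))
    return problems
```

You would call `find_broken_links(page_html, page_url)` for each page the spider visits and collect the results into a report. The `status_of` parameter is there so the checking logic can be tried out with a stand-in function instead of real network requests. Some servers don't handle HEAD requests well, so a real checker may want to fall back to GET.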
So if building your own crawler sounds like something you’d want to do, check out the article. It’s a great place to get started, and you can then extend it to give your creation its own flavour.