Google Analytics does a good job of monitoring visited pages, but what about pages that are not found on your server and thus never displayed? Such requests can result from moving content around, errors in your own links, malformed links from other webmasters (often caused by bad software on their side or by miscommunication), or other reasons. Even software like WordPress does not check comment links, which can generate 404 (page not found) errors. How do you detect such links? You have three choices, each with its own drawbacks.

1. Google Webmaster Tools. Adding your site to Google Webmaster Tools is a good idea, and it will report links that were not found. It is the simplest way, but also the worst, because Google only detects broken links that are linked and indexed. There may be plenty of other links that need attention. For example, advertising campaigns use non-followed (often JavaScript) links.

2. Log analysis. This is the best method if you have access to the log files and can process them. I prefer AWStats for this, but for smaller sites on Apache you can do it by hand, since errors are also logged to a separate file. The one problem with manual analysis is that the error log has less information than the access log: no referrer is mentioned in the error log. However, this can be solved by using grep to scan the access log for 404 status codes as well. The drawback is that some CMSes process all requests themselves and do not return error codes properly, so you will not see such errors in the logs even though they exist.

3. Custom error pages. If you can’t access the logs, the best course of action is to use custom error pages and create a log from them. Log both the referrer and the request URI for best results. This approach can also be implemented in many of the CMSes mentioned above.
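As a rough illustration of the log-analysis option, here is a minimal Python sketch that does the same job as grepping the access log for 404s, while keeping the referrer next to each missing path. It assumes the log uses Apache's "combined" format (the plain "common" format has no referrer field); the sample lines and the function name `find_404s` are made up for the example.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)"'
)

def find_404s(lines):
    """Count (requested path, referrer) pairs for responses with status 404."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referrer"))] += 1
    return hits

# Invented sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [13/Aug/2009:13:04:00 +0300] "GET /old-page HTTP/1.1" '
    '404 512 "http://example.com/links" "Mozilla/5.0"',
    '203.0.113.5 - - [13/Aug/2009:13:05:00 +0300] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0"',
]

for (path, referrer), count in find_404s(sample).most_common():
    print(count, path, referrer)
```

To run it on a real log, pass an open file object instead of `sample`. Lines that do not match the assumed format are silently skipped, so a different log format would need a different pattern.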

So, what should you do with bad links? That depends on what causes them. If the source is an advertising campaign or a referring site, you will have to create a redirect from the bad link to the appropriate good one. If it is a malformed comment link, I would just delete it from the database.
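The custom error page idea and the redirect fix can be combined in one place. The sketch below is only an assumption about how such a handler might look, not tied to any particular CMS or web server: `handle_missing`, the `REDIRECTS` table and its entry are hypothetical names and data. Known bad links get a permanent redirect; everything else is logged with its request URI and referrer so it can be reviewed later.

```python
import logging

logger = logging.getLogger("not_found")

# Hypothetical redirect table: known bad path -> correct path.
REDIRECTS = {
    "/old-campaign": "/landing",
}

def handle_missing(request_uri, referrer):
    """Return (status, headers) for a request that matched no page."""
    target = REDIRECTS.get(request_uri)
    if target is not None:
        # Known bad link: send visitors to the right page permanently.
        return 301, {"Location": target}
    # Unknown bad link: record both fields so the source can be traced.
    logger.warning("404 uri=%s referrer=%s", request_uri, referrer or "-")
    return 404, {}
```

A real error page would call something like this with the request URI and the `Referer` header supplied by the server, then render the response accordingly.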


Giedrius Majauskas

I am an internet company owner and project manager living in Lithuania. I am interested in computer security, health, and technology topics.

6 Comments

Vladimir Radmilovic · August 13, 2009 at 1:04 pm

With our product Web Log Storming (http://www.weblogstorming.com) it’s easy to overcome the AwStats problem you mentioned and list 404 hits with referrers, among other things. It’s not free, though…

I hope it’s not inappropriate to post this comment here. If it is, please accept my apologies and remove it.

    Giedrius · August 13, 2009 at 2:06 pm

    well, grep solves that problem too 🙂

      Vladimir Radmilovic · August 13, 2009 at 2:24 pm

      Fair enough 😉

      Vladimir Radmilovic · August 14, 2009 at 11:46 am

      I’ve tried, but I can’t resist… 😉 You can also use Notepad.exe to open log files and pen and paper to count visitors, but it’s not the point. 🙂

        Giedrius · August 14, 2009 at 1:26 pm

        You are only partly right. Reviewing the last log has some uses as well. However, analytics does a much better job at checking “alive” visitors. I am not saying it is perfect, or that one does not need to watch bots and users with JavaScript disabled. Personally, I do not need an analyser (especially a paid one) if its single bonus is faster 404 analysis, which is not a daily or even weekly task for me 🙂 Maybe weblogstorming has something more to offer 🙂

          Vladimir Radmilovic · August 14, 2009 at 3:48 pm

          LOL! Yeah, I got your point. There’s a lot of discussion about JS vs log analyzers (which depends on specific needs). Still, most agree that both methods should be used for the full picture (but you are already doing this).

          >Maybe weblogstorming has something more to offer

          I’d say: definitely… 😉 You should really check it out as there’s a chance you would find it useful. Interactive reports, “on-the-fly” filters and drill-down to individual visitors, to name some features. If you have any comments I will be glad to hear them.

          BTW, thanks for listening!
