Google Webmaster Tools: Part 3, Statistics Reports

This post covers the reports available under the Statistics tab in Google Webmaster Tools. The four main sections are: Query stats, Crawl stats, Page analysis, and Index stats. Let’s take a look at each one below:

Query stats

This page displays two main reports:
1) Top search queries (queries that most often return a page from your site), and your site’s position for each query.
2) Top search query clicks (queries that generated a click to your site), and your site’s position for each query.

You can also drill down by search type (web, image, etc.) or search location (google.com, google.co.uk, google.ca, etc.). Note that the search location does not represent where your user is coming from; it simply indicates which Google search engine the user used.

Crawl stats

This report shows PageRank distribution of the pages in your site, as well as which page has the highest PageRank. This is a functionality that I can do without — Google should either provide PR information on all the pages, or just get rid of this report.

Page analysis

This page has two sections. Content shows the types of documents Google found on your site, as well as the distribution of language encodings across your pages. Common Words shows the words most commonly found on your site, as well as the anchor text most often found in links pointing to your site.

Index stats

This is simply a list of commands that you can use to find more information about your site. For example, site:www.yoursite.com can be used to find the indexed pages from www.yoursite.com.

Google Webmaster Tools: Part 2, Diagnostic Reports

Once you have set up your site with Google Webmaster Tools, you’ll be able to view two types of reports: Diagnostic Reports and Statistics Reports. In this post, I’ll review the information available in the Diagnostic Tab.

Under the Diagnostic tab, there are three main sections: Summary, Crawl errors, and Tools. Let’s take a look at each one below:

Summary

The summary page shows the following:
1. Whether pages from your site are included in the Google index.
2. The date Googlebot last successfully accessed the home page of your site.
3. Whether you have submitted a sitemap to Google.
4. Crawl errors Google found, including:

  • HTTP errors
  • Not found
  • URLs not followed
  • URLs restricted by robots.txt
  • URLs timed out
  • Unreachable URLs

This error report is useful in helping you identify incorrect links to your site, especially internal links.

Crawl errors

The Web crawl report under Crawl errors shows any crawl errors in more detail. The Mobile Web report shows any crawl errors for your mobile site in CHTML, WML/XHTML.

Tools

robots.txt analysis: This report shows whether Google found a robots.txt file on your site. You can also experiment with changing the content of the robots.txt file and see how that affects Google’s crawlers.

Manage site verification: This report displays information webmasters need for verifying that they are indeed the owner of the site.

Preferred domain: Google allows you to specify whether you want Google to treat www.sitename.com and sitename.com as the same site. I think this is the single most valuable feature in Google Webmaster Tools. Since you have no control over how other people link to your site, you’ll want to make sure that Google knows that links to www.sitename.com and sitename.com point to the same site (this should usually be the case). This way, your site can get full credit for all the incoming links.
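
The robots.txt experiment described under Tools can also be tried locally before you upload anything. Below is a minimal sketch in Python that uses the standard library's urllib.robotparser. The draft rules and the paths are made up for illustration, and Python's parser is not Googlebot's own parser, so treat the results as a rough check rather than a guarantee of how Google will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft rules, invented for this example; replace with your own.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /tmp/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no download is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative paths only; www.yoursite.com is a placeholder domain.
    for path in ("/index.html", "/tmp/cache.html", "/private/notes.html"):
        url = "http://www.yoursite.com" + path
        print(path, allowed(DRAFT_ROBOTS_TXT, "Googlebot", url))
```

In this draft, Googlebot has its own group of rules, so only /tmp/ is blocked for it. The /private/ rule sits in the catch-all group and applies only to crawlers without a group of their own.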

Google Webmaster Tools: Part 1, Setting Up

In this post, I’ll talk about how to set up your site in Google Webmaster Tools. In subsequent posts, I’ll look at the reports available in Google Webmaster Tools.

To add a site to Google Webmaster Tools, do the following:

1. Go to http://www.google.com/webmasters/sitemaps.

2. Log in with your Google account.

3. Type your site’s URL (starting with http://) into the text box and click on the OK button.

4. Google will show you some initial information it has on the site, such as whether pages from this site are included in Google’s index, and the date Googlebot last accessed your home page.

5. Click on the “Verify your site” link to verify that you are the owner of the site.

6. There are two ways to verify: add a meta tag to the site’s homepage (Google will tell you what the meta tag should look like), or upload an HTML file to the site’s root directory (Google will tell you the file name to use). Choose your method, and you’ll be given directions for setting it up properly.

7. Once you have either added a meta tag to the site’s homepage or uploaded an HTML file to the site’s root directory, you can click on the “Verify” button. You don’t need to do this right away; you can always come back later when you are ready.

8. You may also submit a sitemap to Google. To add a sitemap, click on the “Add a Sitemap” link for the site after you log in to Google Webmaster Tools.

9. Next, select whether you are submitting a regular web sitemap or a mobile sitemap.

10. Specify the location of the sitemap in the textbox.

11. Before you click on the “Add Web Sitemap” button, you’ll need to generate a sitemap for your site. A simple way to generate a sitemap is covered in an earlier article titled Creating a Simple Google Sitemap. Once it’s generated and uploaded to your site, click on the “Add Web Sitemap” button.

12. That’s it. In the next post, I’ll take a look at the reports you can see in Google Webmaster Tools.

Yahoo Site Explorer

Yahoo Site Explorer (http://siteexplorer.search.yahoo.com) is a service provided by Yahoo that shows what the Yahoo search engine knows about your site, specifically which pages are indexed, and the number of inbound links. If you register, you can submit a feed to Yahoo to ensure that Yahoo knows all your pages.

The most useful part of Yahoo Site Explorer is the number of inbound links. As we all know, the link: command provided by Google is notoriously inaccurate, and it appears to me that the count returned by Yahoo Site Explorer is more reliable. In addition, Yahoo has made it easy to exclude inbound links from the same domain or subdomain. This is helpful as webmasters often want to know both the total number of inbound links, as well as the inbound links from external sites. It’s true that the above information was already available on Yahoo before Site Explorer, but it is much easier to get at the information now.

One nice feature of Yahoo Site Explorer is that it’s easy to explore different URLs. For example, as you are looking through all the pages that link to a particular site, an “Explore URL” button appears as you mouse over each page. You can click on that button and instantly get the information for that page. That was very convenient for me.

I also authenticated my site (you’ll need to use your Yahoo ID) to see what additional information I could get. I found that basically the only benefit of authenticating your site is that you can send a feed (basically a sitemap) to Yahoo, and Yahoo tells you when its crawler last accessed that feed. This is less than what I was hoping for. For example, Yahoo does not tell you when it cannot find a page listed in the feed, nor does it tell you how many times your site was clicked on from within Yahoo Search.

Overall, Yahoo Site Explorer is a useful product for users who want to find out more about a site or page. As a webmaster tool, it lags behind Google Webmaster Tools. I would recommend that you authenticate your site through Yahoo Site Explorer so that you can submit your feed to ensure Yahoo picks up your new pages, but that’s pretty much it.

Comparing Major Search Engines

When it comes to the complexity of search engine algorithms, the common view is that MSN is the least sophisticated, Yahoo (Inktomi) is better than MSN, and Google is the most advanced. With that in mind, it was slightly surprising to me to find the following rankings for my SQL Tutorial site:

Query Term = SQL Tutorial
Google rank: 3
Yahoo rank: 9
MSN rank: 12

Note that those rankings are all coming from the .com version of the site.

The relative rankings were somewhat unexpected because I had applied all the basic SEO techniques to this site, which should lead to the site ranking well on MSN and Yahoo, which focus more on page content than Google does. But as you can see, this is not the case.

I ran across an article by Aaron Wall of SEO Book.com on search engine relevancy, which shed some light on this matter. Aaron pointed out that Yahoo and MSN results tend to favor commercial sites, while Google favors information/content sites. As my site is clearly content-oriented (confirmed by doing a search at Yahoo Mindset), this explains why it ranks better on Google than on Yahoo and MSN.

Creating a Simple Google Sitemap

There is a way to tell Google what pages exist on your website by creating a sitemap in XML format. Note that this is different from a sitemap HTML page, even though both share the same purpose of making sure the search engine sees your pages. The main difference is that your human visitors will not be viewing this XML sitemap file, whereas the HTML sitemap file is geared towards human visitors and search engine bots alike.

There are several pieces of information you can tell Google about your pages, and I will only discuss the most useful portion — listing all the pages. Other information, such as frequency of page updates, is used by Google only as a suggestion, and I don’t use it myself.

To list all your pages is pretty easy, and you can easily generate that file on your own. The syntax is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>[URL 1]</loc>
  </url>
  <url>
    <loc>[URL 2]</loc>
  </url>
  …
</urlset>

Simply replace [URL 1], [URL 2], etc, with your own pages.

That’s it! Next, save this file as sitemap.xml. Then, you’ll want to create a Google account and use Google Webmaster Tools to inform Google about the presence of this sitemap file. This way, you can ensure that Google knows about all of your pages. Going forward, whenever you add new pages to your site, just append them to the sitemap.xml file and Google will know about those pages.
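
If you would rather not edit the file by hand, the same structure can be generated from a list of URLs. The following is a minimal sketch in Python, using only the standard library. The function name build_sitemap and the example URLs are made up for illustration, and the namespace is simply copied from the syntax shown in this post, so check which namespace your search engine currently expects.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the syntax shown in this post (an assumption that it
# is still the one you need; newer sitemap formats may use a different one).
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(urls):
    """Return sitemap XML as bytes, with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL text for us.
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder URLs; replace with the pages of your own site.
    pages = [
        "http://www.yoursite.com/",
        "http://www.yoursite.com/page.html?a=1&b=2",
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

To produce the actual file, write the returned bytes to sitemap.xml in binary mode and upload it to your site as described above.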

Convert IP Address to Country

One exercise you may want to do is to find out which countries your visitors are coming from. There are several reasons for doing this:

1) You might want to tailor your content to that particular demographic.

2) If you have lots of visitors from a particular country, you might want to consider adding a version of your website in that particular language.

3) If your site has IP-based targeting for ads (programs such as Adsense have an IP-targeting component), this will help you understand why, or why not, your users are clicking on your ads.

The way I did it was to download a flat file that includes IP-to-country mapping data from the link found here. This is a free database. There are other versions that claim to be more accurate, but they charge a fee. For my purposes, the free version is sufficient.

I first loaded this file into a database and then used a table join to look up the country code from the IP number. This turned out to be a time-consuming exercise. Then I remembered that in data warehousing, you want to do as much of your data transformation outside of the database as possible. Applying that principle, I moved the country lookup into the Perl processing routine that runs before the data is loaded into the database. This move proved to be an excellent time-saver.

Below I show the Perl code for matching an IP address with a country code. There are 3 basic steps:

1. Read IP/Country mapping file.

2. Convert IP address to IP number.

3. Find country code based on IP number

The code for each step is shown below:

1. Read IP/country mapping file

## ip2country.txt is the file that stores the
## IP number/country mapping data. Assuming the
## following format: IP_START,IP_END,COUNTRY_CODE.
open (IN1, '<', 'ip2country.txt') or die "Cannot open ip2country.txt: $!";

## Store each column in its own array, indexed by row number.
$i = 0;
while (<IN1>) {
    chomp;
    @ips = split (/,/, $_);
    $ip_start[$i] = $ips[0];
    $ip_end[$i] = $ips[1];
    $ip_country2[$i] = $ips[2];
    $i++;
}
close (IN1);

2. Convert IP address to IP number.

Assume IP address is already stored in the variable $ip_address

@ipp = split (/\./,$ip_address);
$ip_number = $ipp[0]*256*256*256+$ipp[1]*256*256+$ipp[2]*256+$ipp[3];
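
To sanity-check the conversion formula, here is a small sketch in Python (not the Perl used in this post) that applies the same arithmetic and compares it against the standard library's ipaddress module. The function name ip_to_number and the sample address are arbitrary choices for illustration.

```python
import ipaddress


def ip_to_number(ip_address):
    """Convert a dotted IPv4 address to its IP number, using the formula above."""
    a, b, c, d = (int(part) for part in ip_address.split("."))
    return a * 256 * 256 * 256 + b * 256 * 256 + c * 256 + d


if __name__ == "__main__":
    sample = "192.168.1.1"  # arbitrary example address
    print(ip_to_number(sample))
    # Cross-check: the standard library gives the same integer.
    print(int(ipaddress.IPv4Address(sample)))
```

For 192.168.1.1 the arithmetic is 192*16777216 + 168*65536 + 1*256 + 1 = 3232235777, and both lines print that value.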

3. Find country code based on IP number

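## We want to find the country code where $ip_number is between
## $ip_start and $ip_end (inclusive). This linear scan assumes the
## mapping file is sorted by IP_START in ascending order.

$j = 0;
## Stop at the end of the table, so that an IP number above every
## range cannot run past the last entry.
while ($j < @ip_end && $ip_number > $ip_end[$j]) {
    $j++;
}
if ($j < @ip_end && $ip_number >= $ip_start[$j]) {
    $country = $ip_country2[$j];
} else {
    $country = 'NA';
}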
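
The scan above walks the table from the top for every IP number. If the ranges are sorted and do not overlap (both are assumptions about the mapping file), a binary search finds the same answer in far fewer steps. Here is a minimal sketch in Python, not the Perl I actually used. The function names and the sample rows, including the country codes AA and BB, are made up for illustration.

```python
import bisect


def load_ranges(lines):
    """Parse IP_START,IP_END,COUNTRY_CODE lines into sorted parallel lists."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        start, end, country = line.split(",")[:3]
        rows.append((int(start), int(end), country))
    rows.sort()  # sort by IP_START so binary search is valid
    starts = [row[0] for row in rows]
    ends = [row[1] for row in rows]
    countries = [row[2] for row in rows]
    return starts, ends, countries


def find_country(ip_number, starts, ends, countries):
    """Return the country code whose range contains ip_number, else 'NA'."""
    # Index of the last range whose start is <= ip_number (or -1 if none).
    j = bisect.bisect_right(starts, ip_number) - 1
    if j >= 0 and ip_number <= ends[j]:
        return countries[j]
    return "NA"


if __name__ == "__main__":
    # Made-up rows in the IP_START,IP_END,COUNTRY_CODE format.
    sample_rows = ["300,399,BB", "100,199,AA"]
    starts, ends, countries = load_ranges(sample_rows)
    for number in (150, 250, 300):
        print(number, find_country(number, starts, ends, countries))
```

With the real mapping file, you would pass the open file handle to load_ranges once and then call find_country for each visitor's IP number.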

SEO Technique – Leveraging Yahoo Answers

One method that I’ve never seen anyone talk about for increasing your site’s traffic is leveraging Yahoo Answers (or Yahoo Knowledge, depending on which country you are in). This is especially useful if you have an informational site.

What you can do is list your site as part of the answer, or list it as the resource for your answer. Yahoo has cleverly anticipated this and put a rel=nofollow attribute on the links there, which prevents link spam and makes this technique worthless for link building. What you are really looking for here, though, is actual traffic to your site. This technique will not only reach people who start with Yahoo! Answers; because Yahoo! Answers pages rank well on all major search engines, you’ll also be indirectly capturing search engine traffic.

The key thing to remember here is that if your site is indeed a valuable source for people’s questions, this technique will for sure generate high-quality traffic to your site. Once people discover your site and like your site, you’ll start to gain additional links naturally, and this will in turn help your search engine rankings.
