by Daniel Nicholls (dpnick in BIT330, Fall 2008)
Questions and queries
Web search engines
The National Hockey League is one of the premier sports leagues in North America. While it may not match the NFL or NBA in size or popularity, it still has a very large fan base and generates billions of dollars in revenue. After reaching a low point during the 2004-05 lockout, it has grown in strength every year since. I know that Detroit was one of the first few teams in the league, but how many teams were there when the league first started?
The query I used for this search was: “NHL number teams beginning”.
Blog search engines
This year’s election is a very close race between Barack Obama and John McCain. Both recently chose their vice-presidential running mates. While Obama chose a well-known senator in Joe Biden, McCain picked a relatively unknown official in Sarah Palin. Not many people, myself included, had ever heard of her. I have heard that she is currently the governor of Alaska, and I was curious how many years she has served in that position.
The query I used for this search was: “Sarah Palin governor Alaska”.
Data that I collected
Search engine overlap data
Web search    Live    Google    Yahoo Web
Live          15      5         10
Google                65        25
Yahoo Web                       45
All           5

Blog search    Technorati    Google Blog    Bloglines
Technorati     15            0              5
Google Blog                  40             5
Bloglines                                   45
All            0

Search engine ranking overlap data
This table provides a measure of how much of Google's responses are reproduced by Yahoo.
GY               Yahoo top 5    Yahoo top 10    Yahoo top 20
Google top 5     3              3               4
Google top 10    3              3               4
Google top 20    3              3               5

This table provides a measure of how much of Yahoo's responses are reproduced by Google.
YG              Google top 5    Google top 10    Google top 20
Yahoo top 5     3               3                3
Yahoo top 10    3               3                3
Yahoo top 20    4               4                5

This table provides a measure of how much of Bloglines' responses are reproduced by Google Blog Search.
BG                  Google top 5    Google top 10    Google top 20
Bloglines top 5     0               0                0
Bloglines top 10    0               0                0
Bloglines top 20    1               1                1

This table provides a measure of how much of Google Blog Search's responses are reproduced by Bloglines.
GB              Bloglines top 5    Bloglines top 10    Bloglines top 20
GBlog top 5     0                  0                   1
GBlog top 10    0                  0                   1
GBlog top 20    0                  0                   1

Results
Web search
This table provides the average precision and overlap of the Web search engines for all students in BIT330, Fall 2008.
Web search (%)    Live    Google    Yahoo Web
Live              43      18        20
Google                    54        21
Yahoo Web                           52
All               10

This table shows the average across all students in BIT 330. To interpret it, remember that the numbers are percentages. A number where the row and column of the same site meet shows the precision of that site. For example, where the Google row and Google column meet we see 54%. This means that, on average, 54% of the results Google returned were relevant. Where the row of one site meets the column of another site, the number shows the percentage by which the two sites overlap. For instance, where Live and Yahoo Web meet we see 20%. This means that, on average, 20% of the relevant results in Windows Live were also in Yahoo Web. Finally, the 10% under All shows that, on average, 10% of the 20 results (or 2 results) were found by all three Web search engines.
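To make the reading of this table concrete, here is a minimal Python sketch of how one student's row and column values could be computed. It assumes that every cell is a percentage of the top 20 results and that each engine's relevant results are stored as a set of URLs. The function name, the `relevant` dictionary, and the URL labels are all made up for illustration; the labels were chosen only so that the output reproduces the Web search table in the collected data section.

```python
from itertools import combinations

TOP_N = 20  # assumption: each engine's top 20 results were judged


def precision_overlap_table(relevant, top_n=TOP_N):
    """Return percentages keyed by a tuple of engine names.

    `relevant` maps an engine name to the set of relevant URLs found in
    that engine's top `top_n` results.
    """
    table = {}
    # Diagonal cells: precision of each engine on its own.
    for name, urls in relevant.items():
        table[(name,)] = 100 * len(urls) / top_n
    # Off-diagonal cells: relevant URLs returned by both engines.
    for a, b in combinations(relevant, 2):
        table[(a, b)] = 100 * len(relevant[a] & relevant[b]) / top_n
    # "All" row: relevant URLs returned by every engine.
    shared_by_all = set.intersection(*relevant.values())
    table[("All",)] = 100 * len(shared_by_all) / top_n
    return table


# Made-up URL labels, not the real search results.
relevant = {
    "Live": {"all", "ly", "l0"},
    "Google": {"all"} | {f"gy{i}" for i in range(4)} | {f"g{i}" for i in range(8)},
    "Yahoo Web": {"all", "ly"} | {f"gy{i}" for i in range(4)} | {f"y{i}" for i in range(3)},
}

table = precision_overlap_table(relevant)
for cell, pct in table.items():
    print(" & ".join(cell), pct)
```

The class table would then be the cell-by-cell average of one such table per student.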
This table shows, averaged over all BIT330 students, how much of Google's responses are reproduced by Yahoo.
GY               Yahoo top 5    Yahoo top 10    Yahoo top 20
Google top 5     1.06           1.29            1.65
Google top 10    1.35           2.00            2.47
Google top 20    1.63           2.65            3.71

This table shows, averaged over all BIT330 students, how much of Yahoo's responses are reproduced by Google.
YG              Google top 5    Google top 10    Google top 20
Yahoo top 5     1.06            1.47             1.88
Yahoo top 10    1.18            1.94             2.65
Yahoo top 20    1.65            2.47             3.76

Both tables above show averages across all BIT330 students. Where Google row 5 meets Yahoo column 5, we see the number 1.06. This means that, on average, 1.06 results in the top 5 of Google are also in the top 5 of Yahoo. Where Google row 20 meets Yahoo column 5, we see the number 1.63. This means that, on average, 1.63 results in the top 5 of Yahoo are also in the top 20 of Google. Because the overlap between two result lists does not depend on which engine is named first, each cell in one table should match the mirrored cell in the other. However, some of the numbers differ. For example, the overlap between the top 20 of Google and the top 20 of Yahoo is 3.71 in the first table and 3.76 in the second. This indicates an error in some of the students' reporting.
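The way these rank overlap tables are built, and why the two tables should mirror each other, can be sketched in a few lines of Python. This is only an illustration under the interpretation given above (rows are one engine's top 5, 10, and 20; columns are the other engine's). The function names and the two ranked lists are hypothetical, not the real search results.

```python
CUTOFFS = (5, 10, 20)


def rank_overlap(rows_engine, cols_engine, cutoffs=CUTOFFS):
    """Cell [i][j] counts URLs found in both the top cutoffs[i] results of
    rows_engine and the top cutoffs[j] results of cols_engine."""
    return [
        [len(set(rows_engine[:r]) & set(cols_engine[:c])) for c in cutoffs]
        for r in cutoffs
    ]


def is_transpose(a, b):
    """True when table b is table a with rows and columns swapped."""
    size = len(a)
    return all(a[i][j] == b[j][i] for i in range(size) for j in range(size))


# Hypothetical ranked lists of 20 URLs each; "a" and "c" appear in both.
google = ["a", "b", "c"] + [f"g{i}" for i in range(17)]
yahoo = ["c"] + [f"y{i}" for i in range(10)] + ["a"] + [f"z{i}" for i in range(8)]

gy = rank_overlap(google, yahoo)  # rows: Google, columns: Yahoo
yg = rank_overlap(yahoo, google)  # rows: Yahoo, columns: Google
print(gy)
print(yg)
print(is_transpose(gy, yg))
```

When both tables are computed from the same pair of lists, the transpose check always passes, which is why a mismatch in the class averages points to a reporting error rather than to the search engines.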
Blog search
This table provides the average precision and overlap of the blog search engines for all students in BIT330, Fall 2008.
Blog search (%)    Technorati    Google Blog    Bloglines
Technorati         33            4              9
Google Blog                      53             7
Bloglines                                       44
All                1

This table shows the average across all students in BIT 330. To interpret it, remember that the numbers are percentages. A number where the row and column of the same site meet shows the precision of that site. For example, where the Bloglines row and Bloglines column meet we see 44%. This means that, on average, 44% of the results Bloglines returned were relevant. Where the row of one site meets the column of another site, the number shows the percentage by which the two sites overlap. For instance, where Technorati and Google Blog meet we see 4%. This means that, on average, 4% of the relevant results in Technorati were also in Google Blog. Finally, the 1% under All shows that, on average, 1% of the 20 results were found by all three blog search engines.
This table shows, totaled over all BIT330 students, how much of Bloglines' responses are reproduced by Google Blog Search.
BG                  Google top 5    Google top 10    Google top 20
Bloglines top 5     5               7                9
Bloglines top 10    6               9                15
Bloglines top 20    10              14               19

This table shows, totaled over all BIT330 students, how much of Google Blog Search's responses are reproduced by Bloglines.
GB              Bloglines top 5    Bloglines top 10    Bloglines top 20
GBlog top 5     5                  7                   12
GBlog top 10    6                  8                   13
GBlog top 20    8                  14                  18

Both tables above show the total responses of all BIT330 students. Where Bloglines row 5 meets Google column 5, we see the number 5. This means that, across all students, there were 5 results in the top 5 of Bloglines that were also in the top 5 of Google. Where Bloglines row 20 meets Google column 5, we see the number 10. This means that, across all students, there were 10 results in the top 20 of Bloglines that were also in the top 5 of Google. Because the overlap between two result lists does not depend on which engine is named first, each cell in one table should match the mirrored cell in the other. However, some of the numbers differ. For example, the overlap between the top 20 of Google and the top 20 of Bloglines is 19 in the first table and 18 in the second. This indicates an error in some of the students' reporting.
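The comparison between the two tables can be carried out cell by cell. The short Python sketch below copies the class totals from the BG and GB tables and lists every cell that disagrees with its mirrored cell. It assumes the layout described above (BG has Bloglines in the rows and Google in the columns, GB the reverse); the function name is made up for illustration.

```python
CUTOFFS = (5, 10, 20)

# Class totals copied from the two tables above.
bg = [[5, 7, 9], [6, 9, 15], [10, 14, 19]]  # rows: Bloglines, columns: Google
gb = [[5, 7, 12], [6, 8, 13], [8, 14, 18]]  # rows: GBlog, columns: Bloglines


def transposed_mismatches(a, b, cutoffs=CUTOFFS):
    """List the cells where a[i][j] differs from the mirrored cell b[j][i]."""
    found = []
    for i, row_cut in enumerate(cutoffs):
        for j, col_cut in enumerate(cutoffs):
            if a[i][j] != b[j][i]:
                found.append((row_cut, col_cut, a[i][j], b[j][i]))
    return found


mismatches = transposed_mismatches(bg, gb)
for bl_cut, g_cut, bg_value, gb_value in mismatches:
    print(f"Bloglines top {bl_cut} / Google top {g_cut}: BG={bg_value}, GB={gb_value}")
```

Under this layout only the top 5 versus top 5 cell agrees, so the reporting differences are not limited to the top 20 cell mentioned above.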
Discussion
Web search
Looking at the two sets of data, we see that while all three web search engines are relatively precise (between 43% and 54% of their results are relevant), they usually get there through different pages. In the top 20 results of Google and Yahoo, only 3.71 results are the same, on average. Given that Google's precision is 54% and Yahoo's is 52%, this means that most of the time the two engines bring back good results but from different web sites. This is a very good finding because a user who has trouble finding information through one search engine can always try another: they are all relatively effective (Google being the most effective at 54% and Windows Live the least at 43%), and on average they bring back different results. This is my biggest finding from this experiment, and I was very surprised to see how little the engines overlapped. One question that might be worth researching is: what kinds of websites did these engines overlap on? For example, if all of the overlaps came from large, well-established websites, that would suggest the engines agree only on the most prominent sites and use very different techniques to rank everything else. If this were the case, it might be helpful to investigate further to understand why they bring back such different results.
Blog search
The results from the blog search show findings similar to those of the web search, but to an even greater extent. All of the engines were relatively effective, on average. Google Blog was certainly the most effective, as 53% of its results were relevant. And while only 33% of Technorati's results were relevant, this is still fairly effective. It is amazing to see how little the blog search engines found in common. Across a class of 17 students, Bloglines and Google Blog brought back a total of only 19 shared results in their top 20. To a person searching for information, I would recommend trying a variety of blog search engines when researching a topic, since they all bring back relatively accurate and unique responses. Eventually, one of them may become a favorite. One thing that I learned from this exercise is that even though these search engines are effective, they can return misleading results, and one must be careful when analyzing a search. It is also important to realize that this experiment covers only three of the possibly hundreds of blog search engines. There is a wealth of research capabilities out there!
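The web figure of 3.71 is an average per student, while the blog figure of 19 is a class total, so the two are easier to compare once they are on the same scale. The small Python calculation below is a rough sketch that assumes all 17 students contributed to the blog totals and that every overlap is counted out of 20 results; the variable names are made up for illustration.

```python
STUDENTS = 17  # class size stated above
TOP_N = 20     # results examined per engine

web_avg_shared = 3.71   # Google/Yahoo top 20 overlap, average per student (GY table)
blog_total_shared = 19  # Bloglines/Google Blog top 20 overlap, class total (BG table)

# Put the blog total on the same per-student scale as the web average.
blog_avg_shared = blog_total_shared / STUDENTS

# Express both as a share of the 20 results examined.
web_share = web_avg_shared / TOP_N
blog_share = blog_avg_shared / TOP_N

print(f"Web:  {web_avg_shared:.2f} shared results per student ({web_share:.1%} of the top 20)")
print(f"Blog: {blog_avg_shared:.2f} shared results per student ({blog_share:.1%} of the top 20)")
```

On this scale the blog search engines share roughly one result per student, compared with almost four for the web search engines.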