Scripts for API Study

MySQL

To start MySQL running on andromeda:
[andromeda] ~/mysql-standard-5.0.24a-solaris9-sparc>./bin/mysqld_safe &

Perl scripts

All scripts are located in ~/Projects/compare_se_apis

To compare the total number of results for a search term and interface:

produce_term_counts.pl [term] google API 
or
produce_term_counts.pl [term] yahoo WI

To compare the total number of results for a search term and both interfaces:

produce_term_counts.pl [term] google > results/term_count.txt

To produce the graph from the previous results, run the following from R:

> source("Z:/Projects/compare_se_apis/R-scripts/term_chart.R")

To produce a count for all terms on all dates:

produce_term_counts.pl -db > results/Tterm_totals.dat

Load into MySQL:

mysqlimport --local search_engines results/Tterm_totals.dat -u root -p --delete --ignore-lines 1

To produce files that contain all URLs for each search term:

produce_term_urls.pl
will create files named results/url_results_search_engine_term.txt.

produce_term_urls.pl [term] google API 
outputs to the screen all URLs from Google's API that match term.

produce_term_urls.pl [term] google 
outputs to the screen all URLs from Google's API and WI that match term.


The following script requires that produce_term_urls.pl be run first. To run the Bubble Sort difference algorithm:

compare_term_urls.pl
This produces files named results/url_results_search_engine_term_diff.txt for every search term and engine. Each file contains the difference between the API and WI, between API day n and API day n+1, and between WI day n and WI day n+1.

Example output:

	2006-07-12      0.9351  0.9728  0.9769
	2006-07-13      0.9482  0.9543  0.9593
	2006-07-14      0.9323  0.9464  0.9851
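
The notes above do not spell out the Bubble Sort difference algorithm. One common reading is sketched below in Python (not the Perl of compare_term_urls.pl): count the adjacent swaps a bubble sort needs to turn one ranking into the other, then divide by the maximum possible number of swaps. The function name, the restriction to URLs present in both lists, and the normalization are assumptions for illustration; the real script may treat missing URLs differently or report a similarity rather than a distance.

```python
def bubble_sort_distance(list_a, list_b):
    """Normalized count of adjacent swaps needed to put the URLs shared by
    both lists into list_b's order, starting from list_a's order.

    Assumes each list has no duplicate URLs. Returns a value in [0, 1]:
    0.0 means identical ordering, 1.0 means fully reversed.
    """
    shared = set(list_a) & set(list_b)
    # Position of every shared URL in list_b.
    rank_in_b = {url: i for i, url in enumerate(list_b) if url in shared}
    # list_a's shared URLs, expressed as their positions in list_b.
    seq = [rank_in_b[url] for url in list_a if url in shared]

    n = len(seq)
    if n < 2:
        return 0.0  # nothing to reorder

    swaps = 0
    # Plain bubble sort; every swap fixes exactly one out-of-order pair.
    for end in range(n - 1, 0, -1):
        for i in range(end):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                swaps += 1

    max_swaps = n * (n - 1) / 2  # swaps needed for a fully reversed list
    return swaps / max_swaps
```

For example, bubble_sort_distance(["a", "b", "c"], ["c", "b", "a"]) returns 1.0, and two identical lists give 0.0.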

To output results that can be ingested into MySQL:

compare_term_urls.pl -db > results/Tterm_scores.dat
To load into MySQL:
mysqlimport --local search_engines results/Tterm_scores.dat -u root -p --delete

To only process the top 10 results:

compare_term_urls.pl -db -n 10 > results/Tterm_scores_top_ten.dat
To load into MySQL:
mysqlimport --local search_engines results/Tterm_scores_top_ten.dat -u root -p --delete

To print the comparison of results (using the bubble sort diff algorithm) for Yahoo:

compare_term_urls.pl diff [term] yahoo

To print the comparison of results (using the percentage of shared URLs) from days 2006-07-12 and 2006-07-13:

compare_term_urls.pl shared [term] yahoo 2006-07-12 2006-07-13
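
As a rough illustration of the "shared" comparison, the Python sketch below computes the percentage of URLs from one day's result list that also appear in another day's list. The function name and the choice of denominator (the number of distinct URLs in the first list) are assumptions; the notes do not say how compare_term_urls.pl normalizes the count.

```python
def shared_percent(list_a, list_b):
    """Percentage of distinct URLs in list_a that also appear in list_b.

    The denominator is an assumption: the size of list_a after
    removing duplicates. Returns 0.0 for an empty list_a.
    """
    set_a, set_b = set(list_a), set(list_b)
    if not set_a:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(set_a)
```

With four URLs on the first day, two of which survive to the second day, the function returns 50.0.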


To output the distances between the API and WUI for various offsets (using all 100 results):

compare_term_urls_offsets.pl -db > results/Tterm_score_offset_ave.dat
To load into MySQL:
mysqlimport --local search_engines results/Tterm_score_offset_ave.dat -u root -p --delete \
    --ignore-lines=1 --columns=term,se,type,offset,api_wi


To output the distances between the API and itself and the WUI and itself (using top 100 results) over time by comparing the first URL result list with all others:

compare_term_urls_over_time.pl -db -look 120 > results/Tresult_changes.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes.dat -u root -p --delete \
    --ignore-lines=1 --columns=term,se,type,day,wi,api


To output the distances between the API and itself and the WUI and itself (using top 100 results) over time by comparing all URL result lists with each other:

compare_term_urls_over_time.pl -db -look 120 -days 120 > results/Tresult_changes_ave.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes_ave.dat -u root -p --delete \
    --ignore-lines=1 --columns=term,se,type,day,wi,api
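
The two over-time comparisons described above (first result list against all later ones, and every result list against every other) can be pictured with the Python sketch below. This is only a guess at the intent: the function names, the grouping of pairs by the number of days between them, and the averaging are assumptions suggested by the "_ave" file name, and the meaning of the -look and -days options is not modeled. The distance argument stands in for whatever measure the Perl script actually uses.

```python
def distances_from_first(daily_lists, distance):
    """Distance between the first day's list and each later day's list.

    Returns {day_offset: distance}, with day_offset starting at 1.
    """
    first = daily_lists[0]
    return {day: distance(first, urls)
            for day, urls in enumerate(daily_lists[1:], start=1)}


def average_distances_by_gap(daily_lists, distance):
    """Average distance over all pairs of days, grouped by the gap
    (in days) between the two lists being compared.

    Returns {gap: mean distance over all pairs that far apart}.
    """
    totals = {}  # gap -> (sum of distances, number of pairs)
    for i in range(len(daily_lists)):
        for j in range(i + 1, len(daily_lists)):
            gap = j - i
            total, count = totals.get(gap, (0.0, 0))
            totals[gap] = (total + distance(daily_lists[i], daily_lists[j]),
                           count + 1)
    return {gap: total / count for gap, (total, count) in totals.items()}


def missing_fraction(list_a, list_b):
    """Toy distance for the example: fraction of list_a's URLs absent from list_b."""
    set_a = set(list_a)
    return 1 - len(set_a & set(list_b)) / len(set_a)
```

The first function yields one value per day after the start; the second smooths day-to-day noise by averaging every pair of lists separated by the same number of days.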


To output the distances between the API and itself and the WUI and itself (using top 10 results) over time:

compare_term_urls_over_time.pl -db -n 10 -look 120 > results/Tresult_changes_top_ten.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes_top_ten.dat -u root -p --delete \
    --ignore-lines=1 --columns=term,se,type,day,wi,api


To output the distances between the API and itself and the WUI and itself (using top 10 results) over time by comparing all URL result lists with each other:

compare_term_urls_over_time.pl -db -n 10 -look 120 -days 120 > results/Tresult_changes_top_ten_ave.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes_top_ten_ave.dat -u root -p --delete \
    --ignore-lines=1 --columns=term,se,type,day,wi,api


To print the URL data from the url_results.txt file:

parse_url_results.pl > results/Turl_results.dat
To load into MySQL:
mysqlimport --local search_engines results/Turl_results.dat -u root -p --delete \
    --ignore-lines=1 \
    --columns=date,url,se,site,wi_indexed,wi_cached,wi_backlinks,wi_site,api_indexed,api_cached,api_backlinks,api_site


To calculate the percentage of disagreements each day for total results for search terms:

compute_agreements.pl -type terms
To calculate disagreements for total terms, total pages indexed, etc.:
compute_agreements.pl > results/Tagreements.dat
To load into MySQL:
mysqlimport --local search_engines results/Tagreements.dat -u root -p --delete \
    --ignore-lines=1 --columns=date,se,type,over,under


To calculate the percentage of disagreements each day for total results for search terms, where disagreement means the API value is outside the limits of +/- 10% of the WUI value:

compute_agreements.pl -type terms -a 10 > results/Tagreements_partial.dat
To load into MySQL:
mysqlimport --local search_engines results/Tagreements_partial.dat -u root -p --delete \
    --ignore-lines=1 --columns=date,se,type,over,under
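
The tolerance test described above can be sketched in Python as follows. The function names are hypothetical, and the "over"/"under" labels are borrowed from the column names in the import command; how compute_agreements.pl treats values exactly on the boundary is not stated, so this sketch counts them as agreeing.

```python
def classify(api_count, wui_count, tolerance_pct=10):
    """Label one API/WUI pair as 'over', 'under', or 'agree'.

    The API value disagrees when it falls outside
    wui_count +/- tolerance_pct percent. Integer arithmetic avoids
    floating-point rounding at the boundaries.
    """
    if api_count * 100 > wui_count * (100 + tolerance_pct):
        return "over"
    if api_count * 100 < wui_count * (100 - tolerance_pct):
        return "under"
    return "agree"


def daily_disagreement(pairs, tolerance_pct=10):
    """Return (percent over, percent under) for one day's
    list of (api_count, wui_count) pairs."""
    if not pairs:
        return (0.0, 0.0)
    labels = [classify(api, wui, tolerance_pct) for api, wui in pairs]
    over = 100.0 * labels.count("over") / len(labels)
    under = 100.0 * labels.count("under") / len(labels)
    return (over, under)
```

Setting tolerance_pct to 0 reduces this to a strict comparison, in which any difference between the two values counts as a disagreement.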


To calculate the decay half-life of search results:

compute_half_life.pl > results/Thalf_life.dat
To load into MySQL:
mysqlimport --local search_engines results/Thalf_life.dat -u root -p --delete \
    --ignore-lines=1 --columns=termtype,term,se,wi_day,api_day
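
The notes do not define how the half-life is computed. One simple interpretation, sketched below in Python, is the first day on which at most half of the URLs from the initial result list are still present; the wi_day and api_day columns would then hold that day for each interface. The function name and this definition are assumptions, not a description of compute_half_life.pl.

```python
def half_life_day(daily_results):
    """Return the first day offset (1-based) on which at most half of the
    day-0 URLs still appear in the results, or None if that never happens.

    daily_results is a list of URL lists, one per day, oldest first.
    """
    if not daily_results:
        return None
    original = set(daily_results[0])
    if not original:
        return None
    for day, urls in enumerate(daily_results[1:], start=1):
        surviving = len(original & set(urls))
        # Compare 2 * surviving with the original size to stay in integers.
        if surviving * 2 <= len(original):
            return day
    return None
```

A result of None signals that more than half of the original URLs were still present on every observed day.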


Cron job

On andromeda:

List the current crontab: crontab -l
Delete the current crontab: crontab -r
To create a crontab: crontab -e

   01 * * * * echo "This command is run at one min past every hour - test"
   28 17 * * * /home/fmccown/cron/run_scripts    (this runs daily at 5:28 pm)

There is currently a cron job running on beatitude.

