Scripts for API Study
To start the MySQL server on andromeda:
[andromeda] ~/mysql-standard-5.0.24a-solaris9-sparc> ./bin/mysqld_safe &
All scripts are located in ~/Projects/compare_se_apis
To compare the total number of results for a search term on a single interface:
produce_term_counts.pl [term] google API
or
produce_term_counts.pl [term] yahoo WI
To compare the total number of results for a search term on both interfaces:
produce_term_counts.pl [term] google > results/term_count.txt
To produce the graph from the previous results, from R run:
> source("Z:/Projects/compare_se_apis/R-scripts/term_chart.R")
To produce a count for all terms on all dates:
produce_term_counts.pl -db > results/Tterm_totals.dat
Load into MySQL:
mysqlimport --local search_engines results/Tterm_totals.dat -u root -p --delete --ignore-lines 1
To produce files that contain all URLs for each search term:
produce_term_urls.pl will create files named results/url_results_search_engine_term.txt.
produce_term_urls.pl [term] google API outputs to the screen all URLs from Google's API that match term.
produce_term_urls.pl [term] google outputs to the screen all URLs from Google's API and WI that match term.
To run the bubble sort difference algorithm (produce_term_urls.pl must be run first):
compare_term_urls.pl produces files named results/url_results_search_engine_term_diff.txt for every search term and engine. Each file contains the difference between the API and the WI, between API day n and API day n+1, and between WI day n and WI day n+1.
Example output:
2006-07-12 0.9351 0.9728 0.9769
2006-07-13 0.9482 0.9543 0.9593
2006-07-14 0.9323 0.9464 0.9851
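The notes do not spell out the bubble sort difference itself, so the following is only a minimal Python sketch of one common reading: count the adjacent swaps a bubble sort needs to reorder one result list into the order of the other, then normalize by the maximum possible number of swaps. The function name, the restriction to URLs present in both lists, and the normalization are assumptions; compare_term_urls.pl may handle missing URLs differently or report a similarity rather than a distance.

```python
def bubble_sort_distance(list_a, list_b):
    """Normalized number of adjacent swaps needed to put list_a's shared
    URLs into list_b's order (0.0 = same order, 1.0 = fully reversed)."""
    # Assumption: only URLs that appear in both lists are compared.
    rank_in_b = {url: i for i, url in enumerate(list_b)}
    ranks = [rank_in_b[url] for url in list_a if url in rank_in_b]
    n = len(ranks)
    if n < 2:
        return 0.0
    swaps = 0
    # Plain bubble sort; every adjacent swap removes exactly one inversion.
    for end in range(n - 1, 0, -1):
        for i in range(end):
            if ranks[i] > ranks[i + 1]:
                ranks[i], ranks[i + 1] = ranks[i + 1], ranks[i]
                swaps += 1
    # n * (n - 1) / 2 swaps are needed for a fully reversed list.
    return swaps / (n * (n - 1) / 2)
```

For example, `bubble_sort_distance(api_urls, wi_urls)` would compare one day's API list against the same day's WI list, where both variable names are placeholders.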
To output results that can be ingested into MySQL:
compare_term_urls.pl -db > results/Tterm_scores.dat
To load into MySQL:
mysqlimport --local search_engines results/Tterm_scores.dat -u root -p --delete
To only process the top 10 results:
compare_term_urls.pl -db -n 10 > results/Tterm_scores_top_ten.dat
To load into MySQL:
mysqlimport --local search_engines results/Tterm_scores_top_ten.dat -u root -p --delete
To print the comparison of results (using the bubble sort diff algorithm) for yahoo:
compare_term_urls.pl diff [term] yahoo
To print the comparison of results (using the shared percentage of URLs) from days 2006-07-12 and 2006-07-13:
compare_term_urls.pl shared [term] yahoo 2006-07-12 2006-07-13
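As a rough illustration of the shared-percentage measure, the Python sketch below computes how many of the first day's URLs also appear on the second day. The function name and the choice of the first list as the denominator are assumptions; the script may normalize differently.

```python
def shared_percent(urls_day1, urls_day2):
    """Percentage of distinct URLs from day 1 that also appear on day 2."""
    set1, set2 = set(urls_day1), set(urls_day2)
    if not set1:
        return 0.0
    # Assumption: the first day's list is the reference (denominator).
    return 100.0 * len(set1 & set2) / len(set1)
```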
To output the distances between the API and WUI for various offsets (using all 100 results):
compare_term_urls_offsets.pl -db > results/Tterm_score_offset_ave.dat
To load into MySQL:
mysqlimport --local search_engines results/Tterm_score_offset_ave.dat -u root -p --delete --ignore-lines=1 --columns=term,se,type,offset,api_wi
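mysqlimport takes the table name from the file name (here Tterm_score_offset_ave) and by default expects tab-separated fields, one row per line. The Python sketch below shows a file layout that would match --ignore-lines=1 and the --columns list above: one header line followed by tab-separated rows. The function name and the sample row are hypothetical; the real files are written by the Perl scripts with -db.

```python
import csv
import io

# Column order taken from the --columns option above.
COLUMNS = ["term", "se", "type", "offset", "api_wi"]

def write_dat(rows, out):
    """Write a header line plus tab-separated rows to the file object out."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)  # skipped on import by --ignore-lines=1
    for row in rows:
        writer.writerow([row[c] for c in COLUMNS])

# Hypothetical sample row, for illustration only.
buf = io.StringIO()
write_dat([{"term": "example", "se": "google", "type": "diff",
            "offset": 1, "api_wi": 0.5}], buf)
```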
To output the distances between the API and itself and the WUI and itself (using top 100 results) over time by comparing the first URL result list with all others:
compare_term_urls_over_time.pl -db -look 120 > results/Tresult_changes.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes.dat -u root -p --delete --ignore-lines=1 --columns=term,se,type,day,wi,api
To output the distances between the API and itself and the WUI and itself (using top 100 results) over time by comparing all URL result lists with each other:
compare_term_urls_over_time.pl -db -look 120 -days 120 > results/Tresult_changes_ave.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes_ave.dat -u root -p --delete --ignore-lines=1 --columns=term,se,type,day,wi,api
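The two over-time modes above differ in which pairs of result lists are compared. The Python sketch below illustrates that difference under stated assumptions: the first mode compares the first day's list with every later list, and the second is assumed to average the distance over all pairs of lists that are the same number of days apart. All function names are hypothetical, the stand-in distance is chosen only to keep the example self-contained, and the meaning of the -look and -days flags is not modeled.

```python
def missing_fraction(earlier, later):
    """Stand-in distance: fraction of the earlier list's URLs absent later."""
    earlier_set = set(earlier)
    if not earlier_set:
        return 0.0
    return len(earlier_set - set(later)) / len(earlier_set)

def compare_first_with_rest(daily_lists, distance):
    """Distance between the first day's list and each later day's list."""
    first = daily_lists[0]
    return [distance(first, later) for later in daily_lists[1:]]

def average_by_gap(daily_lists, distance):
    """Average distance over all pairs of lists that are `gap` days apart."""
    n = len(daily_lists)
    averages = []
    for gap in range(1, n):
        scores = [distance(daily_lists[i], daily_lists[i + gap])
                  for i in range(n - gap)]
        averages.append(sum(scores) / len(scores))
    return averages
```

Passing the distance as a parameter keeps the pairing logic separate from the choice of measure, so the same two functions would work with a bubble sort distance as well.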
To output the distances between the API and itself and the WUI and itself (using top 10 results) over time:
compare_term_urls_over_time.pl -db -n 10 -look 120 > results/Tresult_changes_top_ten.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes_top_ten.dat -u root -p --delete --ignore-lines=1 --columns=term,se,type,day,wi,api
To output the distances between the API and itself and the WUI and itself (using top 10 results) over time by comparing all URL result lists with each other:
compare_term_urls_over_time.pl -db -n 10 -look 120 -days 120 > results/Tresult_changes_top_ten_ave.dat
To load into MySQL:
mysqlimport --local search_engines results/Tresult_changes_top_ten_ave.dat -u root -p --delete --ignore-lines=1 --columns=term,se,type,day,wi,api
To print the URL data from the url_results.txt file:
parse_url_results.pl > results/Turl_results.dat
To load into MySQL:
mysqlimport --local search_engines results/Turl_results.dat -u root -p --delete --ignore-lines=1 --columns=date,url,se,site,wi_indexed,wi_cached,wi_backlinks,wi_site,api_indexed,api_cached,api_backlinks,api_site
To calculate the percentage of disagreements each day for total results for search terms:
compute_agreements.pl -type terms
To calculate disagreements for total terms, total pages indexed, etc.:
compute_agreements.pl > results/Tagreements.dat
To load into MySQL:
mysqlimport --local search_engines results/Tagreements.dat -u root -p --delete --ignore-lines=1 --columns=date,se,type,over,under
To calculate the percentage of disagreements each day for total results for search terms, where a disagreement means the API value falls outside ±10% of the WUI value:
compute_agreements.pl -type terms -a 10 > results/Tagreements_partial.dat
To load into MySQL:
mysqlimport --local search_engines results/Tagreements_partial.dat -u root -p --delete --ignore-lines=1 --columns=date,se,type,over,under
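The tolerance rule can be sketched in Python as follows. The sketch assumes that the over and under columns hold the percentage of comparisons in which the API value lies above or below the allowed band around the WUI value; that reading of the column names, the function name, and the input format are assumptions.

```python
def disagreement_percentages(pairs, tolerance_pct=0.0):
    """pairs: list of (api_total, wui_total) tuples for one day.

    Returns (over_pct, under_pct): the share of pairs where the API total
    is above or below the WUI total by more than tolerance_pct percent.
    """
    over = under = 0
    for api, wui in pairs:
        margin = wui * tolerance_pct / 100.0
        if api > wui + margin:
            over += 1
        elif api < wui - margin:
            under += 1
    n = len(pairs)
    if n == 0:
        return (0.0, 0.0)
    return (100.0 * over / n, 100.0 * under / n)
```

With a tolerance of 0, any difference counts as a disagreement, which corresponds to running without -a; a tolerance of 10 corresponds to -a 10.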
To calculate the decay half-life of search results:
compute_half_life.pl > results/Thalf_life.dat
To load into MySQL:
mysqlimport --local search_engines results/Thalf_life.dat -u root -p --delete --ignore-lines=1 --columns=termtype,term,se,wi_day,api_day
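The notes do not define the half-life calculation, so the Python sketch below shows only one plausible reading: the half-life is the first day on which no more than half of the first day's URLs are still present in the results. The function name and that definition are assumptions; compute_half_life.pl may use a different rule.

```python
def half_life_day(daily_lists):
    """Return the first day index (counting the first list as day 0) on which
    at most half of day 0's URLs remain, or None if that never happens."""
    if not daily_lists:
        return None
    baseline = set(daily_lists[0])
    for day, urls in enumerate(daily_lists[1:], start=1):
        remaining = len(baseline & set(urls))
        if remaining <= len(baseline) / 2:
            return day
    return None
```

Under this reading, the wi_day and api_day columns would hold the result of calling the function once on the WI lists and once on the API lists for the same term.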
On andromeda:
List the current crontab: crontab -l
Delete the current crontab: crontab -r
To create a crontab: crontab -e
01 * * * * echo "This command is run at one min past every hour - test"
28 17 * * * /home/fmccown/cron/run_scripts (this runs daily at 5:28 pm)
There is currently a cron job running on beatitude.