Directory structure:
--------------------

    lib         contains the libraries and the output of the compiling process
    models      contains trained models: sentence segmentation, POS tagging, and phrase chunking
    src         contains source code files
    webdata     contains sample outputs of JWebPro
    Makefile    make file
    README      this file

How to compile JWebPro:
------------------------

    make clean
    make all

After compiling, the output is lib/jwebpro.jar

How to test:
------------

Go to the webdata/samples directory and run the following command:

    make run

Note that you have to provide a Google API client key to JWebPro in one of two ways:

- set the clientKey in the option.txt file of each search transaction
  (see webdata/samples/BillGates/option.txt for an example)
- set the clientKey attribute of class WebRetOption in
  src/jwebpro/webret/WebRetOption.java

After running, visit sub-directories such as BillGates, DonaldKnuth, etc. to see the output of JWebPro. For example, the outputs in webdata/samples/BillGates are:

    a) retrievallog.txt: the retrieval log file
    b) GoogleSummary/*-google.html: contain summary information of each search result, such as URL, title, and web snippet
    c) WebtxtContent/*-webtxt.txt: contain the content of web pages together with POS tags and phrase chunks

How to use JWebPro:
---------------------

Command line:

    java -mx512M -classpath <CLASSPATH> jwebpro.webret.WebRet <transaction directory>

where:

    -mx512M
        extends the heap memory available to the JVM

    <CLASSPATH>
        is the path to:
            lib/googleapi/googleapi.jar
            lib/htmlparser/htmlparser.jar
            lib/jtextpro/jtextpro.jar
            lib/jwebpro.jar
        for example, if you are currently at the top directory of JWebPro, the CLASSPATH is
            ./lib/googleapi/googleapi.jar:./lib/htmlparser/htmlparser.jar:./lib/jtextpro/jtextpro.jar:./lib/jwebpro.jar

    <transaction directory>
        is the path to the directory containing the options and the output of a search transaction

To run JWebPro, you have to prepare an "option.txt" file in each transaction directory.
See webdata/samples/BillGates for more information.

Example: if we are currently in the webdata/samples directory and want to search with the query "Donald Knuth", we prepare the transaction directory webdata/samples/DonaldKnuth with its option.txt file and then execute the following command:

    java -mx512M -classpath ../../lib/googleapi/googleapi.jar:../../lib/htmlparser/htmlparser.jar:../../lib/jtextpro/jtextpro.jar:../../lib/jwebpro.jar jwebpro.webret.WebRet ./DonaldKnuth

The outputs will be:

- webdata/samples/DonaldKnuth/GoogleSummary/*
- webdata/samples/DonaldKnuth/WebtxtContent/*
- webdata/samples/DonaldKnuth/retrievallog.txt

Search Query:
-------------

Here are some examples of search queries specified in the option.txt file.

Example 1: if the search query is ["Bill Gates"], you only need the following line in the option.txt file:

    searchKeyword=Bill_Gates

Example 2: if the query is [Bill Gates "Steve Ballmer"], you need three lines in the option.txt file, as follows:

    searchKeyword=Bill
    searchKeyword=Gates
    searchKeyword=Steve_Ballmer

Notes:
------

- To compile JWebPro on Windows, modify the Makefile to replace the path separator character "/" with "\"
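
Sketch: from searchKeyword lines to a query string
--------------------------------------------------

The two search-query examples suggest a simple convention: each searchKeyword line contributes one term, and an underscore inside a value marks a quoted phrase. The Java sketch below illustrates that reading only. It is not taken from the JWebPro sources: the class QuerySketch and the method buildQuery are hypothetical names, and the real option parser may treat underscores, empty values, or other options differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuerySketch {
    private static final String PREFIX = "searchKeyword=";

    // Hypothetical helper (not part of JWebPro): builds a query string
    // from the lines of an option.txt file.
    static String buildQuery(List<String> optionLines) {
        List<String> terms = new ArrayList<>();
        for (String line : optionLines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith(PREFIX)) {
                continue; // skip other options, e.g. clientKey
            }
            String value = trimmed.substring(PREFIX.length());
            if (value.isEmpty()) {
                continue; // nothing to search for
            }
            if (value.contains("_")) {
                // Assumption: "_" joins the words of a quoted phrase.
                terms.add("\"" + value.replace('_', ' ') + "\"");
            } else {
                terms.add(value);
            }
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        // Example 1 from the Search Query section
        System.out.println(buildQuery(Arrays.asList("searchKeyword=Bill_Gates")));
        // Example 2 from the Search Query section
        System.out.println(buildQuery(Arrays.asList(
                "searchKeyword=Bill",
                "searchKeyword=Gates",
                "searchKeyword=Steve_Ballmer")));
    }
}
```

Under these assumptions, the first call yields "Bill Gates" (with the quotes) and the second yields Bill Gates "Steve Ballmer", matching the bracketed queries in the two examples.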