Learning goals

Prerequisites

Please read the following before you start this assignment, to learn about REST APIs:

The Assignment

In the previous assignment, we loaded some data into Elasticsearch.  Now we will write a backend application that uses it.  You'll provide a REST API to search the news article database.

HW2 will be graded by an automated test script. 

Note that, again, this is NOT a group project.

A Java Servlet

This time we are going to build a ".war" file, which is a special type of jar file that is meant to be run on a Java web application server, such as Tomcat.  Java web applications are written as subclasses of javax.servlet.http.HttpServlet, a built-in abstract class that provides a framework for handling HTTP requests.

Note that we are not going to subclass HttpServlet directly, but instead use JAX-RS and Jersey.  Another alternative is to use a framework such as Spring.  The JAX-RS style we're using here is both simple and efficient.

A REST API

Your goal in this assignment is to build a servlet that implements the following very simple API:

GET /api/search

{
  "returned_results": INTEGER,
  "total_results": INTEGER,
  "articles": [
    {
      "title": STRING,
      "url": STRING,
      "txt": STRING,
      "date": STRING, /* null or missing if not available */
      "lang": STRING /* null or missing if not available */
    },
    ...
  ]
}

Provided code skeleton

Skeleton code for HW2 has been pushed to https://github.com/starzia/ssa-skeleton.

You will run the code with a slightly different Maven command than we used in HW1.  Your run configuration should run "clean package tomcat7:run" or "clean package cargo:run".  The first option is preferred because it allows debugging in IntelliJ; however, the second option uses a newer version of Tomcat (version 8.5 instead of version 7), which causes fewer spurious warning messages to appear when running.  Alternatively, you can run "mvn clean package tomcat7:run" or "mvn clean package cargo:run" on the command line.

Note that Tomcat is a Java web application server.  The command above runs Tomcat on your machine and launches your code within it.  That's just for testing purposes.  In a real deployment, you would create a Tomcat environment on AWS Elastic Beanstalk and with a few clicks launch your war file on one or more servers in the cloud.  We'll do this later.

Note that the code should never stop running unless you terminate it with the "stop" button in the IDE (or control-C on the command line).  It just stays alive and continues to handle requests.  You can test it by putting a URL like this in your browser: http://localhost:8080/api/search?query=hello

Note that "localhost" is just a special hostname for your own machine.  ":8080" is the port number of the web server.  When the colon and number are omitted, the default HTTP port of 80 (or HTTPS port of 443) is used.  Your Java code is using port 8080 because most systems require root/admin access to listen on a port number less than 1024.  This prevents unprivileged users from running an official-looking service.
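To make the port rules above concrete, here is a minimal sketch using the standard java.net.URI class.  The class name UrlPortExample and the helper effectivePort are made up for this illustration and are not part of the skeleton code.

```java
import java.net.URI;

public class UrlPortExample {
    // Returns the port a client would connect to for the given URL:
    // the explicit ":port" if present, otherwise the scheme's default.
    static int effectivePort(String url) {
        URI uri = URI.create(url);
        int port = uri.getPort(); // -1 when the URL has no explicit ":port"
        if (port != -1) {
            return port;
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    public static void main(String[] args) {
        // Explicit port, as used by the local Tomcat in this assignment.
        System.out.println(effectivePort("http://localhost:8080/api/search?query=hello"));
        // No port given: http falls back to 80, https to 443.
        System.out.println(effectivePort("http://localhost/api/search?query=hello"));
        System.out.println(effectivePort("https://ssa-elasticsearch.stevetarzia.com/_search?q=txt:northwestern"));
    }
}
```

Running main prints 8080, 80 and 443, one per line.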

Environment variables

Your servlet must connect to an Elasticsearch cluster to fetch data.  It's technically possible to connect to the Elasticsearch database that you created in HW1, but instead I want you all to connect to my Elasticsearch database.  This will allow you to shut off your ES database (saving money) and it will ensure that everyone's code is compatible with the same database schema (data format).  The following environment variables should control your ES connection:

Actually, Java Servlet containers (like Tomcat) often use System Properties instead of Environment Variables to configure apps.  If you wish to deploy this code to the cloud, then I recommend that you check for the configuration variables in both places, as follows:

    private static String getParam(String paramName) {
        String prop = System.getProperty(paramName);
        return (prop != null) ? prop : System.getenv(paramName);
    }
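As a rough illustration of how the getParam helper above behaves, the sketch below wraps it in a small class and adds a fallback value.  The class name ConfigExample, the helper getParamOrDefault, and the setting names EXAMPLE_SETTING and EXAMPLE_UNSET_SETTING are all hypothetical; they are not the real configuration variables for this assignment.

```java
public class ConfigExample {
    // Same lookup as the helper above: a System Property wins,
    // otherwise fall back to the environment variable of the same name.
    static String getParam(String paramName) {
        String prop = System.getProperty(paramName);
        return (prop != null) ? prop : System.getenv(paramName);
    }

    // Hypothetical convenience wrapper: use a default when neither is set.
    static String getParamOrDefault(String paramName, String fallback) {
        String value = getParam(paramName);
        return (value != null) ? value : fallback;
    }

    public static void main(String[] args) {
        // EXAMPLE_SETTING is a made-up name, set here only for the demo.
        System.setProperty("EXAMPLE_SETTING", "from-system-property");
        System.out.println(getParamOrDefault("EXAMPLE_SETTING", "default-value"));
        // Prints the fallback unless your environment happens to define this name.
        System.out.println(getParamOrDefault("EXAMPLE_UNSET_SETTING", "default-value"));
    }
}
```

Because the System Property is checked first, a value passed to the JVM with -D overrides an environment variable of the same name.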

Document format

The documents that were uploaded to Elasticsearch are similar to those you imported in HW1, but there are some extra fields to help with the filtered searches:

{
  "title": "My great webpage",
  "url": "https://hereiam.com",
  "txt": "I posted this page because I have something important to say....",
  /* the fields below are both optional; they might be missing or have a null value */
  "lang": "en",
  "date": "2019-10-02"
}

Querying Elastic Search

You do not have permission to run POST or PUT requests on the shared Elasticsearch instance (ssa-elasticsearch.stevetarzia.com).  That's because I don't want you to change the data I already posted.  So, you must implement your queries entirely with GETs and query parameters.  You will not be able to submit a request in an HTTP body (as a JSON object).  You can read about the query string request format here.  Elasticsearch uses the Lucene query syntax.  An example Elasticsearch request might look like:

Note that the Lucene-format query is "(this AND that) AND lang:en", but the spaces are "escaped" to "%20" by AwsSignedRestRequest (because HTTP URLs cannot contain spaces).  This is called URL encoding.
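To see what URL encoding does to a Lucene query, here is a small stand-alone sketch that uses only java.net.URI from the standard library.  It is not how AwsSignedRestRequest works internally; the class name QueryUrlExample and the helper buildSearchUrl are invented for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class QueryUrlExample {
    // Builds an Elasticsearch "_search" URL with the Lucene query in the
    // "q" parameter. The multi-argument URI constructor percent-encodes
    // characters that are illegal in a URL, such as spaces.
    static String buildSearchUrl(String host, String luceneQuery) throws URISyntaxException {
        URI uri = new URI("https", host, "/_search", "q=" + luceneQuery, null);
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String url = buildSearchUrl("ssa-elasticsearch.stevetarzia.com",
                                    "(this AND that) AND lang:en");
        // Each space in the query appears as %20 in the printed URL.
        System.out.println(url);
    }
}
```

One caveat of this approach: characters that are legal in a URL but have special meaning in a query string, such as "&", are left untouched, so a query containing them would need extra handling.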

You can test your Elasticsearch query syntax by entering URLs in a web browser, like: https://ssa-elasticsearch.stevetarzia.com/_search?q=txt:northwestern

Testing

You can do some initial tests in your web browser or using another REST or HTTP tool.  Try as many different variations of the requests as possible.

Compare your results to those returned by the reference implementation: http://ssa-hw2-backend.stevetarzia.com/api/search

We are providing a Python script to test your code: hw2_tester.tar.gz.  It is essential that you run this test script on your war file before submitting it to Canvas.  The provided script is very similar to the auto-grading script that we will be running.  You must run it on moore.wot.eecs.northwestern.edu.  The README.txt file gives the exact commands to run the tester on moore.

The tester expects to find a file named search-api-1.0-SNAPSHOT.war in the current working directory.  This should be copied from the "target" subdirectory of your project.  The script needs to know where to find the Java and Maven commands on your system.  If you want to debug the tester by running it locally on your machine, you will have to edit the script to change the following constants (defined near the beginning of the file): JAVA_HOME, MVN.  However, your final tests must be on moore.

Submission

Post your .war file in this assignment.