Unfortunately, you will not be able to complete this assignment unless you are a currently enrolled student at Northwestern University, since it requires access to a large database on their network.

In this assignment you will connect to MySQL databases using the instructions in the provided document called "EECS-317 MySQL Workbench instructions.pdf".

Note that MySQL Workbench automatically adds a "LIMIT 1000" to your query, so you may have to override this with a higher limit in some queries.

You may want to use "SHOW TABLES", "DESCRIBE [table]", and "SELECT * FROM [table]" commands to explore each database's schema.

Yelp database

The database “yelp_db” has data from the Yelp local business review app (http://yelp.com/). You will use this  database to answer the following questions:

1) How many rows are in the friend table? Hint: This query will take a long time. (write the query and the answer)

2) Although this Yelp database is pretty big, it’s not the full Yelp database. To get an idea of what it covers, please determine which state has the most businesses. (write the query, the state, and the number of businesses)

3) There is no question 3!

4) How many businesses have the text “McDonald” in their name? (write the query and the answer) Hint: the answer is between 500 and 1000.

5) Where is the highest-rated McDonald’s restaurant? (write the query and the answer)

Hint 0: Please ignore the "stars" column in the business table and instead use an average of the values in the review table.  This is more accurate.

Hint 1: You must use the category called “Restaurants”, otherwise you’ll get results for other types of businesses with “McDonald” in the name.

6) What are the names of the businesses having at least 5 reviews containing the word “barf”? (write the query and the answer)

Hint: There are five such businesses, but three of them actually have the word “barf” within the name of the business (that's OK).

Stack Overflow database

The database “stackoverflow” has data from the Stack Overflow computer programming Q&A website (https://stackoverflow.com/). This database is about 15 times large than the Yelp database.  You will use this  database to answer the following questions:

7) How many rows are in the Comments table? How long did it take to run this query?

8) What is the text of the row in the Comments table with id=30000000 (30 million)? How long did this query take?

9) Run this query to get the one comment made by user 19:

       SELECT * FROM Comments WHERE userId=19;

What are the first five words in the comment? How long did this query take?

10) Why was the query in question 8 so much faster than the query in question 9?

HINT: You can click "execution plan" after running a query to get a visual explanation of how it was executed.  Alternatively, use the EXPLAIN command.  Read about the EXPLAIN output here: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html 

11) Why did the query in question 9 take about the same amount of time (same order of magnitude) as the query in question 7?

12) How could you modify the database schema to make the query in question 9 much faster?