Si618f08 HW3 Description - Interspike Codex

Taken from bart.si.umich.edu:/u/mcq/si618week3/hw3.txt

After hw2, you should have two (or more) lists of nominal phrases comparing two (or more) sets of documents, stored in a sqlite3 database. For hw3, create a back-to-back histogram showing, for each of these phrases, the number of documents in each set that contain it. Connect the sqlite3 database to R to accomplish this.
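
As a rough illustration of the R side of that connection, the sketch below pulls per-phrase document counts out of sqlite3. It assumes the DBI and RSQLite packages are installed, and it builds a small in-memory database so that it runs on its own. The table name phrase_docs, its columns (phrase, doc_set, doc_id), and the toy rows are all hypothetical; your hw2 schema will differ, and you would pass the path to your own database file instead of ":memory:".

```r
library(DBI)  # assumes the DBI and RSQLite packages are installed

# In-memory stand-in for the hw2 database. The table and column names
# (phrase_docs, phrase, doc_set, doc_id) are hypothetical.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy <- data.frame(
  phrase  = c("search engine", "search engine", "search engine",
              "social network", "social network"),
  doc_set = c("A", "A", "B", "A", "B"),
  doc_id  = c(1, 2, 7, 2, 8),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "phrase_docs", toy)

# Count the distinct documents containing each phrase, per document set.
counts <- dbGetQuery(con, "
  SELECT phrase, doc_set, COUNT(DISTINCT doc_id) AS n_docs
  FROM phrase_docs
  GROUP BY phrase, doc_set
")
dbDisconnect(con)

# One row per phrase, one column per document set;
# a phrase that never occurs in a set gets a count of 0.
wide <- xtabs(n_docs ~ phrase + doc_set, data = counts)
print(wide)
```

Counting distinct document ids, rather than rows, keeps a phrase that appears several times in one document from being counted more than once.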

An example of a back-to-back histogram can be found at

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=136

although this isn't exactly what your display should look like. Your display should show each nominal phrase on the same line as its histogram bars. The two sides may be differentiated in any way you see fit: in the example above, one is bright blue and one is bright red, but you may prefer a different scheme.
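
One possible way to get such a layout with base R graphics is sketched below. It is only one option, not the required approach: the phrases, counts, and colors are made up for illustration, and the vectors set_a and set_b stand in for whatever document counts you retrieve from your database.

```r
# Toy document counts; the phrases and numbers are made up for illustration.
phrases <- c("search engine", "social network", "digital library")
set_a   <- c(12, 5, 9)
set_b   <- c(4, 11, 9)

lim <- max(set_a, set_b)
op  <- par(mar = c(4, 9, 3, 1))  # wide left margin for the phrase labels

# Set A is negated so its bars grow leftward; set B grows rightward.
# Both calls share the same bar positions, so each phrase gets one row.
barplot(-set_a, horiz = TRUE, xlim = c(-lim, lim), col = "steelblue",
        names.arg = phrases, las = 1, axes = FALSE,
        xlab = "Number of documents containing the phrase")
barplot(set_b, horiz = TRUE, add = TRUE, col = "tomato", axes = FALSE)

# Label the x-axis with absolute values so both sides read as counts.
ticks <- pretty(c(-lim, lim))
axis(1, at = ticks, labels = abs(ticks))
legend("topright", legend = c("Set A", "Set B"),
       fill = c("steelblue", "tomato"), bty = "n")
par(op)
```

Here the phrase labels sit in the left margin on the same line as their bars; placing them in a central gutter between the two sides is another reasonable choice.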

If you have generated more than two lists, you may want to select the two most interesting or generate several pairwise back-to-back histograms. Your call.

As before, describe your process and analyze your results. This time, you must use the number of documents in which each phrase occurs. This number will probably be plotted along the x-axis of your back-to-back histograms.

This hw will be provisionally due in class on Nov 18.