SI 618 HW 1 - Interspike Codex

SI 618 Fall 2008

Overview of Homework 1

A study of noun phrase (NP) lists generated from the textual content of web pages, documents, and PDF files. The URLs were gathered by a spider that targeted university pages dealing with institutional diversity.

Objectives

Generate a Monty Lingua NP list from a subset of the data selected by author, genre, and/or school. Study the list and use it to infer characteristics of the targeted audience. Then generate a second list in the same way from a different subset, compare the two lists, and identify differences between the selected audiences and genres.
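One way to make the comparison step concrete is to contrast the relative frequency of each noun phrase in the two lists. The sketch below is only an illustration and does not come from the assignment. It assumes each NP list has already been produced and saved as a plain-text file with one noun phrase per line. That file format is an assumption, since Monty Lingua's actual output may look different. The helper names `load_np_list` and `compare_np_lists` are hypothetical.

```python
from collections import Counter


def load_np_list(path):
    """Read a hypothetical NP list file: one noun phrase per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def np_frequencies(phrases):
    """Count noun phrases case-insensitively, ignoring blank entries."""
    return Counter(p.strip().lower() for p in phrases if p.strip())


def compare_np_lists(list_a, list_b, top_n=10):
    """Return (NPs over-represented in A, NPs over-represented in B).

    Counts are normalized by each list's total so that lists of
    different lengths can be compared on the same scale.
    """
    freq_a = np_frequencies(list_a)
    freq_b = np_frequencies(list_b)
    total_a = sum(freq_a.values()) or 1
    total_b = sum(freq_b.values()) or 1
    vocab = set(freq_a) | set(freq_b)
    # Positive difference: the phrase is relatively more common in A.
    diff = {p: freq_a[p] / total_a - freq_b[p] / total_b for p in vocab}
    ranked = sorted(vocab, key=lambda p: (diff[p], p))  # most B-typical first
    more_in_b = [p for p in ranked if diff[p] < 0][:top_n]
    more_in_a = [p for p in reversed(ranked) if diff[p] > 0][:top_n]
    return more_in_a, more_in_b


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the corpus.
    school_a = ["diversity", "student body", "Diversity", "faculty", "inclusion"]
    school_b = ["faculty", "research", "faculty", "student body"]
    a_only, b_only = compare_np_lists(school_a, school_b, top_n=3)
    print(a_only)  # ['diversity', 'inclusion']
    print(b_only)  # ['faculty', 'research', 'student body']
```

Using relative frequencies instead of raw counts keeps a longer list from dominating the comparison. The phrases ranked highest on each side are candidates for describing how the two audiences or genres differ.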

Deliverables

Create a report that includes the following:

  1. an abstract of less than 150 words
  2. a description of the data used
  3. a diary of what was done
  4. the results (lists)
  5. a statement of what the lists mean
  6. document everything on a web page and put a link to it in the class wiki

See SI_618_HW_1:_Homework_Assignment.