LASI - Linguistic Analysis for Subject Identification




Welcome to the CS410 Red group's page. This project is part of the Computer Science Department at Old Dominion University. We are developing a program called LASI, a decision support tool that will take in multiple documents of varying file types and return a weighted list of the themes common to all of them. The case study for this project is provided by Dr. Patrick Hester and Dr. Tom Meyers at NCSOSE.

Date: Event
Mon 12/10/2012: Final Presentation to Dr. Hester, Prof. Brunnell, and Prof. Price
Week of 11/26/2012: Milestone Presentation
Mon 11/19/2012: Feasibility Presentation with Milestone Diagram
Mon 10/22/2012: Feasibility Presentation (Revised)
Wed 10/03/2012: Risk Presentation
Mon 09/24/2012: Presentation 1
Wed 09/12/2012: Meet with Dr. Hester to start clarifying the solution
Tues 09/11/2012: Individual members prepare questions for the meeting with Dr. Hester

Currently Working On
IMPLEMENTATION & DOCUMENTATION

LASI


Project Information


  • Themes and Their Importance
  • What's Wrong? (Societal Problem)
  • About LASI (Our Proposed Solution)

Themes and Their Importance

What is a theme? (The 5 W's & 1 H)
  • Who
  • What
  • When
  • Where
  • Why
  • How

Why is it important to be able to identify themes?

Themes help readers comprehend what they have just read. A reader who comprehends the material can also summarize it, and together, comprehension and summarization let the reader communicate the content of the material to other people.

What's Wrong? (Societal Problem)

It is difficult for people to identify a common theme in a large set of documents in a timely, consistent, and objective manner.

About LASI (Our Proposed Solution)

LASI is a linguistic analysis decision support tool that will infer the themes and ideas most likely shared by all documents in the input domain. Our goals for LASI are to:

  • Find Themes Accurately
  • Use System Resources Efficiently
  • Provide Consistent Results
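The idea of returning a weighted list of themes common to all input documents can be illustrated with a deliberately simple C++ sketch. This is not LASI's algorithm; it is a stand-in under several assumptions: a "theme" is approximated by a single word, words of four or more letters count as candidate terms (a crude substitute for stop-word filtering), each term is weighted by its summed relative frequency across documents, and only terms found in every document are kept. The names `countTerms` and `commonThemes` are hypothetical. It requires C++17.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not LASI's actual algorithm.
using WeightedTheme = std::pair<std::string, double>;

// Lowercase the text and count alphabetic words of length >= 4
// (an assumed, crude stand-in for real stop-word filtering).
std::map<std::string, int> countTerms(const std::string& text) {
    std::map<std::string, int> counts;
    std::string word;
    auto flush = [&]() {
        if (word.size() >= 4) ++counts[word];
        word.clear();
    };
    for (unsigned char c : text) {
        if (std::isalpha(c)) {
            word += static_cast<char>(std::tolower(c));
        } else {
            flush();
        }
    }
    flush();  // count a word that ends at the end of the text
    return counts;
}

// Return terms present in every document, weighted by the sum of their
// relative frequencies (count / total counted words) in each document.
std::vector<WeightedTheme> commonThemes(const std::vector<std::string>& docs) {
    std::map<std::string, double> weight;       // summed relative frequency
    std::map<std::string, std::size_t> seenIn;  // documents containing the term
    for (const auto& doc : docs) {
        const auto counts = countTerms(doc);
        int total = 0;
        for (const auto& [term, n] : counts) total += n;
        for (const auto& [term, n] : counts) {
            weight[term] += static_cast<double>(n) / total;
            ++seenIn[term];
        }
    }
    std::vector<WeightedTheme> result;
    for (const auto& [term, w] : weight) {
        if (seenIn.at(term) == docs.size()) result.emplace_back(term, w);
    }
    // Highest weight first; ties broken alphabetically so identical input
    // always produces the identical ordering.
    std::sort(result.begin(), result.end(), [](const auto& a, const auto& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    });
    return result;
}
```

For example, `commonThemes({"Risk analysis of theme risk.", "Theme detection and risk."})` keeps only "risk" and "theme", with "risk" ranked first. The alphabetical tie-break reflects the goal of consistent results for the same documents. A real linguistic analysis would likely also need part-of-speech tagging, synonym handling, and multiword phrases.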

The Basic Premise Behind the LASI Algorithm:

Presentations


4) Milestone Presentation

Here we discuss the Milestones involved in the development of LASI


3) Feasibility Presentation with Milestone Diagram

Here we add our Milestone Diagram to the Feasibility Presentation


2) Feasibility Presentation (Revised)

Here we discuss the feasibility of LASI


1) Presentation 1

Here is an overview of the initial planning and research stages of this project

User Stories


  • Start Up
  • During Execution
  • Results
  • Expectations

On Start Up

  1. I want to parse any type of text document I import.
  2. I need to parse scanned text.
  3. I want to parse documents in multiple languages.
  4. I need to quickly create a project and begin analysis.
  5. I need your software to run without me having to change options or parameters.

During Execution

  1. I want to be able to pause my progress in case I am required to move locations or start another task.
  2. I would like to open separate projects so that I may collect documents and parse in parallel.
  3. I want to be able to manually increase the weight of specific topics.
  4. I want to re-use existing parses in new projects.
  5. I want the charts, graphs, etc. inside of my documents to be parsed alongside the text.
  6. Your software needs to be able to account for incomplete sentences and still utilize their content.
  7. I'd like to see the themes being updated in real time.

Results

  1. I want to be able to view results visually.
  2. I'd like to be able to save my results to view in a third party program (such as Microsoft Office) to show others.
  3. I want to be able to determine the theme of a single document.
  4. I want to be able to determine the theme of multiple documents.
  5. The program needs to save the documents used when parsing so that I know what has been themed.
  6. I want to view the output in multiple ways.
  7. I want to print my results.

User Expectations

  1. I want your software to run quickly and efficiently.
  2. I don't want to be confused by the interface nor do I want to read a manual to figure out how to use your software.
  3. I expect the output to be comprehensible.
  4. I need to know that the results are accurate.
  5. I expect the results to be consistent if I import the same documents twice.

Risks

    Customer Risks:
  • C1 -- Product Interest
  • C2 -- Maintenance
  • C3 -- Trust
    Technical Risks:
  • T1 -- System Limitations
  • T2 -- Scanned Text Recognition
  • T3 -- Jargon Recognition
  • T4 -- Illegal Character Handling

Mitigations

C1. Product Interest   |   Probability - 2 / Impact - 4

LASI offers unique functionality and user-friendliness.


C2. Maintenance   |   Probability - 3 / Impact - 2

LASI will be a free, open source application allowing the community to maintain and extend it over time.


C3. Trust   |   Probability - 3 / Impact - 3

LASI will provide a step by step breakdown of output analysis and algorithm reasoning.


T1. System Limitations   |   Probability - 4 / Impact - 2

LASI will be written from the ground up in native C++ to keep its memory and CPU usage efficient.


T2. Scanned Text Recognition   |   Probability - 4 / Impact - 3

LASI will implement an optical character recognition algorithm to handle scanned text.


T3. Jargon Recognition   |   Probability - 4 / Impact - 3

LASI will have domain-specific dictionaries and will use contextual inference.


T4. Illegal Character Handling   |   Probability - 4 / Impact - 2

LASI will use contextual inference, synonym recognition, and statistical methods.

CS410 Red Group Members


  • Scott Minter

    Scott Minter

    Project Co-Leader & Software Specialist

    Email - sminter@gmail.com

    I'm Scott Minter. I'm a senior in the Computer Science department here at ODU, with a minor in American Studies. I've worked for a year and a half at Borrell Associates, a marketing research firm in Williamsburg. While there, I've done a lot of work using jQuery and PHP to turn Excel spreadsheets used for research into web-based applications.

  • Dustin Patrick

    Dustin Patrick

    Algorithm Specialist & Expert Liaison

    Email - dpatr004@gmail.com

    My name is Dustin Patrick. I am an ODU Super-junior studying Computer Science and Modeling and Simulation. I am an articulate speaker and an avid fan of pop culture. I will probably annoy the rest of my group with my obscure references to television and movies. I work at InMotion Hosting doing tech support and love it. My coding strengths are C/C++ in Linux environments, PHP, and bash scripting. I am also not afraid to ask for help if I need it. My weaknesses are working with Windows and setting up functional IDEs. I also enjoy long walks on the beach and the occasional Manhattan.

  • Brittany Johnson

    Brittany Johnson

    Project Co-Leader & UI Designer

    Email - bjohn071@odu.edu

    My name is Brittany Johnson. I'm a senior here at ODU studying Computer Science and Mathematics. In my spare time I like watching classic horror movies and doing the cryptoquip in the paper. I'm hoping that my background in mathematics will be an asset to my group.

  • Richard Owens

    Richard Owens

    Documentation Specialist & Communication Specialist

    Email - rowens@cs.odu.edu

    I was born when some still considered computers a "fad", and my mother bought a computer when I was one year old. Since that day, I have been diving into computer science and have never turned back. I have been studying at ODU for too long. I am an ex-Systems employee and now work as a network technician; I became enthralled with system administration and networking. Alongside my computer-related endeavors, I am also a percussionist, ice hockey player, and ice hockey referee. I do not practice percussion much anymore, but I still play in a recreational ice hockey league. I am also lucky enough to referee other recreational leagues, along with college club games, including those of William & Mary, Christopher Newport University, and Old Dominion University.

  • Aluan Haddad

    Aluan Haddad

    Algorithm Specialist & Software Specialist

    Email - aluanh@gmail.com

    Hi, I'm Aluan Haddad. I am currently a senior in the Old Dominion University Department of Computer Science. Programming-wise, I am fluent in C++, Java, and C#, and I am primarily interested in projects related to Game Development, User Interfaces, Artificial Intelligence, and Self-Modifying Algorithms. More broadly, I am interested in Film and Literature (both Fiction and Nonfiction), Video Games, and Radio Broadcast Comedy. I am also keenly interested in Philosophy and Psychology, Storytelling and World Building, especially as they relate to my other areas of interest.

  • Erik Rogers

    Erik Rogers

    Marketing Specialist & Documentation Specialist

    Email - eroge009@odu.edu

    My name is Erik Rogers, a senior here at Old Dominion majoring in Computer Science and minoring in Mathematics; I plan on pursuing my Master's once I've graduated in the Spring. Currently, I am a web designer/developer for the University of Virginia in Charlottesville, VA; NVM Services and The Wright Company (including the new Urban Outfitters) here in Norfolk, VA; and a bushel of individuals across the globe. I prefer extracting creative aspects from boring technical elements and very much enjoy design, colour, art, abstract concepts, and manipulation of language. I spend most of my time exploring my mind and levels of consciousness. I'm currently writing an essay on the abstraction of language.


Papers


1) CS411 Written Summary

This is an overview of LASI and our prototype


CS410 Red Group Fall 2012