Milestone Due Dates

| Milestone | Due Date | Points |
|---|---|---|
| Abstract | Friday, Sep. 27, on Google Colab; submit your URL to Piazza | 2 pts |
| Progress Checks | Oct. 15, Nov. 01, Nov. 19 | 3 pts |
| Presentation/Demo | Nov. 26 in class, 3-minute talk | 5 pts |
| Final Report | Dec. 06 (complete report in Colab) | 5 pts |
Introduction
The data project is an opportunity to tackle a more challenging data science activity. For the project, you are required to work individually on a dataset of your choosing that is interesting, significant, and relevant to Data Science. The ultimate goal of your data project is to apply the techniques you learn each week of the class to your dataset (exploration, wrangling, machine learning, visualization). We are going to use Google Colab (Colaboratory) (https://colab.research.google.com/), a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.
Project Abstract
The abstract (in Google Colab) should include the following information:
○ Data source
  - URL
  - A short description
  - The first few records of the dataset (head())
○ Your end goal with this dataset (build a recommender system, a prediction model/classifier, an evaluation of models, visualize something, infer something, or something else)
○ Any secondary datasets you plan to use to augment your primary dataset (clearly labeled as secondary datasets)
● You can take as much space as you need for the project abstract, but I would guess that most will be in the two-to-three-page range.
● You must submit an acceptable abstract by the deadline.
Project Presentation (3 Minute Talk)
Your presentation/demo should briefly and succinctly tell us *why* we should care and *what* interesting insight you have about the chosen dataset. Give us some insight into the tough / cool / interesting aspects of your project. This is your time to shine, so carefully prepare exactly what you want to show off in this summary to impress us. View the audience as potential upper management in your company: convince us that your problem is important and that you have appropriate insight into the dataset.
During the 3 Minute Talk session, each author should be prepared to give a preview talk of less than three minutes (180 seconds) about the main idea(s) of the project. The 3-minute time limit will be strictly enforced by a timer and buzzer. You will be stopped right at the 3-minute mark whether or not you have finished your summary talk! Practice, practice, practice, and time yourself before the presentation.
Follow these guidelines when preparing the Summary section for your talk (this section should be at the very end of your Colab):
- Put the project title and your name at the very beginning of your Summary section
- Present a clear outcome (charts, graphs, key conclusions)
- Know what you want your audience to take away from your presentation. Ideally, the audience should leave with an understanding of what you’re doing and why you’re doing it.
- Tell a story
  - You may want to present your 3 Minute Talk like a story, with a beginning, middle, and end. It’s not easy to condense your project into three minutes, so you may find it easier to break your presentation into smaller sections. Try writing an opener to catch the audience’s attention, then highlight your key findings, and finally close with a summary that restates the importance of your work.
- Remember that your peers are evaluating your presentation (the 5 pts are the weighted average of your peer score report). Engage your audience!
Project Final Report
A comprehensive report describing the project. This should be a "complete" document, so it should include front matter (title page, abstract, table of contents, chapters) or a sidebar index that links to the elements of your report. These elements should include the problem statement, an explanation of your design and implementation, and your results and evaluation. This report should stand on its own as the archival description of the project.
- The report is a continuation of the same Google Colab project document.
- Colab file title should be "YourLastName_CS620_DataProject"
- Your results and evaluation (or evaluation strategy)
  - What metrics did you use (or will you use) to evaluate the success of your project?
  - What performance measures did you use, and how did you measure them?
  - Other criteria?
- You should address the same questions you addressed in the previous reports (abstract, progress checks), but in more detail, especially regarding the challenges you had to solve and your experimental results, if any.
- You should also include your conclusions from the study and point out how your work can be further extended (i.e., future work).
- References if available (this should be the very last section of your Colab)
- Provide as much context as you can: any diagrams and illustrations that will make it easier for us to understand and evaluate your effort. Graphics are always helpful; you have probably heard the saying “a picture is worth a thousand words”!
- Do not just show the diagrams: provide some narrative discussion for all figures, tables, charts, and diagrams! Unfortunately, diagrams, particularly technical diagrams, are rarely if ever self-explanatory. You should document the alternative solutions that you considered as well as the arguments for the final choice. Diagrams only represent your final solution; they do not explain why you chose this solution or what alternatives were considered. Hence, all diagrams must be accompanied by an explanation and a discussion of alternatives and tradeoffs. Anything that could lead to ambiguity or misunderstanding on the reviewer’s part should be clearly explained. Write explanations in prose and highlight key arguments in bullet points.
- There is no limit on the number of pages (or size) of the report. Of course, you should avoid stuffing your report with redundant or irrelevant material.
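The final report asks which metrics and performance measures you use and how you measure them. If your end goal is a classifier, one common choice is accuracy, precision, recall, and F1. This is an assumption for illustration: regression, recommender, or visualization projects call for different measures. The sketch below computes these metrics in plain Python so the formulas stay visible; the function name `classification_metrics` and the label lists are hypothetical. In Colab you could instead use scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics`.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the confusion-matrix cells (true/false positives and negatives).
    counts = Counter()
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            counts["tp"] += 1
        elif p == positive:
            counts["fp"] += 1
        elif t == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]

    # Guard each ratio against division by zero.
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # all four values are 0.75 for these labels
```

Whichever measures you choose, the report should say why they fit your goal (for example, recall matters more when missing a positive case is costly).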
Data sources for projects