Question Description
Unit 2 – Individual Project
Task Type: Individual Project
Deliverable Length: One Python program (100-200 lines) and a CSV file that contains a generated column
Points Possible: 100
Description: During your second Individual Project (IP), you will use your Python environment to derive structure from unstructured data. You will use the "Airline Sentiment" data set from the Kaggle open data sets, located at https://www.kaggle.com/welkin10/airline-sentiment. Using this data set, you will create a text analytics Python application that extracts themes from each comment using term frequency-inverse document frequency (TF-IDF) or simple word counts. For the deliverable, provide your Python file and a .csv file with your results added as a column to the original data set. Please submit your assignment. For assistance with your assignment, please use your text, Web resources, and all course materials.
Reference: Akash. (2017). Airline sentiment. Retrieved from https://www.kaggle.com/welkin10/airline-sentiment
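The TF-IDF option in the description can be sketched as follows. This is a minimal illustration using only the standard library, not a reference solution: the column names `text` and `themes`, the helper names, and the choice of three terms per comment are all assumptions, and the actual comment column in the Kaggle file should be checked before running it.

```python
import csv
import math
import re
from collections import Counter

TEXT_COLUMN = "text"      # assumption: name of the comment column in the CSV
THEME_COLUMN = "themes"   # hypothetical name for the generated column


def tokenize(comment):
    """Lowercase a comment and keep alphabetic tokens of three or more letters."""
    return re.findall(r"[a-z]{3,}", comment.lower())


def top_tfidf_terms(comments, k=3):
    """Return, for each comment, its k highest-scoring TF-IDF terms."""
    docs = [tokenize(c) for c in comments]
    n_docs = len(docs)
    # Document frequency: the number of comments that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        # TF = share of the comment's tokens; IDF = log(N / document frequency).
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


def add_theme_column(src, dst, k=3):
    """Copy a CSV from src to dst, appending a column of per-comment themes."""
    reader = csv.DictReader(src)
    rows = list(reader)
    themes = top_tfidf_terms([row[TEXT_COLUMN] for row in rows], k)
    fieldnames = list(reader.fieldnames or []) + [THEME_COLUMN]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row, terms in zip(rows, themes):
        row[THEME_COLUMN] = " ".join(terms)
        writer.writerow(row)
```

`add_theme_column` takes open file objects, so it can be called with the downloaded CSV opened for reading and a new file opened for writing (both with `newline=""`, as the `csv` module recommends). A library vectorizer could replace the hand-written scoring; the plain version is used here only to keep the arithmetic visible.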
Course Objectives: Discuss the main functions of each component of the Hadoop framework
Model Answer: Students should use Python to create an application that processes each record to generate a list of key terms within the corpus. This list can then be matched back to each comment to check which of the key terms it contains.
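The two steps in the model answer (build a corpus-wide list of key terms, then match it back to each comment) might look like the sketch below, which uses simple word counts. The stop-word list, the function names, and the default of five key terms are illustrative assumptions rather than part of the assignment.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "to", "is", "was", "my",
              "for", "on", "of", "in", "it"}


def tokenize(comment):
    """Lowercase a comment and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", comment.lower())
            if w not in STOP_WORDS]


def corpus_key_terms(comments, n_terms=5):
    """Return the n_terms most frequent words across all comments."""
    counts = Counter(word for c in comments for word in tokenize(c))
    # Most frequent first; ties broken alphabetically for stable output.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:n_terms]


def match_key_terms(comments, key_terms):
    """For each comment, list the key terms it contains, in key-term order."""
    matches = []
    for comment in comments:
        words = set(tokenize(comment))
        matches.append([term for term in key_terms if term in words])
    return matches
```

Joining each inner list into a single string (for example with `" ".join(...)`) gives a value that can be written out as the generated column in the CSV deliverable.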