CISC520 Data Engineering and Mining
Assignment ID Number AFFGEHU83939HD Type of Document Essay Writing Format APA/MLA/Harvard Academic Level Masters/University References/Sources 4 References
CISC520 Data Engineering and Mining
CISC520 Data Engineering and Mining
For this part, you need to explore the bank data (bankdata_csv_all.csv), available on the LMS, and an accompanying description (bankdataDescription.doc) of the attributes and their values. The dataset contains attributes on each person’s demographics and banking information in order to determine they will want to obtain the new PEP (Personal Equity Plan).
Your goal is to perform Association Rule discovery on the dataset using R.
First perform the necessary preprocessing steps required for association rule mining, specifically the id field needs to be removed and a number of numeric fields need discretization or otherwise converted to nominal.
Next, set PEP as the right hand side of the rules, and see what rules are generated.
Select the top 5 most “interesting” rules and for each specify the following:
- Support, Confidence and Lift values
- An explanation of the pattern and why you believe it is interesting based on the business objectives of the company.
- Any recommendations based on the discovered rule that might help the company to better understand behavior of its customers or to develop a business opportunity.
Note that the top 5 most interesting rules are most likely not the top 5 in the strong rules. They are rules, that in addition to having high lift and confidence, also provide some non-trivial, actionable knowledge based on underlying business objectives.
To complete this assignment, write a short report describing your association rule mining process and the resulting 5 interesting rules, each with their three items of explanation and recommendations. For at least one of the rules, discuss the support, confidence and lift values and how they are interpreted in this data set.
You should write your answers as if you are working for a client who knows little about data mining. Your report should give your client some insightful and reliable suggestions on what kinds of potential buyers your client should contact, and convince your client that your suggestions are reliable based on the evidence gathered from your experiment results.
In more detail, your answers should include:
- Description of preprocessing steps
- Description of parameters and experiments in order to obtain strong rules
- Give the top 5 most interesting rules and the 3 items listed above for each rule.
In this part of homework, you are expected to apply decision tree induction algorithm to solve a mystery in history: who wrote the disputed essays, Hamilton or Madison?
- About the Federalist Papers
Quote from the Library of Congress
The Federalist Papers were a series of eighty-five essays urging the citizens of New York to ratify the new United States Constitution. Written by Alexander Hamilton, James Madison, and John Jay, the essays originally appeared anonymously in New York newspapers in 1787 and 1788 under the pen name “Publius.” A bound edition of the essays was first published in 1788, but it was not until the 1818 edition published by the printer Jacob Gideon that the authors of each essay were identified by name. The Federalist Papers are considered one of the most important sources for interpreting and understanding the original intent of the Constitution.
- About the disputed authorship
The original essays can be downloaded from the Library of Congress.
In the author column, you will find 74 essays with identified authors: 51 essays written by Hamilton, 15 by Madison, 3 by Hamilton and Madison, 5 by Jay. The remaining 11 essays, however, is authored by “Hamilton or Madison”. These are the famous essays with disputed authorship. Hamilton wrote to claim the authorship before he was killed in a duel. Later Madison also claimed authorship. Historians were trying to find out which one was the real author.
- Computational approach for authorship attribution
In 1960s, statistician Mosteller and Wallace analyzed the frequency distributions of common function words in the Federalist Papers, and drew their conclusions. This is a pioneering work on using mathematical approaches for authorship attribution.
Nowadays, authorship attribution has become a classic problem in the data mining field, with applications in forensics (e.g. deception detection), and information organization.
The Federalist Paper data set (fedPapers85.csv) is provided in LMS. The features are a set of “function words”, for example, “upon”. The feature value is the percentage of the word occurrence in an essay. For example, for the essay “Hamilton_fed_31.txt”, if the function word “upon” appeared 3 times, and the total number of words in this essay is 1000, the feature value is 3/1000=0.3%
Organize your report using the following template:
Section 1: Data preparation
You will need to separate the original data set to training and testing data for classification experiments. Describe what examples in your training and what in your test data.
Section 2: Build and tune decision tree models
First build a DT model using the default setting, and then tune the parameters to see if better model can be generated. Compare these models using appropriate evaluation measures. Describe and compare the patterns learned in these models.
Section 3: Prediction
After building the classification model, apply it to the disputed papers to find out the authorship and report the performance accuracy of your models.
QUALITY OF RESPONSE NO RESPONSE POOR / UNSATISFACTORY SATISFACTORY GOOD EXCELLENT Content (worth a maximum of 50% of the total points) Zero points: Student failed to submit the final paper. 20 points out of 50: The essay illustrates poor understanding of the relevant material by failing to address or incorrectly addressing the relevant content; failing to identify or inaccurately explaining/defining key concepts/ideas; ignoring or incorrectly explaining key points/claims and the reasoning behind them; and/or incorrectly or inappropriately using terminology; and elements of the response are lacking. 30 points out of 50: The essay illustrates a rudimentary understanding of the relevant material by mentioning but not full explaining the relevant content; identifying some of the key concepts/ideas though failing to fully or accurately explain many of them; using terminology, though sometimes inaccurately or inappropriately; and/or incorporating some key claims/points but failing to explain the reasoning behind them or doing so inaccurately. Elements of the required response may also be lacking. 40 points out of 50: The essay illustrates solid understanding of the relevant material by correctly addressing most of the relevant content; identifying and explaining most of the key concepts/ideas; using correct terminology; explaining the reasoning behind most of the key points/claims; and/or where necessary or useful, substantiating some points with accurate examples. The answer is complete. 50 points: The essay illustrates exemplary understanding of the relevant material by thoroughly and correctly addressing the relevant content; identifying and explaining all of the key concepts/ideas; using correct terminology explaining the reasoning behind key points/claims and substantiating, as necessary/useful, points with several accurate and illuminating examples. No aspects of the required answer are missing. Use of Sources (worth a maximum of 20% of the total points). Zero points: Student failed to include citations and/or references. Or the student failed to submit a final paper. 5 out 20 points: Sources are seldom cited to support statements and/or format of citations are not recognizable as APA 6th Edition format. There are major errors in the formation of the references and citations. And/or there is a major reliance on highly questionable. The Student fails to provide an adequate synthesis of research collected for the paper. 10 out 20 points: References to scholarly sources are occasionally given; many statements seem unsubstantiated. Frequent errors in APA 6th Edition format, leaving the reader confused about the source of the information. There are significant errors of the formation in the references and citations. And/or there is a significant use of highly questionable sources. 15 out 20 points: Credible Scholarly sources are used effectively support claims and are, for the most part, clear and fairly represented. APA 6th Edition is used with only a few minor errors. There are minor errors in reference and/or citations. And/or there is some use of questionable sources. 20 points: Credible scholarly sources are used to give compelling evidence to support claims and are clearly and fairly represented. APA 6th Edition format is used accurately and consistently. The student uses above the maximum required references in the development of the assignment. Grammar (worth maximum of 20% of total points) Zero points: Student failed to submit the final paper. 5 points out of 20: The paper does not communicate ideas/points clearly due to inappropriate use of terminology and vague language; thoughts and sentences are disjointed or incomprehensible; organization lacking; and/or numerous grammatical, spelling/punctuation errors 10 points out 20: The paper is often unclear and difficult to follow due to some inappropriate terminology and/or vague language; ideas may be fragmented, wandering and/or repetitive; poor organization; and/or some grammatical, spelling, punctuation errors 15 points out of 20: The paper is mostly clear as a result of appropriate use of terminology and minimal vagueness; no tangents and no repetition; fairly good organization; almost perfect grammar, spelling, punctuation, and word usage. 20 points: The paper is clear, concise, and a pleasure to read as a result of appropriate and precise use of terminology; total coherence of thoughts and presentation and logical organization; and the essay is error free. Structure of the Paper (worth 10% of total points) Zero points: Student failed to submit the final paper. 3 points out of 10: Student needs to develop better formatting skills. The paper omits significant structural elements required for and APA 6th edition paper. Formatting of the paper has major flaws. The paper does not conform to APA 6th edition requirements whatsoever. 5 points out of 10: Appearance of final paper demonstrates the student’s limited ability to format the paper. There are significant errors in formatting and/or the total omission of major components of an APA 6th edition paper. They can include the omission of the cover page, abstract, and page numbers. Additionally the page has major formatting issues with spacing or paragraph formation. Font size might not conform to size requirements. The student also significantly writes too large or too short of and paper 7 points out of 10: Research paper presents an above-average use of formatting skills. The paper has slight errors within the paper. This can include small errors or omissions with the cover page, abstract, page number, and headers. There could be also slight formatting issues with the document spacing or the font Additionally the paper might slightly exceed or undershoot the specific number of required written pages for the assignment. 10 points: Student provides a high-caliber, formatted paper. This includes an APA 6th edition cover page, abstract, page number, headers and is double spaced in 12’ Times Roman Font. Additionally, the paper conforms to the specific number of required written pages and neither goes over or under the specified length of the paper.
GET THIS PROJECT NOW BY CLICKING ON THIS LINK TO PLACE THE ORDER
Do You Have Any Other Essay/Assignment/Class Project/Homework Related to this? Click Here Now [CLICK ME]and Have It Done by Our PhD Qualified Writers!!
Tired of getting an average grade in all your school assignments, projects, essays, and homework? Try us today for all your academic schoolwork needs. We are among the most trusted and recognized professional writing services in the market.
We provide unique, original and plagiarism-free high quality academic, homework, assignments and essay submissions for all our clients. At our company, we capitalize on producing A+ Grades for all our clients and also ensure that you have smooth academic progress in all your school term and semesters.
High-quality academic submissions, A 100% plagiarism-free submission, Meet even the most urgent deadlines, Provide our services to you at the most competitive rates in the market, Give you free revisions until you meet your desired grades and Provide you with 24/7 customer support service via calls or live chats.