Assignment 3: Frequent Itemsets, Clustering, Advertising
Formative, Weight (15%), Learning objectives (1, 2, 3), Abstraction (4), Design (4), Communication (4), Data (5), Programming (5)
Due date: 11:59pm, 1 June 2019
1 Overview
Read the following carefully, as it differs from the last assignment.
For students who are taking the course COMP SCI 3306 (i.e., undergraduate students), this assignment can be done in groups of two students. If you have trouble finding a group partner, use the forum to search for one.
For other students who are taking the course COMP SCI 7306, this assignment should be done individually.
References to sections, examples, etc. refer to the book “Leskovec, Rajaraman and Ullman: Mining Massive Datasets (Second Edition)”.
2 Assignment
Exercise 1 Frequent Itemsets (15+15+10+10 points)
For this exercise, you have to read Section 6.4 up to 6.4.3.
1. Implement the simple, randomized algorithm given in 6.4.1.
2. Implement the algorithm of Savasere, Omiecinski, and Navathe (SON algorithm) in 6.4.3.
3. Compare the two algorithms on the datasets T10I4D100K, T40I10D100K, chess, connect, mushroom, pumsb, pumsb_star provided at
http://fimi.ua.ac.be/data/
and report the outcomes.
COMP SCI 3306, COMP SCI 7306 Mining Big Data Semester 1, 2020
4. Experiment with different sample sizes in the simple randomized algorithm, such as 1%, 2%, 5%, and 10%, and compare your results (including the result produced by the SON algorithm).
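As a starting point for tasks 1 and 2, the two algorithms can be sketched as below. This is a minimal illustration under stated assumptions, not a reference solution: baskets are assumed to fit in memory as Python collections of items, the helper `frequent_itemsets` is a plain level-wise search standing in for whatever in-memory algorithm you choose, and all function and parameter names (`simple_randomized`, `son`, `p`, `seed`, `num_chunks`, `max_size`) are hypothetical. Itemsets larger than `max_size` are ignored to keep the sketch short.

```python
import random
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_count, max_size=3):
    """Level-wise in-memory search; returns {itemset: count} for sets up to max_size."""
    baskets = [frozenset(b) for b in baskets]
    counts = Counter(frozenset([item]) for b in baskets for item in b)
    current = {s: c for s, c in counts.items() if c >= min_count}
    result, k = {}, 1
    while current:
        result.update(current)
        if k == max_size:
            break
        k += 1
        items = sorted(set().union(*current))
        # Keep a k-set as a candidate only if all of its (k-1)-subsets are frequent.
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(s) in current for s in combinations(c, k - 1))]
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        current = {s: c for s, c in counts.items() if c >= min_count}
    return result


def simple_randomized(baskets, support, p, seed=None, max_size=3):
    """Keep each basket with probability p and mine the sample at threshold p * support."""
    rng = random.Random(seed)
    sample = [b for b in baskets if rng.random() < p]
    return set(frequent_itemsets(sample, p * support, max_size))


def son(baskets, support, num_chunks=4, max_size=3):
    """Two passes: collect per-chunk candidates, then count them on the full data."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    if n == 0:
        return {}
    size = -(-n // num_chunks)  # ceiling division: baskets per chunk
    candidates = set()
    for start in range(0, n, size):
        chunk = baskets[start:start + size]
        # Scale the threshold by the fraction of the data this chunk holds.
        candidates |= set(frequent_itemsets(chunk, support * len(chunk) / n, max_size))
    counts = Counter(c for b in baskets for c in candidates if c <= b)
    return {c: k for c, k in counts.items() if k >= support}


# Toy data, made up for illustration only.
baskets = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}, {4}, {1, 4}, {2, 4}]
exact = frequent_itemsets(baskets, 3)
son_result = son(baskets, 3, num_chunks=4)
sampled = simple_randomized(baskets, 3, p=0.5, seed=0)
```

The sampled result may contain both false positives and false negatives relative to `exact`, whereas the second pass in `son` removes false positives and the per-chunk thresholds rule out false negatives. Comparing `sampled` against `son_result` for several values of `p` is one way to approach tasks 3 and 4.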
Your approach should be as e
