You must come up with your own project idea, which should fall under one of the following:
- Design new ML algorithms for real problems.
- Solve existing problems using ML algorithms.
Initial Project Proposal: 15-Nov-2012
- Motivation (a few reasons why and how the idea could be beneficial)
- Challenges (a few potential challenges that make this problem non-trivial)
- Formal definition of the problem you are trying to address (i.e., the goal of the project)
Final Project Presentation: 10-Jan-2013
Include as many details as possible:
- Model framework used
- How the data was collected and which dataset was used
- Explanation of the features
- Experimental results and performance (precision, recall, etc.)
- Comparison results with any other model
- Comparison with at least one or two previous papers from the literature
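For the performance item above, a minimal sketch of how precision, recall and F1 could be computed for a binary task is shown here. The function name and the toy labels are made up for illustration; any standard evaluation library could be used instead.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Reporting all three numbers, rather than accuracy alone, matters when the positive class is rare.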
Course Project Suggestions
In addition to the following suggestions, you may find more project ideas in the Machine Learning course at CMU.
Task: Predict whether a user will follow another user back when she receives a new follow link from that user.
Data Description: A Twitter sub-network with a complete historical log of link formation among all of its users, i.e., each user is associated with a complete list of followers and of the users they are following at each timestamp. The sub-network comprises 112,044 users, 468,238 following links among them, and 2,409,768 tweets. On average, there are 40,943 new follow links and 3,337 new follow-back links per day.
Please see the description details in Candidate1-TwitterDataSet.
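As a rough illustration of how training labels for this task might be derived from a timestamped link log, consider the sketch below. The event format `(timestamp, follower, followee)` and the function name are assumptions made for this example; the actual file layout is described in Candidate1-TwitterDataSet and may differ.

```python
def label_followbacks(events):
    """Label each new follow link with whether it was later reciprocated.

    events: iterable of (timestamp, follower, followee) tuples (assumed format).
    Returns a sorted list of (timestamp, follower, followee, followed_back).
    Links that are themselves follow-backs are not emitted as instances.
    """
    first_follow = {}
    for t, u, v in sorted(events):
        first_follow.setdefault((u, v), t)  # keep the earliest time per link

    labeled = []
    for (u, v), t in first_follow.items():
        reverse = first_follow.get((v, u))
        if reverse is not None and reverse < t:
            continue  # u -> v answers an earlier v -> u link
        labeled.append((t, u, v, reverse is not None and reverse > t))
    return sorted(labeled)


# Toy log, invented for illustration only.
log = [(1, "a", "b"), (2, "b", "a"), (3, "a", "c"), (4, "d", "a")]
for row in label_followbacks(log):
    print(row)
```

In a real experiment the label would normally be restricted to follow-backs that occur within a fixed time window, and features would be computed only from information available before the new link appeared.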
- Tiancheng Lou, Jie Tang, John Hopcroft, Zhanpeng Fang, Xiaowen Ding. Learning to Predict Reciprocity and Triadic Closure. TKDD 2010
- John E. Hopcroft, Tiancheng Lou, and Jie Tang. Who Will Follow You Back? Reciprocal Relationship Prediction. In Proceedings of the Twentieth Conference on Information and Knowledge Management (CIKM'2011). pp. 1137-1146.
Friendship Relationship Prediction
Task: Predict whether two users are friends, given that at least one voice call or text message was sent from one to the other.
Data Description: The data set consists of call logs, Bluetooth scanning logs and location logs collected by software installed on the mobile phones of 107 users during a ten-month period. In the data set, users provide labels for their friendships. In total, 314 pairs of users are labeled as friends.
Please see the description details in Candidate2-MobileDataSet.
- Wenbin Tang, Honglei Zhuang, and Jie Tang. Learning to Infer Social Ties in Large Networks. PKDD 2011
- Chi Wang, Jiawei Han, Jie Tang. Mining Advisor-Advisee Relationships from Research Publication Networks. KDD 2010
Review Rating Prediction
Task: Predict the rating scores of online hotel reviews.
Data Description: The data set consists of 5000 hotel reviews, which are split equally into training and testing sets. For each review, we provide bag-of-words features.
Please see the description details in Candidate3-HotelReview_dataset.
- Jun Zhu, Amr Ahmed, and Eric P. Xing. MedLDA: Maximum Margin Supervised Topic Models. Journal of Machine Learning Research, 13(Aug):2237-2278, 2012.
- D. Blei, J. McAuliffe. Supervised topic models. In Advances in Neural Information Processing Systems 21, 2007.
Algorithm Analysis for Topic Models
Task: Implement and compare approximate inference algorithms for LDA, including variational inference (Blei et al. 2003), collapsed Gibbs sampling (Griffiths et al. 2004) and (optionally) collapsed variational inference (Teh et al. 2006).
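To give a sense of the scale of one of these algorithms, here is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA with symmetric priors. It is an unoptimized illustration, not a reference implementation: the function name, default hyperparameters and toy documents are assumptions for this example.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, iterations=50, seed=0):
    """docs: list of documents, each a list of word ids in [0, vocab_size).

    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                                # total tokens per topic
    z = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional p(z_i = j | all other assignments).
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                    / (n_k[j] + vocab_size * beta)
                    for j in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


# Toy corpus of word ids, invented for illustration only.
toy_docs = [[0, 1, 2, 0], [3, 4, 3], [0, 3, 1, 4, 2]]
doc_topic, topic_word, assignments = collapsed_gibbs_lda(toy_docs, 2, 5, iterations=20)
print(doc_topic)
```

Estimates of the topic and document distributions can be read off the returned counts after smoothing with `beta` and `alpha` respectively. A comparison against the variational methods would also need a shared evaluation measure such as held-out likelihood.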
Data Description: You should compare them over simulated data by varying the corpus generation parameters: number of topics, size of vocabulary, document length, etc.
You should also compare them over several real-world datasets.
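For the simulated-data comparison, a corpus can be drawn from the LDA generative process itself. The sketch below uses only the Python standard library; the function names, the fixed document length and the default hyperparameters are assumptions chosen for illustration.

```python
import random


def sample_dirichlet(rng, concentration, dim):
    """Draw from a symmetric Dirichlet by normalizing Gamma draws."""
    while True:
        draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
        total = sum(draws)
        if total > 0:  # redraw in the rare case that every draw underflows
            return [x / total for x in draws]


def generate_corpus(num_docs, num_topics, vocab_size, doc_length,
                    alpha=0.5, beta=0.1, seed=0):
    """Return (docs, topics); docs are lists of word ids, topics are
    per-topic word distributions."""
    rng = random.Random(seed)
    topics = [sample_dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        theta = sample_dirichlet(rng, alpha, num_topics)  # document-topic mix
        z = rng.choices(range(num_topics), weights=theta, k=doc_length)
        docs.append([rng.choices(range(vocab_size), weights=topics[k])[0]
                     for k in z])
    return docs, topics


# Vary these arguments to study how each inference algorithm behaves.
docs, topics = generate_corpus(num_docs=5, num_topics=3,
                               vocab_size=20, doc_length=10)
print(docs[0])
```

Because the true topics are returned alongside the documents, recovered topics can be compared directly with the ground truth as the number of topics, vocabulary size and document length are varied.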
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
- T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 101:5228-5235, 2004.
- Y.W. Teh, D. Newman and M. Welling. A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation. In NIPS 2006.
You can find the descriptions and data for the final projects here.