Private traits and attributes are predictable from digital records of human behavior, by Michal Kosinski, David Stillwell, and Thore Graepel, in PNAS 2013.
While this result is not surprising, and the method used is rather basic, it is still a nice demonstration of a fact we are all aware of: machine learning is a powerful tool for predicting various user properties.
The authors use the following simple construction: create a user vs. likes matrix, reduce it to 100 dimensions using SVD, and then fit a linear regression for each field of interest to find a weight for each singular vector. Once a new user is observed, those weights are applied to the user's reduced representation to compute the prediction. (Logistic regression was used for binary categorical variables.)
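To make the construction concrete, here is a minimal sketch of the same pipeline in Python with NumPy. It is not the authors' code: the data is synthetic, the toy matrix uses 5 dimensions instead of 100, the logistic regression is fit with plain gradient descent as a stand-in for whatever solver the paper used, and all names (`fit_svd_basis`, `fit_linear`, `fit_logistic`, `age`, `is_group_a`) are hypothetical.

```python
import numpy as np

def fit_svd_basis(likes, k):
    """Return the top-k right singular vectors of the user-by-like matrix."""
    _, _, vt = np.linalg.svd(likes, full_matrices=False)
    return vt[:k].T  # shape: (n_likes, k)

def add_intercept(z):
    """Prepend a column of ones so the models get a bias term."""
    return np.column_stack([np.ones(len(z)), z])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_linear(z, y):
    """Ordinary least squares on the SVD components (numeric targets)."""
    w, *_ = np.linalg.lstsq(add_intercept(z), y, rcond=None)
    return w

def fit_logistic(z, y, lr=0.05, steps=3000):
    """Logistic regression by gradient descent (binary targets)."""
    a = add_intercept(z)
    w = np.zeros(a.shape[1])
    for _ in range(steps):
        w -= lr * a.T @ (sigmoid(a @ w) - y) / len(y)
    return w

# Synthetic data (an assumption for illustration): one hidden trait drives
# which items a user likes, a numeric attribute, and a binary attribute.
rng = np.random.default_rng(0)
n_users, n_likes, k = 250, 50, 5
trait = rng.normal(size=n_users)
loadings = rng.normal(size=n_likes)
like_prob = sigmoid(2.0 * np.outer(trait, loadings))
likes = (rng.random((n_users, n_likes)) < like_prob).astype(float)
age = 30.0 + 5.0 * trait + rng.normal(size=n_users)
is_group_a = (trait + 0.5 * rng.normal(size=n_users) > 0).astype(float)

# Fit on the first 200 users, then treat the remaining 50 as "new" users.
train, new = slice(0, 200), slice(200, None)
basis = fit_svd_basis(likes[train], k)
z_train = likes[train] @ basis   # reduced representation of training users
z_new = likes[new] @ basis       # new users projected onto the same basis

w_age = fit_linear(z_train, age[train])
w_grp = fit_logistic(z_train, is_group_a[train])

age_pred = add_intercept(z_new) @ w_age            # numeric prediction
grp_prob = sigmoid(add_intercept(z_new) @ w_grp)   # probability of group A

print("correlation between predicted and true age:",
      np.corrcoef(age_pred, age[new])[0, 1])
```

The key point is that a new user never triggers a new decomposition: their row of likes is projected onto the basis learned from the training users, and the stored regression weights are applied to that projection.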
And here are some of the results. Each number indicates how accurately the corresponding attribute was predicted.
Another related paper that comes to mind is by my friend Udi Weinsberg from Technicolor Labs: "BlurMe: Inferring and Obfuscating User Gender Based on Ratings," Udi Weinsberg, Smriti Bhagat, Stratis Ioannidis and Nina Taft. ACM Conference on Recommender Systems (RecSys), 2012.