Predict whether an author has written a given paper

This model won 2nd place in the 2013 annual Data Mining and Knowledge Discovery competition (KDD Cup), organized by the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD).
The model was created by Dmitry Efimov, Lucas Silva and Benjamin Solecki. This paper describes the model in detail. The challenge was to develop a model that predicts whether an author has written a given paper. The 'test' data (the data to be scored) consists of a list of authors, each with a set of paper IDs. The model predicts authorship by ranking each author's papers from least likely to most likely to have been written by that author.
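This page does not show how the model scores each (author, paper) pair. Assuming it produces an estimated probability of authorship per pair, the final ranking step reduces to a sort. The sketch below illustrates only that last step; `rank_papers` and the example scores are hypothetical and are not the competition model's output.

```python
def rank_papers(scores: dict[int, float]) -> list[int]:
    """Order paper IDs from least to most likely to be written by the author.

    `scores` maps paper ID -> estimated probability of authorship
    (hypothetical values; the real model's scoring is not shown here).
    """
    # Ascending probability puts the least likely paper first;
    # ties are broken by paper ID so the output is deterministic.
    return sorted(scores, key=lambda pid: (scores[pid], pid))


# Hypothetical scores for four of author 731's papers.
example_scores = {24943: 0.91, 688974: 0.12, 964345: 0.55, 1201905: 0.12}
print(rank_papers(example_scores))  # [688974, 1201905, 964345, 24943]
```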
View model in action
Submit either the provided test data or your own custom test data for scoring to see the model in action below.
Note:
The model will be executed on a heavily reduced data set (about 5% of the full data) to keep execution time to a few minutes. This reduces the model's accuracy. Executing the model on the full data set can take up to an hour. If you are interested in testing this model on the full data set, please contact us.
Prediction file
If you are not interested in running the model, here is the Prediction file this model produced for the Test file provided for the competition.
Score Data

Submit this sample input dataset. It consists of 5 authors, each with a set of paper IDs to be scored.

The model will take between 2 and 3 minutes to execute.

Note:
After the uploaded file is scored, it will be returned with an additional column that ranks the paper IDs from least likely to most likely to have been written by the author.

Interactive Scoring Form
Each row lists an Author Id followed by the Paper Ids to Rank. On Submit, a third column, Ranked Paper Ids, is filled with the same paper IDs ordered from most likely to be deleted to least likely.

Author Id  Paper Ids to Rank
731        24943 688974 964345 1201905 1267992 1298546 2180622
2109       238936 359598 479485 521953 613087 902615 1190322 1247551 1431351 1771318 1817653 2008262 2086862 2099971
2139       236529 338381 1153355 1316625 1480578 1500698 1713884
3874       864676 879469 879933 885751 912907 923589 1144610 1145708 1155020 1163134 1164027 1166154 1171717 1175936 1216789 1217760 1227402 1236317 1244143 1247773 1257641 1261464 1272320 1278685 1291905 1401481 1403721 1411365 1428706 1428839 1432131 1444735 1452981 1481304 1492743 1507502
7066       36457 1365722 1428865 1462288 1607553 1619933 1629247 1635431 1704691 1787468 1792322 1946381 1958218 2095914 2146172 2165978 2187262 2207535
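Each sample row is an author ID followed by that author's paper IDs. A minimal parsing sketch, assuming a row is a run of whitespace-separated integers with the author ID first; the function name `parse_row` is hypothetical and is not part of this service.

```python
def parse_row(line: str) -> tuple[int, list[int]]:
    """Split a sample row into (author_id, paper_ids).

    Assumes whitespace-separated integers with the author ID first.
    """
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError(f"expected an author ID and at least one paper ID: {line!r}")
    # First token is the author ID; the remaining tokens are the paper IDs to rank.
    author_id, *paper_ids = (int(tok) for tok in tokens)
    return author_id, paper_ids


author, papers = parse_row("2139 236529 338381 1153355 1316625 1480578 1500698 1713884")
print(author, len(papers))  # 2139 7
```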
Submit your own CSV file with the data to be scored.
Note:
The CSV file must conform to the data specification used to train the model. We therefore suggest these three simple steps:
Step 1:   Start with this Sample File as a template.
Step 2:   Customize with your own data.
Step 3:   Submit the file below.
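For Step 2, a script can generate the custom file. The sketch below assumes a two-column layout with the header `AuthorId,PaperIds` and all of an author's paper IDs in one space-separated field; that layout is an assumption, so check it against the Sample File before submitting. `write_scoring_csv` is a hypothetical helper.

```python
import csv
import io
from typing import TextIO


def write_scoring_csv(rows: dict[int, list[int]], out: TextIO) -> None:
    """Write author rows in an assumed AuthorId,PaperIds layout."""
    writer = csv.writer(out, lineterminator="\n")
    # Assumed header; confirm it against the Sample File.
    writer.writerow(["AuthorId", "PaperIds"])
    for author_id, paper_ids in rows.items():
        # All paper IDs for one author go in a single space-separated field.
        writer.writerow([author_id, " ".join(str(pid) for pid in paper_ids)])


buf = io.StringIO()
write_scoring_csv({731: [24943, 688974, 964345]}, buf)
print(buf.getvalue())
```

To write a real file instead of a string buffer, pass a handle opened with `open("my_data.csv", "w", newline="")`, as the `csv` module documentation recommends.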
Note:
After the uploaded file is scored, it will be returned with an additional column that ranks the paper IDs from least likely to most likely to have been written by the author.
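Because the added column reorders the submitted paper IDs rather than adding new ones, a quick sanity check on a returned row compares the two as multisets. A minimal sketch, assuming both columns have already been parsed into lists of integers; `is_valid_ranking` is a hypothetical helper.

```python
from collections import Counter


def is_valid_ranking(submitted: list[int], ranked: list[int]) -> bool:
    """Check that the ranked column reorders exactly the submitted paper IDs."""
    # Comparing Counters catches IDs that were dropped, added, or duplicated.
    return Counter(submitted) == Counter(ranked)


submitted = [236529, 338381, 1153355]
print(is_valid_ranking(submitted, [1153355, 236529, 338381]))  # True
print(is_valid_ranking(submitted, [1153355, 236529]))          # False
```

The returned order runs from least to most likely author, so reversing the list gives a most-likely-first reading.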