Related teaching


List of Available Data sets online (quoted from different sources):


Max Planck Institute has made data from IMC 2007 paper, WOSN 2008 papers, WWW 2009 paper, and WOSN 2009 paper, as well as Alan Mislove's PhD Thesis publicly available. Details at:
http://socialnetworks.mpi-sws.org/


Stanford Large Network Dataset Collection makes several data sets (not limited to social network) available at the following URL
http://snap.stanford.edu/data/index.html

(a repository website)
http://mldata.org/

(From J. Kleinberg's webpage)
Network Datasets
There are a number of interesting network datasets available on the Web; they form a valuable resource for trying out algorithms and models across a range of settings.
  • Collaboration and citation networks: For the 2003 KDD Cup competition, Johannes Gehrke, Paul Ginsparg, and I provided a dataset based on the arXivpre-print database, which allows one to study the networks of co-authorships and citations among a large community of physicists. Here is the KDD Cup dataset and a paper describing the competition in more detail.


  • Internet topology: The network structure of the Internet can be studied at several levels of resolution. Here is a dataset at the autonomous system (AS) level.

  • Web subgraphs: There are many such datasets available for download. One set is maintained by Panayiotis Tsaparas; the experiments that used this data are described in his Ph.D. thesis, and in other papers linked from his home page.


  • Semantic networks: Free association datasets for words have been collected by cognitive scientists; these are constructed by compiling the free responses of test subjects when presented with cue words. (For example, a test subject presented with the cue word `ice' might react with the word `cold,' `cream,' or `water.')

(Taken from MPI website)
Data from our IMC 2007 paper, our WOSN 2008 papers, our WWW 2009 paper, our WOSN 2009 paper, and Alan Mislove's PhD Thesis is publicly available by emailing Alan Mislove at amislove (at) mpi-sws (dot) org. Each of the data sets has been anonymized to protect the privacy of the social network users.
Alan Mislove, Massilmiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee. Measurement and Analysis of Online Social Networks. In Proceedings of the 5th ACM/USENIX Internet Measurement Conference (IMC'07), San Diego, CA, October 2007.
Meeyoung Cha, Alan Mislove, Ben Adams, Krishna P. Gummadi. Characterizing Social Cascades in Flickr. In Proceedings of the 1st Workshop on Online Social Networks (WOSN'08), Seattle, August 2008.
Meeyoung Cha, Alan Mislove, and Krishna P. Gummadi. A Measurement-driven Analysis of Information Propagation in the Flickr Social Network. In Proceedings of the 18th Annual World Wide Web Conference (WWW'09), Madrid, Spain, April 2009.
Fabrício Benevenuto, Tiago Rodrigues, Meeyoung Cha, and Virgílio Almeida. Characterizing User Behavior in Online Social Networks. In Proceedings of Usenix/ACM SIGCOMM Internet Measurement Conference (IMC), Chicago, Illinois, November 2009.

(Taken from J. Leskovec's course resource sections):
Snap network datasets
Yahoo! Webscope Catalog of datasets
  • Note: Jure Leskovec will have to apply for any sets you want, and we must agree not to distribute them further.
    There may be a delay, so get requests in early.
Coauthorship and Citation Networks
Internet Topology
Wikipedia
Movie Ratings
Who trusts whom data at Trustlet
Mark Newman's pointers


Reference Bibliography, classified by themes:

Available on pdf on the following detailed description


Small-world phenomenon

SmallWorld.html



[1] Lada A Adamic. . . . How to search a social network. Social Networks, Jan 2005. [2] M Granovetter. The strength of weak ties: A network theory revisited. Sociological
theory, Dec 1983.
[3] Jon Kleinberg. Navigation in a small world. Nature, Jan 2000.
[4] Jon Kleinberg. The small-world phenomenon: an algorithm perspective. STOC ’00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, May 2000.
[5] S Milgram. The small world problem. Psychology today, Jan 1967. [6] Duncan Watts and S Strogatz. Collective dynamics of ’small-world’networks. Nature,
Dec 1998.

Popularity and Reinforcement
[7] Dimitris Achlioptas, Aaron Clauset, David Kempe, and Cristopher Moore. On the bias of traceroute sampling. J. ACM, 56(4):1–28, Jun 2009.
[8] D Alderson, J Doyle, and W Willinger. Towards a theory of scale-free graphs: Definition, properties, and implications. Internet Mathematics, Dec 2005.
[9] Q Chen, H Chang, R Govindan, and S Jamin. The origin of power laws in internet topologies revisited. INFOCOM 2002. Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings. IEEE, 2:608–617, 2002.
[10] Byung-Gon Chun, Kamalika Chaudhuri, Hoeteck Wee, Marco Barreno, Christos Pa- padimitriou, and John Kubiatowicz. Selfish caching in distributed systems: a game- theoretic analysis. PODC ’04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, Jul 2004.
[11] A Clauset, CR Shalizi, and MEJ Newman. Power-law distributions in empirical data. SIAM review, 51(4):661–703, 2009.
[12] Edith Cohen and Scott Shenker. Replication strategies in unstructured peer-to-peer networks. SIGCOMM ’02: Proceedings of the 2002 conference on Applications, tech- nologies, architectures, and protocols for computer communications, Aug 2002.
[13] Alex Fabrikant, Elias Koutsoupias, and Christos Papadimitriou. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. ICALP ’02: Proceedings of the 29th International Colloquium on Automata, Languages and Programming, Jul 2002.
[14] Stratis Ioannidis and Peter Marbach. On the design of hybrid peer-to-peer systems.
SIGMETRICS ’08: Proceedings of the 2008 ACM SIGMETRICS international confer- ence on Measurement and modeling of computer systems, Jun 2008.
[15] EF Keller. Revisiting “scale-free” networks. Bioessays, 27(10):1060–1068, 2005.
[16] N Laoutaris, G Bestavros, and A Matta. Distributed selfish caching. IEEE Transactions
on Parallel and Distributed Systems, Dec 2007.
[17] N Laoutaris, O Telelis, and V Zissimopoulos. Distributed selfish replication. IEEE
Transactions on Parallel and Distributed Systems, Dec 2006. [18] C Leng, W Terpstra, B Kemme, and W Stannat. Maintaining replicas in unstructured
p2p systems. Proceedings of the 2008 ACM CoNEXT Conference, Dec 2008. [19] C Leng, W Terpstra, B Kemme, and W Stannat. Slides: Maintaining replicas in un-
structured p2p systems. Proceedings of the 2008 ACM CoNEXT Conference, Dec 2008.
[20] Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and replication in unstructured peer-to-peer networks. SIGMETRICS ’02: Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, Jun 2002.
[21] M Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2):226–251, 2004.
[22] Herbert Simon. On a class of skew distribution functions. Biometrika, 42(3/4):425–440, Dec 1955.
[23] W Terpstra, J Kangasharju, and C Leng. Bubblestorm: resilient, probabilistic, and exhaustive peer-to-peer search. portal.acm.org, Dec 2007.
[24] Saurabh Tewari and Leonard Kleinrock. Analysis of search and replication in unstruc- tured peer-to-peer networks. SIGMETRICS ’05: Proceedings of the 2005 ACM SIG- METRICS international conference on Measurement and modeling of computer sys- tems, Jun 2005.

Spreads and influence on social networks
Empirical studies
[25] S Aral. Identifying social influence: A comment on opinion leadership and social con- tagion in new product diffusion. Marketing Sci, 2010.
[26] S Aral, E Brynjolfsson, and MW Van Alstyne. Productivity effects of information diffusion in networks. Proceedings of the 28th Annual International Conference on Information Systems, 2007.
[27] S Aral, L Muchnik, and A Sundararajan. Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51):21544, 2009.
[28] S Aral and D Walker. Creating social contagion through viral product design: A ran- domized trial of peer influence in networks. Proceedings of the 31th Annual International Conference on Information Systems, 2010.
[29] D Centola. The spread of behavior in an online social network experiment. Science, Jan 2010.
[30] K Chaudhuri and F Chung Graham. . . . A network coloring game. INTERNET AND NETWORK ECONOMICS, Jan 2008.
[31] NA Christakis and JH Fowler. The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4):370, 2007.
[32] NA Christakis and JH Fowler. The collective dynamics of smoking in a large social network. New England journal of medicine, 358(21):2249, 2008.
[33] J. H Fowler and N. A Christakis. Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the framingham heart study. BMJ, 337(dec04 2):a2338–a2338, Dec 2008.
[34] Mark Granovetter. Threshold models of collective behavior. The American Journal of Sociology, 83(6):1420–1443, May 1978.
[35] Shawndra Hill, Foster Provost, and Chris Volinsky. Network-based marketing: Iden- tifying likely adopters via consumer networks. Statistical Science, 21(2):256–276, May 2006.
[36] R Iyengar, C Van den Bulte. . . , and TW Valente. Opinion leadership and social conta- gion in new product diffusion. Marketing Science, Jan 2010.
[37] S Judd and M Kearns. . . . Behavioral dynamics and influence in networked coloring and consensus. Proceedings of the . . . , Jan 2010.
[38] M Kearns. . . , S Judd, and J Tan. . . . Behavioral experiments on biased voting in net- works. Proceedings of the . . . , Jan 2009.
[39] M Kearns. . . and S Suri. . . . An experimental study of the coloring problem on human subject networks. Science, Jan 2006.
[40] Michael Kearns and Jinsong Tan. Biased voting and the democratic primary problem. INTERNET AND NETWORK ECONOMICS, Jan 2008.
[41] Stephen Morris. Contagion. The Review of Economic Studies, 67(1):57–78, Jan 2000. [42] D Strang and SA Soule. Diffusion in organizations and social movements: From hybrid
corn to poison pills. Annual review of sociology, 24(1), 1998. [43] TW Valente. Social network thresholds in the diffusion of innovations. Social Networks,
18(1):69–89, 1996.
Diffusion on random graphs
[44] Hamed Amini, Moez Draief, and Marc Lelarge. Marketing in a random network. Net- work Control and Optimization, Feb 2009.
[45] D Centola. The spread of behavior in an online social network experiment. Science, Jan 2010.
[46] D Centola, VM Egu ́ıluz, and MW Macy. Cascade dynamics of complex propagation. Physica A: Statistical Mechanics and its Applications, 374(1):449–456, 2007.
[47] D Centola and M Macy. Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3):702–34, 2007.
[48] M Lelarge. Diffusion and cascading behavior in random networks. Note, 2010.
[49] Marc Lelarge. Diffusion of innovations on random networks: Understanding the chasm.
WINE ’08: Proceedings of the 4th International Workshop on Internet and Network Economics, Dec 2008.
[50] Marc Lelarge. Efficient control of epidemics over random networks. SIGMETRICS ’09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, Jun 2009.
[51] Duncan Watts. A simple model of global cascades on random networks. Proceedings of the National Academy of . . . , Jan 2002.
Maximizing influence
[52] E Berger. Dynamic monopolies of constant size. Journal of Combinatorial Theory, Series B, 83(2):191–200, 2001.
[53] J.-C Bermond, J Bond, D Peleg, and S Perennes. The power of small coalitions in graphs. Discrete Applied Mathematics, 127(3), May 2003.
[54] D Kempe, Jon Kleinberg, and Eva Tardos. Influential nodes in a diffusion model for social networks. Proceedings of 32nd International Colloquium on Automata, Languages and Programming (ICALP), pages 1127—1138, Jan 2005.
[55] David Kempe, Jon Kleinberg, and E ́va Tardos. Maximizing the spread of influence through a social network. KDD ’03: Proceedings of the ninth ACM SIGKDD interna- tional conference on Knowledge discovery and data mining, Aug 2003.
[56] J Kostka and Y Oswald. . . . Word of mouth: Rumor dissemination in social net- works. STRUCTURAL INFORMATION AND COMMUNICATION COMPLEXITY, Jan 2008.
[57] Marc Lelarge. Diffusion of innovations on random networks: Understanding the chasm.
WINE ’08: Proceedings of the 4th International Workshop on Internet and Network Economics, Dec 2008.
[58] Marc Lelarge. Efficient control of epidemics over random networks. SIGMETRICS ’09: Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems, Jun 2009.
[59] Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne Van- Briesen, and Natalie Glance. Cost-effective outbreak detection in networks. KDD ’07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, Aug 2007.
[60] Dominic Meier, Yvonne Pignolet, Stefan Schmid, and Roger Wattenhofer. On the windfall of friendship: inoculation strategies on social networks. EC ’08: Proceedings of the 9th ACM conference on Electronic commerce, Jul 2008.
[61] E Mossel and S Roch. Submodularity of influence in social networks: From local to global. SIAM Journal on Computing, Dec 2010.
Variants of Spread
[62] MJ Brzozowski, T Hogg, and G Szabo. Friends and foes: ideological social network- ing. Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, pages 817–820, 2008.
[63] K Chaudhuri and F Chung Graham. . . . A network coloring game. INTERNET AND NETWORK ECONOMICS, Jan 2008.
[64] PeterDoddsandDuncanWatts.Universalbehaviorinageneralizedmodelofcontagion. Physical review letters, 92(21):218701, May 2004.
[65] N Immorlica, J Kleinberg, M Mahdian, and T Wexler. The role of compatibility in the diffusion of technologies through social networks. Proceedings of the 8th ACM conference on Electronic commerce, pages 75–83, 2007.
[66] S Judd and M Kearns. . . . Behavioral dynamics and influence in networked coloring and consensus. Proceedings of the . . . , Jan 2010.
[67] M Kearns. . . , S Judd, and J Tan. . . . Behavioral experiments on biased voting in net- works. Proceedings of the . . . , Jan 2009.
[68] M Kearns. . . and S Suri. . . . An experimental study of the coloring problem on human subject networks. Science, Jan 2006.
[69] Michael Kearns and Jinsong Tan. Biased voting and the democratic primary problem. INTERNET AND NETWORK ECONOMICS, Jan 2008.
[70] Jure Leskovec, D Huttenlocher, and J Kleinberg. Governance in social media: A case study of the wikipedia promotion process. Arxiv preprint arXiv:1004.3547, 2010.
[71] Erez Lieberman, Christoph Hauert, and Martin A Nowak. Evolutionary dynamics on graphs. Nature, 433(7023):309–312, Jan 2005.
4.4 Epidemy
[72] StephenBoyd,ArpitaGhosh,BalajiPrabhakar,andDevavratShah.Randomizedgossip algorithms. IEEE/ACM Transactions on Networking (TON, 14(SI), Jun 2006.
[73] M Cao, DA Spielman, and AS Morse. A lower bound on convergence of a distributed network consensus algorithm. Decision and Control, 2005 and 2005 European Control Conference. CDC-ECC’05. 44th IEEE Conference on, pages 2356–2361, 2006.
[74] Cha, H Haddadi, F Benevenuto, and K Gummadi. Measuring user influence in twitter: The million follower fallacy. Proceedings of the 4th . . . , Dec 2010.
[75] F Chierichetti, S Lattanzi, and A Panconesi. Rumor spreading in social networks. Proceedings of the 36th Internatilonal Collogquium on . . . , Dec 2009.
[76] F Chierichetti, S Lattanzi, A Panconesi, and S Universita di Roma. Rumour spreading and graph conductance. Proceedings of SODA, 2010.
[77] A Ganesh, L Massouli ́e, and D Towsley. The effect of network topology on the spread of epidemics. INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, 2:1455–1466, 2005.
[78] Z Haas, J Halpern, and L Li. Gossip-based ad hoc routing. IEEE/ACM Transactions on Networking (TON), Dec 2006.
[79] SHedetniemiandALiestman.Asurveyofgossipingandbroadcastingincommunication networks. Networks, Dec 1988.
[80] R Karp, C Schindelhauer, S Shenker, and B Vocking. Randomized rumor spreading.
FOCS ’00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Nov 2000.
[81] F Liljeros, C Edling, L Amaral, and H Stanley. The web of human sexual contacts. Nature, Dec 2001.
[82] Laurent Massoulie and Moez Draief. Networks and epidemics. Books in preparation, pages 1–76, May 2010.
[83] Boris Oreshkin, Mark Coates, and Michael Rabbat. Optimization and analysis of dis- tributed averaging with short node memory. IEEE Transactions on Signal Processing, 58(5), May 2010.
[84] B Pittel. On spreading a rumor. SIAM Journal on Applied Mathematics, Dec 1987.
[85] Michael Rabbat. On spatial gossip algorithms for average consensus. SSP ’07: Pro- ceedings of the 2007 IEEE/SP 14th Workshop on Statistical Signal Processing, Aug 2007.
[86] Michael Rabbat, Jarvis Haupt, Aarti Singh, and Robert Nowak. Decentralized com- pression and predistribution via randomized gossiping. IPSN ’06: Proceedings of the 5th international conference on Information processing in sensor networks, Apr 2006.
[87] S Sanghavi, B Hajek, and Laurent Massouli ́e. Gossiping with multiple messages. Infor- mation Theory, Dec 2007.
[88] Devavrat Shah. Gossip algorithms. Foundations and Trends⃝R in Networking, 3(1), Jan 2009.
[89] Devavrat Shah and Tauhid Zaman. Detecting sources of computer viruses in networks: theory and experiment. SIGMETRICS ’10: Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems, Jun 2010.
[90] Dan-Cristian Tomozei and Laurent Massouli ́e. Distributed user profiling via spectral methods. SIGMETRICS ’10: Proceedings of the ACM SIGMETRICS international conference on Measurement and modeling of computer systems, Jun 2010.
[91]DenizU ̈stebay,BorisOreshkin,MarkCoates,andMichaelRabbat.Greedygossipwith eavesdropping. IEEE Transactions on Signal Processing, 58(7), Jul 2010.

Extracting Structures from Data
Nodes ranking
[92] Dimitris Achlioptas. . . , A Fiat, A Karlin, and Frank McSherry. Web search via hub synthesis. Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 500 – 509, 2001.
[93] Alon Altman and Moshe Tennenholtz. Ranking systems: the pagerank axioms. EC ’05: Proceedings of the 6th ACM conference on Electronic commerce, Jun 2005.
[94] M Bianchini and M Gori. . . . Inside pagerank. ACM Transactions on Internet Technol- ogy, 5(1):92—128, Jan 2005.
[95] Allan Borodin, Gareth Roberts, Jeffrey Rosenthal, and Panayiotis Tsaparas. Finding authorities and hubs from link structures on the world wide web. WWW ’01: Proceed- ings of the 10th international conference on World Wide Web, Apr 2001.
[96] Allan Borodin, Gareth Roberts, Jeffrey Rosenthal, and Panayiotis Tsaparas. Link anal- ysis ranking: algorithms, theory, and experiments. Transactions on Internet Technology (TOIT, 5(1), Feb 2005.
[97] David Kempe and Frank McSherry. A decentralized algorithm for spectral analysis. Journal of Computer and System Sciences, 74(1), Feb 2008.
[98] Jon Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM, 46(5), Sep 1999.
[99] Frank McSherry. A uniform approach to accelerated pagerank computation. WWW ’05: Proceedings of the 14th international conference on World Wide Web, May 2005.
[100] Frank McSherry and Marc Najork. Computing information retrieval performance mea- sures efficiently in the presence of tied scores. ECIR’08: Proceedings of the IR research, 30th European conference on Advances in information retrieval, Mar 2008.