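Course Description
This is a multidisciplinary introductory course on computational social science (CSS). It focuses on how social scientists develop and use computation-related social theories and methods to understand and analyze social behavior in the digital age. The course covers the history and recent developments of CSS, and shows how to use computational methods to collect, process, analyze, and visualize large-scale real-world data in order to address social problems. It also introduces cutting-edge tools such as Python, R, and their scientific libraries, covering web scraping, natural language processing, topic modeling, and machine learning techniques applied to text, image, and video data. Note that computational methods in natural language processing and computer vision evolve rapidly; this course emphasizes how to apply these methods to social science research rather than surveying all of the latest techniques.
Course Learning Objectives
This course provides students with a complete computational social science toolkit, equipping them with the knowledge and skills needed to achieve the following learning outcomes: (1) Understand the history, development, and key concepts of computational social science. (2) Understand the research methods social scientists use to explore social and behavioral phenomena. (3) Apply computational tools and knowledge to solve problems. (4) Design and build computational systems to explore and analyze different aspects of the human world.
Textbooks and Other Useful Materials
Required: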
1.Justin Grimmer,Margaret E.Roberts,and Brandon M.Stewart.2022.Text as Data:A New Framework for Machine Learning and the Social Sciences.Princeton University Press.
2.Ian Goodfellow,Yoshua Bengio and Aaron Courville.Deep Learning.
3.Hadley Wickham and Garrett Grolemund.2016.R for Data Science.
4.Francois Chollet.Deep Learning with Python,Second Edition.OR Deep Learning with R.
Optional:
1.Matthew J.Salganik.2017.Bit by Bit:Social Research in the Digital Age.Princeton University Press.
2.Steven Bird,Ewan Klein,and Edward Loper.2009.Natural Language Processing with Python.O'Reilly Media.
3.Hadley Wickham.Advanced R.
4.Kieran Healy.Data Visualization.
5.Eli Stevens,Luca Antiga,Thomas Viehmann.Deep Learning with PyTorch.
Course Website:
Our course materials,schedule,and announcements will be hosted on intro2css https://yongjunzhang.com/intro2css/.
Module 1:CSS Basics
Unit 1.Welcome and Introduction to CSS
Topics:CSS,Big Data,and Data Science.
Assigned Readings:
1.Lazer et al.2009.“Computational Social Science.”Science.
2.David M.J.Lazer et al.2020.“Computational social science:Obstacles and opportunities.”Science,369(6507),pp.1060-1062.
3.Edelman et al.2020.“Computational Social Science and Sociology.”Annual Review of Sociology.
4.David Donoho.2015.50 Years of Data Science.
5.Buyalskaya,A.,Gallo,M.and Camerer,C.F.,2021.The golden age of social science.Proceedings of the National Academy of Sciences,118(5).
Lab:
1.GitHub and version control:understand basic git commands such as git clone,git fetch,git pull,and git push.
2.Install all necessary software,such as Python,R,RStudio,Jupyter Notebook,Colab,and GitHub Desktop.
3.Understand how to use the command line,for example how to run Python from the terminal.
4.Get Twitter Academic API for later use.
Unit 2.Conceptualizing CSS and Programming
Topics:methodological approach;algorithm bias;measurement bias;research ethics;etc.
Assigned Readings:
1.Laura K.Nelson.2017.Computational Grounded Theory:A Methodological Framework.Sociological Methods and Research.
2.Justin Grimmer,Margaret E.Roberts,and Brandon M.Stewart.2022.Text as Data.Chapter 2 Social Science Research and Text Analysis.
3.Obermeyer,Ziad,et al.2019.“Dissecting racial bias in an algorithm used to manage the health of populations.”Science 366.6464:447-453.
4.Schwemmer,Carsten,et al.2020.“Diagnosing gender bias in image recognition systems.”Socius.
5.Wagner,Claudia,et al.2021.“Measuring algorithmically infused societies.”Nature 595.7866:197-204.
6.Lazer,David,et al.2021.“Meaningful measures of human society in the twenty-first century.”Nature 595.7866:189-196.
7.R for Data Science.Chapters 1-3.(Lab reading;please spend some time reading these chapters.)
Lab:
1.Basic programming in R or python
2.Understand how to read/save files
3.Basic data wrangling using tidyverse etc.
4.Understand regular expression
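As a warm-up for the regular expression lab item, the following is a minimal sketch in Python using only the standard `re` module. The sample text, handles, and hashtags are made up for illustration and are not course data.

```python
import re

# Hypothetical sample text; the handles and hashtags are invented for illustration.
text = "RT @alice: Loving the #CSS course with @bob_92! #DataScience"

# \w+ matches one or more letters, digits, or underscores;
# the parentheses capture only the part after the @ or # sign.
mentions = re.findall(r"@(\w+)", text)
hashtags = re.findall(r"#(\w+)", text)

# Remove mentions and hashtags, then collapse repeated whitespace.
cleaned = re.sub(r"[@#]\w+", "", text)
cleaned = re.sub(r"\s+", " ", cleaned).strip()

print(mentions)
print(hashtags)
print(cleaned)
```

The same patterns carry over to R (for example through the stringr package), though the escaping rules for backslashes differ.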
Unit 3.Machine Learning
Topics:Basics on supervised machine learning,deep learning
Assigned Readings:
1.Ian Goodfellow,Yoshua Bengio and Aaron Courville.Deep learning.Chapter 5 and chapter 6.
2.Grimmer,Justin,Margaret E.Roberts,and Brandon M.Stewart.“Machine Learning for Social Science:An Agnostic Approach.”Annual Review of Political Science 24(2021):395-419.
3.Molina,Mario,and Filiz Garip.“Machine learning for sociology.”Annual Review of Sociology 45(2019):27-45.
4.Athey,Susan,and Guido W.Imbens.“Machine learning methods that economists should know about.”Annual Review of Economics 11(2019):685-725.
5.Mehrabi,N.,Morstatter,F.,Saxena,N.,Lerman,K.and Galstyan,A.,2021.A survey on bias and fairness in machine learning.ACM Computing Surveys(CSUR),54(6),pp.1-35.
6.Francois Chollet.Deep Learning with Python,Second Edition.Chapter 1-3.
7.Optional:Max Kuhn and Kjell Johnson.Applied Predictive Modeling.
8.Optional: R caret Package :https://topepo.github.io/caret/
Lab:
1.Using R caret package to do some basic supervised machine learning
2.Train a model to predict the gender of U.S.baby names using SSA data (code challenge)
3.SSA baby name data
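Before fitting a model with caret, it can help to have a simple baseline in mind for the baby-name code challenge. The sketch below is one possible baseline, not the expected solution: it predicts gender from the last letter of a name by weighted majority vote. It assumes rows in (name, sex, count) form; the rows and counts shown are invented for illustration and are not real SSA figures, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy rows in (name, sex, count) form. The counts are invented for
# illustration and are NOT real SSA figures.
rows = [
    ("Emma", "F", 100), ("Olivia", "F", 90), ("Sophia", "F", 80),
    ("Anna", "F", 70), ("Liam", "M", 100), ("Noah", "M", 90),
    ("Ethan", "M", 80), ("Mason", "M", 70), ("Joshua", "M", 20),
]


def last_letter(name):
    """Single hand-built feature: the lower-cased final letter of the name."""
    return name[-1].lower()


def train(rows):
    """Sum birth counts per (last letter, sex) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for name, sex, count in rows:
        totals[last_letter(name)][sex] += count
    return totals


def predict(model, name, default="F"):
    """Return the sex with the larger count for the name's last letter."""
    counts = model.get(last_letter(name))
    if not counts:
        return default  # letter never seen in training: fall back to a default
    return max(counts, key=counts.get)


model = train(rows)
print(predict(model, "Julia"), predict(model, "Logan"))
```

A trained classifier should be compared against a baseline like this on held-out names, since a single feature can already perform well on this task.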
Module 2:Text as Data
Unit 4.Making sense of text as data
Topics:Natural Language Processing;Big Data and Parallel Computing
Assigned Readings:
1.Grimmer,Justin,Stewart,Brandon M.2013.“Text as Data:The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.”Political Analysis 21:267-97.
2.Barberá,P.,Boydstun,A.E.,Linn,S.,McMahon,R.and Nagler,J.,2021.Automated text classification of news articles:A practical guide.Political Analysis,29(1),pp.19-42.
3.Monroe,Burt L.,Colaresi,Michael P.,Quinn,Kevin M.2008.“Fightin’ Words:Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict.”Political Analysis 16:372-403.
4.Nardulli,Peter F.,Althaus,Scott L.,Hayes,Matthew.2015.“A Progressive Supervised-learning Approach to Generating Rich Civil Strife Data.”Sociological Methodology 45:148-83.
Lab:
1.Introduction to basic text analysis
2.How to run basic NLP tasks in R or Python
3.Read Chapters 1-3.Natural Language Processing with Python.
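For the basic text analysis lab, the usual first steps are tokenizing, removing stopwords, and counting terms. The following is a minimal sketch using only the Python standard library; the two documents and the stopword list are made up for illustration, and a real project would likely use a library such as NLTK (covered in the assigned chapters) instead of this hand-rolled tokenizer.

```python
import re
from collections import Counter

# Two invented example documents.
docs = [
    "The senator spoke about health care.",
    "Health care costs rose; the senator objected.",
]

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "about", "a", "an", "of", "and"}


def tokenize(doc):
    """Lower-case, keep alphabetic runs only, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]


# Corpus-level term frequencies.
counts = Counter(tok for d in docs for tok in tokenize(d))

# Document-term matrix: one row per document, one column per vocabulary term.
vocab = sorted(counts)
dtm = [[tokenize(d).count(term) for term in vocab] for d in docs]

print(counts.most_common(3))
print(vocab)
print(dtm)
```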
Unit 5.Getting Textual Data:Web Scraping,API,and Big Data
Topics:Introducing web scraping and API.
Assigned Readings:
1.Sobel,Benjamin LW.“A New Common Law of Web Scraping.”Lewis Clark L.Rev.25 (2021):147.
2.Luscombe,A.,Dick,K.and Walby,K.,2022.Algorithmic thinking in the public interest:navigating technical,legal,and ethical hurdles to web scraping in the social sciences.Quality & Quantity,56(3),pp.1023-1044.
3.Twitter Academic API:https://developer.twitter.com/en/products/twitter-api/academic-research
4.Lin,H.,Nalluri,P.,Li,L.,Sun,Y.and Zhang,Y.,2022,May.Multiplex Anti-Asian Sentiment before and during the Pandemic:Introducing New Datasets from Twitter Mining.In Proceedings of the 12th Workshop on Computational Approaches to Subjectivity,Sentiment & Social Media Analysis(pp.16-24).
5.Google Cloud APIs:https://cloud.google.com/apis
6.Google APIKey:https://developers.google.com/maps/documentation/maps-static/get-api-key
7.Google BigQuery:https://cloud.google.com/bigquery/docs/quickstarts
8.Lazer,D.and Radford,J.,2017.Data ex machina:introduction to big data.Annual Review of Sociology,43,pp.19-39.
9.Brown,J.R.and Enos,R.D.,2021.The measurement of partisan sorting for 180 million voters.Nature Human Behaviour,5(8),pp.998-1008.
10.Hofstra,B.,Kulkarni,V.V.,Galvez,S.M.N.,He,B.,Jurafsky,D.and McFarland,D.A.,2020.The diversity-innovation paradox in science.Proceedings of the National Academy of Sciences,117(17),pp.9284-9291.
11.Congressional Record for the 43rd-114th Congresses:Parsed Speeches and Phrase Counts:https://data.stanford.edu/congress_text
Lab:
1.Using R or python to scrape data from websites or social media platforms (e.g.,twitter)(code challenge)
2.Understand how to use webdriver in data scraping
3.Understand how to use Google Cloud Service/Google API(e.g.,How to use python to get data from bigquery)
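The core of most scraping tasks in this lab is pulling structured fields out of HTML. The sketch below illustrates that parsing step with Python's standard-library `html.parser`, applied to an inline HTML string so that it runs without network access. The HTML snippet and the `LinkCollector` class name are hypothetical; in practice the page would first be downloaded (or rendered with a webdriver), and the site's terms of service and the legal readings above should be consulted before collecting data.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Invented page content standing in for a downloaded document.
html = (
    '<html><body><a href="/news/1">One</a><p>text</p>'
    '<a href="/news/2">Two</a><a>no href</a></body></html>'
)

parser = LinkCollector()
parser.feed(html)
print(parser.links)
```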
Unit 6.Retrieving Information:Topic Modeling
Topics:LDA and Structural Topic Model;Model Applications.
Assigned Readings:
1.Blei,David M.2012.“Probabilistic Topic Models.”Communications of the ACM 55:77-84.(LDA)
2.Mohr,John W.,Bogdanov,Petko.2013.“Introduction—Topic Models:What They Are and Why They Matter.”Poetics 41 (6):545-69.
3.Roberts,M.E.,Stewart,B.M.and Tingley,D.2014.stm:R package for structural topic models.Journal of Statistical Software,10(2),pp.1-40.(A package intro paper)
4.DiMaggio,Paul,Nag,Manish,Blei,David.2013.“Exploiting Affinities between Topic Modeling and the Sociological Perspective on Culture:Application to Newspaper Coverage of U.S.Government Arts Funding.”Poetics 41:570-606.
5.Roberts,M.E.,Stewart,B.M.,Tingley,D.,Lucas,C.,Leder-Luis,J.,Gadarian,S.K.,Albertson,B.and Rand,D.G.,2014.“Structural topic models for open-ended survey responses.”American Journal of Political Science,58(4),pp.1064-1082.(STM)
6.Barron,A.T.,Huang,J.,Spang,R.L.and DeDeo,S.,2018.“Individuals,institutions,and innovation in the debates of the French Revolution.”Proceedings of the National Academy of Sciences,115(18),pp.4607-4612.
7.Choudhury,P.,Wang,D.,Carlson,N.A.and Khanna,T.,2019.Machine learning approaches to facial and text analysis:Discovering CEO oral communication styles.Strategic Management Journal,40(11),pp.1705-1732.
8.Barberá,P.,Casas,A.,Nagler,J.,Egan,P.J.,Bonneau,R.,Jost,J.T.and Tucker,J.A.,2019.Who leads?Who follows?Measuring issue attention and agenda setting by legislators and the mass public using social media data.American Political Science Review,113(4),pp.883-901.
9.Grootendorst,M.,2022.BERTopic:Neural topic modeling with a class-based TF-IDF procedure.arXiv preprint arXiv:2203.05794.
Lab:
1.Steps to implement topic models via R(R library stm and topicmodels).
2.Steps to implement topic models via python gensim
Unit 7.Word Embedding and Transformers
Topics:Word embedding and transformers
Assigned Readings:
1.Mikolov,T.,Sutskever,I.,Chen,K.,Corrado,G.S.and Dean,J.,2013.Distributed representations of words and phrases and their compositionality.Advances in neural information processing systems,26.
2.Rong,Xin.“word2vec parameter learning explained.”arXiv preprint arXiv:1411.2738 (2014).
3.Pennington,J.,Socher,R.and Manning,C.D., 2014,October. Glove:Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)(pp.1532-1543).
4.Wu,X.,Lin,W.,Wang,Z.and Rastorgueva,E.,2020.Author2vec:A framework for generating user embedding.arXiv preprint arXiv:2003.11627.
5.Nikhil Garg,Londa Schiebinger,Dan Jurafsky,and James Zou.2018.Word embeddings quantify 100 years of gender and ethnic stereotypes.PNAS 201720347(2018).
6.Nelson,Laura K.“Leveraging the alignment between machine learning and intersectionality:Using word embeddings to measure intersectional experiences of the nineteenth century US South.”Poetics(2021):101539.
7.Kozlowski,A.C.,Taddy,M., and Evans,J.A.,2019.The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings. American Sociological Review
8.Rheault,L.and Cochrane,C.,2020.Word embeddings for the analysis of ideological placement in parliamentary corpora.Political Analysis,28(1),pp.112-133.
9.Murray,D.,Yoon,J.,Kojaku,S.,Costas,R,Jung,W.S.,Milojevic,S.and Ahn,Y.Y.,2020.Unsupervised embedding of trajectories captures the latent structure of mobility.arXiv preprint arXiv:2012.02785.
10.Devlin,Jacob,et al.“Bert:Pre-training of deep bidirectional transformers for language understanding.”arXiv preprint arXiv:1810.04805(2018).
11.Wankmüller,Sandra.“Neural Transfer Learning with Transformers for Social Science Text Analysis.”arXiv preprint arXiv:2102.02111(2021).
12.Liu,Q.,Kusner,M.J.and Blunsom,P.,2020.A survey on contextual embeddings.arXiv preprint arXiv:2003.07278.
13.Vicinanza,P., Goldberg, A.and Srivastava,S., 2021. Quantifying Vision through Language Demonstrates that Visionary Ideas Come from the Periphery.
Lab:
1.How to use word embedding/transformer in R or Python to achieve NLP tasks
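Most embedding-based analyses in the readings rest on one operation: comparing word vectors by cosine similarity. The sketch below shows that operation in plain Python. The three-dimensional vectors are toy values chosen by hand for illustration; real embeddings (word2vec, GloVe, or transformer outputs) have hundreds of dimensions and are learned from a corpus.

```python
import math

# Hand-made toy vectors, NOT learned embeddings.
vectors = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word):
    """Return the other vocabulary word with the highest cosine similarity."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))


print(most_similar("king"))
print(round(cosine(vectors["king"], vectors["apple"]), 3))
```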
Unit 8.Text as Data(4)-Sentiment Analysis
Assigned Readings:
1.Paxton,Pamela,Kristopher Velasco,and Robert W.Ressler.“Does use of emotion increase donations and volunteers for nonprofits?”American Sociological Review 85.6(2020):1051-1083.
2.Flores,René D.“Do anti-immigrant laws shape public sentiment?A study of Arizona's SB1070 using Twitter data.”American Journal of Sociology 123.2(2017):333-384.
3.Hassan,Tarek A.,et al.“Firm-level political risk:Measurement and effects.”The Quarterly Journal of Economics 134.4(2019):2135-2202.
4.De Amicis,C.,Falconieri,S.and Tastan,M.,2021.Sentiment analysis and gender differences in earnings conference calls.Journal of Corporate Finance,71,p.101809.
5.Cook,Gavin,Junming Huang,and Yu Xie.“How COVID-19 has Impacted American Attitudes Toward China:A Study on Twitter.”arXiv preprint arXiv:2108.11040(2021).
Lab:
1.How to conduct sentiment analysis in R or Python.
2.Use Twitter data to train a sentiment analysis model (code challenge).
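A useful point of comparison for a trained sentiment model is a dictionary-based scorer. The sketch below is a minimal illustration of that approach: it sums word polarities from a lexicon and flips the sign of a word that directly follows a negator. The lexicon and negator list are tiny and hypothetical; established dictionaries are far larger, and this is not the method used in any of the assigned readings.

```python
import re

# Tiny hypothetical lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}


def score(text):
    """Sum polarities; a negator flips only the token immediately after it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(tok, 0)
        total += -value if negate else value
        negate = False
    return total


def label(text):
    """Map the numeric score to a coarse class."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(score("I love this great course"), label("This is not good"))
```

Comparing such a baseline with a supervised model on hand-labeled tweets shows how much the training data actually adds.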
Module 3:Image as Data
Unit 9.Making sense of image as data
Topics:why image as data?How do we use images for social research?
Assigned Readings:
1.Torres,M.and Cantú,F.,2022.Learning to see:Convolutional neural networks for the analysis of social science data.Political Analysis,30(1),pp.113-131.
2.Jean,N.,Burke,M.,Xie,M.,Davis,W.M.,Lobell,D.B.,and Ermon,S.,2016.Combining satellite imagery and machine learning to predict poverty.Science,353(6301),pp.790-794.
3.Read the supplement to Jean et al.(2016):https://science.sciencemag.org/content/sci/suppl/2016/08/19/353.6301.790.DC1/Jean.SM.pdf
4.Joo,J.and Steinert-Threlkeld,Z.C.,2018.Image as data:Automated visual content analysis for political science.arXiv preprint arXiv:1810.01544.
5.Han Zhang and Jennifer Pan.2019.CASM:A Deep-Learning Approach for Identifying Collective Action Events with Text and Image Data from Social Media.Sociological Methodology.
6.Deng,J.,Dong,W.,Socher,R.,Li,L.J.,Li,K.and Fei-Fei,L.,2009,June.Imagenet:A large-scale hierarchical image database.In Computer Vision and Pattern Recognition,2009.CVPR 2009.IEEE Conference on(pp.248-255).IEEE.
7.Denton,E.,Hanna,A.,Amironesei,R.,Smart,A.and Nicole,H.,2021.On the genealogy of machine learning datasets:A critical history of ImageNet.Big Data & Society,8(2),p.20539517211035955.
8.Reeves,B.,Robinson,T.and Ram,N.,2020.Time for the human screenome project.
Lab:
1.Basic knowledge about using R or Python to obtain and process images
2.Extra Resource:http://neuralnetworksanddeeplearning.com/
Unit 10.A brief survey of computer vision tools for social sciences
Topics:Methods to implement image recognition,extract useful image information,machine learning methods,transfer learning approach,etc.
Assigned Readings:
1.LeCun,Y.,Bengio,Y.and Hinton,G.,2015.Deep learning.Nature,521(7553),pp.436-444.
2.Goodfellow et al.Deep Learning.Chapter 6-10.
3.Watch the video and read notes of Introduction to Convolutional Neural Network:http://cs231n.github.io/convolutional-networks/
4.Krizhevsky,A.,Sutskever,I.and Hinton,G.E.,2012.Imagenet classification with deep convolutional neural networks.In Advances in neural information processing systems(pp.1097-1105).
5.Szegedy,C.,Liu,W.,Jia,Y.,Sermanet,P.,Reed,S.,Anguelov,D.,Erhan,D.,Vanhoucke,V.and Rabinovich,A.,2015.Going deeper with convolutions.In Proceedings of the IEEE conference on computer vision and pattern recognition(pp.1-9).
6.Bressem,K.K.,Adams,L.C.,Erxleben,C.,Hamm,B.,Niehues,S.M.and Vahldiek,J.L.,2020.Comparing different deep learning architectures for classification of chest radiographs.Scientific reports,10(1),pp.1-16.
7.Li,Shan,and Weihong Deng.“Deep facial expression recognition:A survey.”IEEE transactions on affective computing (2020).
8.Liu,Zhuang,et al.“A ConvNet for the 2020s.”arXiv preprint arXiv:2201.03545(2022).
9.Dosovitskiy,A.,Beyer,L.,Kolesnikov,A.,Weissenborn,D.,Zhai,X.,Unterthiner,T.,Dehghani,M.,Minderer,M.,Heigold,G.,Gelly,S.and Uszkoreit,J.,2020.An image is worth 16x16 words:Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929.
10.Liu,Z.,Lin,Y.,Cao,Y.,Hu,H.,Wei,Y.,Zhang,Z.,Lin,S.and Guo,B.,2021.Swin transformer:Hierarchical vision transformer using shifted windows.In Proceedings of the IEEE/CVF International Conference on Computer Vision(pp.10012-10022).
11.Chaudhari,S.,Mithal,V.,Polatkan,G.and Ramanath,R.,2021.An attentive survey of attention models.ACM Transactions on Intelligent Systems and Technology(TIST),12(5),pp.1-32.
12.Khan,S.,Naseer,M.,Hayat,M.,Zamir,S.W.,Khan,F.S.and Shah,M.,2021.Transformers in vision:A survey.ACM Computing Surveys(CSUR).
Lab:
Image data storage,cleaning,and processing in Python.
Unit 11.Main frameworks to analyze image data
Topics:how to use Keras (or Pytorch)frameworks to analyze image data and train your neural network
Assigned Readings:
1.Chollet,Francois.Deep learning with Python.Simon and Schuster,2021.Chapter 4-9.
2.Deep Learning with Pytorch.Chapter 1-8.
3.TensorFlow Tutorial:https://www.tensorflow.org/tutorials
4.Explore some code examples:https://keras.io/examples/
5.Pytorch Tutorial:https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
6.Check face net project:https://github.com/davidsandberg/facenet
7.Check open face project:http://cmusatyalab.github.io/openface/
Lab:
1.Google Cloud Service;TensorFlow;Keras;R packages (torch or keras).
2.An example using pytorch to replicate the Jean et al's result:https://github.com/joshzyj/predicting-poverty-replication
3.Use the Google Maps API to obtain Google images/Use Jean's model to predict economic outcomes/visualize and map the outcomes (code challenge)
Optional Module:Audio and Video as Data
Unit 12.Audio and Video data
Topics:Introducing basic methods to process acoustic data and speech recognition;Introducing basic methods to process video data;introducing social research based on video data
Assigned Readings:
1.Dietrich,Bryce J.,Matthew Hayes,and Diana Z.O'Brien.“Pitch perfect:Vocal pitch and the emotional intensity of congressional speech.”American Political Science Review 113.4(2019):941-962.
2.Dietrich,Bryce J.“Using motion detection to measure social polarization in the US House of Representatives.”Political Analysis 29.2(2021):250-259.
3.Qin and Yang.2019.What you say and how you say it matters.Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics,Florence,Italy.
4.Choudhury,Prithwiraj,et al.“Machine learning approaches to facial and text analysis:Discovering CEO oral communication styles.”Strategic Management Journal 40.11(2019):1705-1732.
Lab:
1.Using Keras,Tensorflow,and DeepSpeech to build and train a speech recognition model and object detection model.
2.Explore Essentia:https://essentia.upf.edu/documentation.html and https://mtg.github.io/essentia-labs/news/tensorflow/2020/01/16/tensorflow-models-release
3.Explore librosa:https://librosa.org/doc/latest/index.html
4.Stanford Cable TV News Analyzer:https://tvnews.stanford.edu/methodology
5.YOLO3:https://pjreddie.com/media/files/papers/YOLOv3.pdf
6.Explore YOLO3 (You Only Look Once-for Object Detection):https://www.youtube.com/watch?v=MPU2HistivI&feature=youtu.be&ab_channel=JosephRedmon
7.Here is the instruction of how to use YOLO3:https://pjreddie.com/darknet/yolo/
Optional Module:Place and Map as Data
Unit 13.Place and Map data
Topics:Introducing basic methods to process geospatial data.
Assigned Readings:
1.Moro,E.,Calacci,D.,Dong,X.and Pentland,A.,2021.Mobility patterns are associated with experienced income segregation in large US cities.Nature communications,12(1),pp.1-10.
2.Chang,S.,Pierson,E.,Koh,P.W.,Gerardin,J.,Redbird,B.,Grusky,D.and Leskovec,J., 2021.Mobility network models of COVID-19 explain inequities and inform reopening.Nature,589(7840),pp.82-87.
3.Hou,X.,Gao,S.,Li,Q.,Kang,Y.,Chen,N.,Chen,K.,Rao,J.,Ellenberg,J.S.and Patz, J.A.,2021.Intracounty modeling of COVID-19 infection with human mobility:Assessing spatial heterogeneity with business traffic,age,and race.Proceedings of the National Academy of Sciences,118(24).
4.Athey,S.,Ferguson,B.A.,Gentzkow,M.and Schmidt,T.,2020.Experienced segregation(No.w27572).National Bureau of Economic Research.
5.Wang,Q.,Phillips,N.E.,Small,M.L.and Sampson,R.J.,2018.Urban mobility and neighborhood isolation in America's 50 largest cities.Proceedings of the National Academy of Sciences,115(30),pp.7735-7740.
6.Small,M.L.,Akhavan,A.,Torres,M.and Wang,Q.,2021.Banks,alternative institutions and the spatial-temporal ecology of racial inequality in US cities.Nature Human Behaviour,5(12),pp.1622-1628.
Lab:
1.Learn how to use QGIS,R,and Python to conduct geospatial analysis.
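A basic building block of the mobility and segregation studies above is the distance between two latitude/longitude points. The sketch below computes great-circle distance with the haversine formula in plain Python. It assumes a spherical Earth with a mean radius of 6371 km, which is an approximation; dedicated geospatial libraries or QGIS handle projections and ellipsoids more carefully.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of latitude along a meridian, starting at the equator.
print(round(haversine_km(0.0, 0.0, 1.0, 0.0), 2))
```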
Unit 14.Research Paper Presentation