Natural Language Processing (NLP): part of the solution to the replicability crisis?

I am collaborating with researchers at Basis Technology to explore the potential of their new text-vectors NLP system, Semantic Search. The figure on the right is a preliminary architecture diagram showing how we hypothesize that we could achieve extremely high accuracy in predicting which social and behavioral sciences papers can be replicated.

Preliminary findings suggest that NLP can distinguish papers that replicate from papers that fail replication.

Pictured below is Dr. Alex Jones' word-cloud depiction of his Naive Bayes analysis of the 100 replicated papers vs. papers that failed replication, from Colin F. Camerer, et al., “Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015,” Nature Human Behaviour, Volume 2, pp. 637–644 (2018). https://www.nature.com/articles/s41562-018-0399-z

The word clouds show that, on average, the two groups differ markedly in the language they use. The challenge is to identify which individual research papers can almost certainly be replicated and which cannot. Dr. Jones' approach correctly classified 80% of these papers. The architecture diagram to the right shows how my colleagues and I hypothesize that we might improve on Jones' impressive results, presuming that we can win a contract with DARPA's Systematizing Confidence in Open Research and Evidence (SCORE) program.
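To make the idea concrete, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier of the general kind Jones used. The toy "paper" snippets and labels below are invented stand-ins for illustration, not text from the actual corpus, and this is not his code:

```python
# Minimal multinomial Naive Bayes over bag-of-words features.
# Toy training data only; real runs would use full paper text.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (token_list, label). Returns (priors, word counts, vocab)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        # log prior + Laplace-smoothed log likelihood of each token
        lp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[label][t] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented examples mimicking the word-cloud contrast between classes.
train = [
    ("preregistered large sample effect confirmed".split(), "replicated"),
    ("large sample robust effect".split(), "replicated"),
    ("marginal effect small sample".split(), "failed"),
    ("small sample surprising interaction marginal".split(), "failed"),
]
model = train_nb(train)
print(predict(model, "large preregistered sample".split()))  # replicated
print(predict(model, "marginal small sample".split()))       # failed
```

The word clouds visualize the same per-class word frequencies that the classifier's likelihood terms are built from; words that dominate one cloud pull a paper's score toward that class.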

Suggested reading

  • Adam Altmejd, et al, “Predicting the Replicability of Social Science Lab Experiments,” preprint (2019). https://osf.io/preprints/bitss/zamry/  
  • Isaiah Andrews and Maximilian Kasy, “Identification of and correction for publication bias,” revise-and-resubmit at American Economic Review, 2018. https://maxkasy.github.io/home/files/papers/PublicationBias.pdf
  • Giangiacomo Bravo, et al, “The effect of publishing peer review reports on referee behavior in five scholarly journals,” Nature Communications, (2019) 10:322.  https://doi.org/10.1038/s41467-018-08250-2 
  • Colin F. Camerer, et al, “Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015,” Nature Human Behaviour, Volume 2, pp. 637–644 (2018). https://www.nature.com/articles/s41562-018-0399-z
  • Valentin Danchev, Andrey Rzhetsky and James A. Evans, “Centralized ‘big science’ communities more likely generate non-replicable results.” Submitted 15 Jan 2018, arXiv:1801.05042. https://arxiv.org/abs/1801.05042
  • Anna Dreber, et al, “Using prediction markets to estimate the reproducibility of scientific research.” PNAS December 15, 2015 112 (50) pp. 15343-15347.  https://doi.org/10.1073/pnas.1516179112
  • Andrew Gelman and John Carlin, “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors”, Perspectives on Psychological Science. 2014, Vol. 9(6) pp. 641–651. https://journals.sagepub.com/doi/10.1177/1745691614551642
  • Hossein Hosseini, et al, “Deceiving Google’s Perspective API Built for Detecting Toxic Comments,” arXiv:1702.08138v1 [cs.LG] 27 Feb 2017 https://arxiv.org/abs/1702.08138
  • Alex Jones, “Using natural language processing to predict replicability of psychological science” 2018. https://github.com/alexjonesphd/nlp-replication  
  • Carole J. Lee and David Moher, “Promote scientific integrity via journal peer review data: Publishers must invest, and manage risk,” Science 21 Jul 2017: Vol. 357, Issue 6348, pp. 256-257. DOI: 10.1126/science.aan4141.  http://science.sciencemag.org/content/357/6348/256
  • Lotterhos KE, Moore JH, Stapleton AE, “Analysis validation has been neglected in the Age of Reproducibility.” PLoS Biol 16(12) (2018) https://doi.org/10.1371/journal.
  • Motyl, Matt, et al, “The state of social and personality science: Rotten to the core, not so bad, getting better, or getting worse?” Journal of Personality and Social Psychology, Vol 113(1), Jul 2017, 34-58, http://dx.doi.org/10.1037/pspa0000084 
  • Nuijten, M. B., Van Assen, M. A. L. M., Hartgerink, C. H. J., Epskamp, S., and Wicherts, J. M. (2017). The validity of the tool “statcheck” in discovering statistical reporting inconsistencies. PsyArXiv. November 16. doi:10.31234/osf.io/tcxaj.  https://psyarxiv.com/tcxaj/ 
  • Plesser HE (2018) Reproducibility vs. Replicability: A Brief History of a Confused Terminology. Front. Neuroinform.  https://www.frontiersin.org/articles/10.3389/fninf.2017.00076/full
  • Frank Renkewitz and Melanie Keiner. 2018. “How to Detect Publication Bias in Psychological Research?  A Comparative Evaluation of Six Statistical Methods.” PsyArXiv. December 20, 2018. doi:10.31234/osf.io/w94ep.  https://psyarxiv.com/w94ep/                                    
  • Seyed Mahdi Rezaeinia, et al, “Improving the Accuracy of Pre-trained Word Embeddings for Sentiment Analysis” (2017) https://arxiv.org/abs/1711.08609
  • Silberzahn, R., et al., “Many analysts, one data set: Making transparent how variations in analytic choices affect results.” Advances in Methods and Practices in Psychological Science, 1, pp. 337-356 (2018). https://doi.org/10.1177/2515245917747646
  • Uri Simonsohn, “Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone,” Psychological Science, Volume 24, issue 10, pp. 1875-1888. 2012.
  • Xia Hu, et al,  “Exploiting social relations for sentiment analysis in microblogging,” Proceedings of the sixth ACM international conference on Web search and data mining (2013) pp. 537-546.  http://faculty.cs.tamu.edu/xiahu/papers/wsdm13Hu.pdf


carolyn.meinel [at] cmeinel.com

(505) 281-0490

Return to home page


Copyright 2019 Carolyn Meinel. All rights reserved.