The Experimental and Computational Linguistics Ensemble Lab (ECOLE)

Language is an essential part of human experience. We use computational, experimental, and corpus linguistic methods to understand the unique role of language in our lives and in our society. The Experimental and Computational Linguistics Ensemble Lab (ECOLE) at San Francisco State University conducts research on both the cognitive and social aspects of language.

Research

Our research projects focus on both cognitive and social aspects of language. On the one hand, we study linguistic phenomena that shed light on the question of grammar architecture and the relation between language and cognition. On the other hand, we are interested in what language reveals about its users and about society in general.

Generative AI and Large Language Models

Our most recent line of research focuses on Generative AI and Large Language Models. We study how Large Language Models perform compared to humans on a range of tasks, from world knowledge and common sense reasoning to text simplification.

The Syntax and Semantics of Queries

The internet gave rise to a new form of language use – queries. Yet little is known about how we formulate queries. Queries are processed by search engines as ‘word salad’ with little attention to word order. In this project we investigate the hypothesis that queries are more than ‘word salad’ and that they have systematic structural properties.

Subjectivity in Language

Do you say Democrats and Republicans or Republicans and Democrats? In a series of papers we show that word order in binomials (above) and in prenominal adjectives depends on subjective preferences of the speaker: the attribute that is psychologically closer to the speaker is mentioned first.

Recent Publications

  • Iliev, R., Smirnova, A. 2025. The valence of abstraction: A paradox revisited. Journal of Psycholinguistic Research 54, 4. https://doi.org/10.1007/s10936-024-10122-4 
  • Smirnova, A., Chun, K., Rothman, W. L., Sarma, S. 2025. Text Simplification for Children: Evaluating LLMs vis-à-vis Human Experts. CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3706599.3719889

  • Smirnova, A. 2024. Syntactic variation in reduced registers through the lens of the Parallel Architecture. Topics in Cognitive Science. Special volume on “Parallelism in the Architecture of Language,” Giosuè Baggio, Neil Cohn, and Eva Wittenberg (Topic Editors). https://doi.org/10.1111/tops.12747
  • Smirnova, A. 2024. Productivity and Creative Use of Compounds in Reduced Registers: Implications for Grammar Architecture. In Proceedings of the 46th Annual Conference of the Cognitive Science Society. https://escholarship.org/content/qt1c39n81t/qt1c39n81t.pdf
  • Reese, M. and Smirnova, A. 2024. Comparing ChatGPT and Humans on World Knowledge and Commonsense Reasoning Tasks. In CHI Conference on Human Factors in Computing Systems (CHI).

Presentations

  • Smirnova, A., Lee, Erin S., Li, S. 2025. Numeric information in elementary school texts generated by LLMs vs human experts. Artificial Intelligence in Measurement and Education Conference (AIME-Con). October 27-29, 2025. Pittsburgh, PA.
  • Reese, M. L., & Smirnova, A. 2025. Linguistic proficiency of humans and LLMs in Japanese: Effects of task demands and content. Artificial Intelligence in Measurement and Education Conference (AIME-Con). October 27-29, 2025. Pittsburgh, PA.
  • Reese, M. L., & Smirnova, A. 2025. Human and LLM performance on linguistic test: Content effect and task demands. 47th Annual Meeting of the Cognitive Science Society. July 30-August 2, 2025. San Francisco, CA.
  • Smirnova, A., Chun, K., Rothman, W. L., Sarma, S. 2025. Text Simplification for children: Evaluating LLMs vis-à-vis human experts. CHI Conference on Human Factors in Computing Systems, April 26-May 1, 2025. Yokohama, Japan.
  • Kyu beom Chun, Diya Garg, Wil Louis Rothman, Siyona Sarma, Shruti Sujal Vora. 2025. Text Simplification for Children. Stanford Undergraduate Research Conference. April 12, 2025. Stanford University, Palo Alto, CA.
  • Anastasia Smirnova (PI) and Whitney Taylor (Political Science, SFSU). 2024. Students’ Perception on Research and Career-readiness Competencies. ConnectUR Annual Conference. June 20-21, 2024.

Events and Announcements

  • Raz Parker’s paper on Automatic Identification of Phonetic and Semantic Patterns for Iconicity Research: A Transformer Approach was accepted for presentation at IcoLL2026, to be held on February 21-23, 2026 at Nagoya University.
  • The ECOLE Lab received the Best Abstract award at the 10th Annual UC Davis Symposium on Language Research for the project on Advantages and Challenges in using ChatGPT in Text Simplification for Children. May 24, 2024.

ECOLE Lab Members

Lab members are SF State students who come from diverse backgrounds and bring to the table their expertise in computer science, data analysis, linguistics and psychology.

Lab Director

Anastasia Smirnova, Associate Professor

Professor Smirnova’s primary research focus is on grammar architecture and the structure of the lexicon, as well as on how temporal and modal information is expressed in language. Her research employs a variety of methods, from fieldwork to experimental and corpus studies.

Lab Members

Khue Nam Do (UC Berkeley Data Discovery Program)

Khue is a second-year undergraduate student studying Data Science at the University of California, Berkeley. She is deeply passionate about leveraging Data Science and Machine Learning to create solutions that drive social good. Inspired by her Vietnamese background, she hopes to apply what she learns to contribute impactful innovation both locally and globally. Now, as part of the ECOLE Lab at SFSU, Khue is excited to examine how LLMs can reshape human experience in literature and learning, with a focus on developing ethical and practical applications to benefit the community. Beyond academics, Khue loves exploring new foodie spots, staying active, and “capturing moments” through journaling, taking pictures, and filming.

Holden Fees (UC Berkeley Data Discovery Program)

Holden is a data science, computer science and cognitive science major at Berkeley. Holden is passionate about AI research in fairness as well as explainable AI. 

Shiying Li

Shiying is a software engineer (data platform and engineering) and a second-year graduate student in Philosophy at SFSU. She earned her Master’s degree in Computer Science from Brown University and has worked in labs on projects involving Human-Computer Interaction (HCI) and the application of NLP to social science. She looks forward to applying her technical skills to support linguistic analysis and to contribute to impactful applications leveraging NLP. Shiying is also interested in the intersection of philosophy, language, and technology. Outside of work, she enjoys traveling and playing strategy video games.

Siya Patel (UC Berkeley Data Discovery Program)

Siya is an undergraduate student at the University of California, Berkeley, majoring in Cognitive Science and Data Science. She is passionate about artificial intelligence, natural language processing, and exploring how technology can be applied to support healthcare, education, and human well-being. With experience in machine learning, algorithmic fairness, and web development, she is excited to contribute to the Text Simplification for Children project and deepen her expertise in NLP. Outside of academics, she enjoys dancing, the outdoors, journaling, and exploring new foods.

Thomas “Raz” Parker

Raz graduated from San Francisco State University in 2024 with an MA in Linguistics and a Graduate Certificate in Computational Linguistics. Their capstone explored automatic detection of phonestheme-like features in Japanese. Their current research interests include Japanese sound symbolism, phonology and iconicity, with a particular focus on natural language processing approaches. Before starting at SFSU they worked as a German and Japanese translator in the automotive, medical and academic fields. In their free time, Raz plays the bass and the theremin, and enjoys spending time cooking and baking.

Cassia Reddig

Cassia is currently pursuing a Master of Science in Interdisciplinary Studies (emphasis in Language, Cognition, & Computation) with a Graduate Certificate in Ethical AI at San Francisco State University, where she also obtained a BA in Psychology and Computational Linguistics. She has contributed to research in cognitive psychology and media psychology through the SFSU LACE Lab and the Stanford Social Media Lab. She is interested in applying multidisciplinary approaches to research on natural language processing and human-centered artificial intelligence.

May Reese

May is an SFSU alumna and a visiting scholar. Her current research is on Large Language Models and Natural Language Understanding benchmark tests with a focus on Japanese. Her academic interests include language typology, computational linguistics and psycholinguistics. In her free time, you’ll find her reading, playing board games or out on a cycling trip.

Senula Wijeratne (UC Berkeley Data Discovery Program)

Senula is an undergraduate student at the University of California, Berkeley, majoring in Cognitive Science, Data Science, and Neuroscience. He is passionate about cognitive and computational neuroscience, applied machine learning, and building assistive technologies that enhance accessibility and improve quality of life. With a strong foundation in cognition, natural language processing (NLP), and machine learning through coursework and academic projects, he is eager to deepen his experience with Large Language Models (LLMs) while contributing to research that drives meaningful, real-world impact.

Peixi “Max” Xie

Max Xie is a recent graduate with a B.A. in Linguistics from the University of California, Santa Cruz, and is currently pursuing an M.A. in Linguistics at San Francisco State University. Max is experienced in linguistics research, computer programming, and startup management, and is adept in Natural Language Processing (NLP), Computational Linguistics, and AI applications. He is known for strong leadership, communication, and analytical skills, and is actively seeking opportunities in the fields of AI and NLP that leverage linguistic expertise. Outside of school, Max has many hobbies, including swimming, hiking, biking, ping pong, pool, computer robotics, computer programming, reading, and gaming.

Lab Alumni

Helena Almassy. Professor of Mathematics, Cañada College.

Lauren Baker. Finance manager at DTR Consulting Services.

Kyu beom “Kyle” Chun (UC Berkeley Data Discovery Program).

Angie Garcia. Manager at SoundHound.

Diya Garg (UC Berkeley Data Discovery Program).

Skyler Ilenstine. Computational Linguist at Microsoft via DISYS.

Malleeswari Jagabattuni (MJ). PhD student in Linguistics, University at Buffalo.

Jonathan Kakama. Data Analyst at Vaco.

Chohee Kim. Senior Software Engineer at LinkedIn.

Rose Kitchel. Executive Assistant at the Reeds Center.

Helena Laranetto. Machine Learning Data Linguist II, Alexa Devices at Amazon.

Erin Lee (UC Berkeley Data Discovery Program).

Sujung Nam. PhD student at the University of Hawaii, Honolulu.

Mikey Pagán. M.A. student, Comparative & World Literature at SFSU.

Jasmine Rivero. Chatbot Operations Manager at Sense.

Amanda Robinson. Computational Linguist at Samsung.

Ricardo Romero Sanchez. Linguistic Project Manager at Google.

William “Wil” Louis Rothman (UC Berkeley Data Discovery Program). Research associate at Neuro-Robotics Lab, Tohoku University.

Siyona Sarma (UC Berkeley Data Discovery Program).

Laurel Selvig. Data Analyst at Axos Bank.

Erly Tang. PhD student in Linguistics and Anthropology at the University of Arizona.

Olivia Vallejo. LV Quality Specialist / Linguist.

Shruti Vora (UC Berkeley Data Discovery Program).