Each April, a familiar ritual takes over Princeton seniors’ social media feeds. Students proudly pose with their black-bound theses in front of Blair Arch, with titles ranging from dense academic language to “Pinocchio Pinocchio Pinocchio Pinocchio Pinocchio.”
The senior thesis, often over a year in the making, is the University’s academic rite of passage and a centerpiece of senior spring. But behind each photo op lies a project that has slowly but steadily evolved over the past decade.
Today, the average senior thesis clocks in at just under 20,000 words, or about 80 double-spaced pages. Humanities students sometimes surpass 100 pages, while engineering majors average closer to 60.
Using natural language processing (NLP) and AI analysis, The Daily Princetonian examined over 9,300 senior theses, representing the vast majority of seniors; some departments, such as Computer Science B.S.E., have different capstone requirements. For all NLP analyses, the ‘Prince’ excluded each thesis’s acknowledgements, table of contents, citations, and appendices.
The ‘Prince’ analyzed theses submitted to the archive between 2014 and 2024 to see how Princeton’s defining tradition has changed across disciplines, over time, and amid a shifting technological landscape. Some things have changed: theses have gotten shorter, and they have become slightly harder to read.
Theses have gotten shorter
Since 2014, the average word count has fallen by just over five percent. Despite some year-to-year fluctuation, overall page counts for theses have remained relatively consistent over the past decade.
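The exclusion rule described earlier (dropping acknowledgements, tables of contents, citations, and appendices) and the rough words-to-pages conversion can be sketched in Python. This is a minimal illustration, not the ‘Prince’’s pipeline: it assumes each thesis has already been split into hypothetical (title, text) section pairs, that excluded sections can be recognized by their titles, and that a double-spaced page holds about 250 words, the ratio implied by 20,000 words filling roughly 80 pages.

```python
# Hypothetical set of section titles treated as non-body matter.
EXCLUDED_TITLES = {
    "acknowledgements", "acknowledgments", "table of contents",
    "bibliography", "references", "works cited",
}

# Assumed words per double-spaced page (20,000 words / 80 pages).
WORDS_PER_PAGE = 250


def is_excluded(title: str) -> bool:
    """Return True for sections that should not count toward the body text."""
    normalized = title.strip().lower()
    return normalized in EXCLUDED_TITLES or normalized.startswith("appendix")


def body_word_count(sections: list[tuple[str, str]]) -> int:
    """Count whitespace-separated words in every non-excluded section."""
    return sum(len(text.split()) for title, text in sections if not is_excluded(title))


def estimated_pages(word_count: int) -> float:
    """Convert a word count to an approximate double-spaced page count."""
    return word_count / WORDS_PER_PAGE


if __name__ == "__main__":
    thesis = [
        ("Acknowledgements", "Thanks to my adviser and my roommates."),
        ("Chapter 1", "Pinocchio Pinocchio Pinocchio Pinocchio Pinocchio"),
        ("Appendix A", "Supplementary tables."),
        ("Bibliography", "Collodi, Carlo. The Adventures of Pinocchio."),
    ]
    print(body_word_count(thesis))  # 5
    print(estimated_pages(20_000))  # 80.0
```

Real theses rarely label their sections this cleanly, so matching on titles is the weakest assumption here; a production pipeline would need to detect these sections more robustly.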
Social science theses are the toughest to read
While word count tells us how much students write, the Flesch Reading-Ease score hints at how approachable that writing is. The score is based on sentence length and on word length in syllables: the higher the score, the easier the text, approaching 100 for children’s books and falling to single digits for dense technical content. In 2014, the median thesis score was 47.8, which is considered “college-level” readability. By 2024, it had fallen to 46.1, a sign of slightly longer sentences or more multisyllabic vocabulary.
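The Flesch Reading-Ease score combines two ratios, average words per sentence and average syllables per word: 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The Python sketch below implements that standard formula. Its syllable counter is a crude vowel-group heuristic (an assumption; dedicated readability tools use pronunciation dictionaries or more careful rules), so its scores will differ somewhat from published tools, and it is not necessarily what the ‘Prince’ used.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting runs of vowels (a rough heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "make") usually adds no syllable,
    # but "-le" endings (as in "table") usually do.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading-Ease formula; higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


easy = flesch_reading_ease("The cat sat on the mat.")
dense = flesch_reading_ease(
    "Institutional considerations necessitate comprehensive documentation."
)
print(easy > dense)  # True
```

The formula is not capped: the six one-syllable words in the easy example score about 116, above the nominal 100, while the string of long words in the dense example goes far below zero.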

Averaged across all 10 years, Flesch scores for the humanities, natural sciences, and engineering all hover around 51, while the social sciences drop to 44, making them the toughest reads.
No discernible rise in AI-driven writing in recent years
When ChatGPT gained popularity in 2022, universities debated whether to ban the chatbot because of its ability to generate text quickly and convincingly. If students leaned on AI to write their theses, their prose might become more predictable and templated, and ultimately easier for language models to anticipate.
To test this, the ‘Prince’ analyzed thesis writing using GPT-2 perplexity, a measure of how uncertain a language model is when predicting the next word. Text generated by models like ChatGPT typically has lower perplexity because its word choices are easier for a large language model (LLM) to predict. However, we found that theses have remained just as complex and unpredictable as before, suggesting they haven’t shifted toward a more AI-like writing style.
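Perplexity is the exponential of the average negative log-probability a model assigns to each token that actually comes next: PPL = exp(−(1/N) × Σ log p(token_i | previous tokens)). A perplexity of 31 means the model is, on average, about as uncertain as if it were choosing uniformly among 31 words. The Python sketch below shows only that final calculation; in practice the per-token log-probabilities would come from GPT-2 itself, and the values used here are hypothetical. Because GPT-2 reads at most 1,024 tokens at a time, a full thesis would also have to be scored in chunks or truncated, a detail the sketch leaves out.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood) over a token sequence.

    Each entry is the natural-log probability the model assigned to the
    token that actually came next.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities, not real GPT-2 output.
predictable = [math.log(p) for p in (0.5, 0.4, 0.6, 0.5)]
surprising = [math.log(p) for p in (0.05, 0.02, 0.1, 0.04)]

print(perplexity(predictable) < perplexity(surprising))  # True
```

A model that assigned every token probability 1 would have the minimum possible perplexity of 1; the less probable the actual words are to the model, the higher the score climbs.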
The ‘Prince’ found no clear trend tied to ChatGPT’s emergence: perplexity fluctuated from year to year and, if anything, ran higher in the years after ChatGPT’s release.
Humanities theses were the least predictable for the model, with an average perplexity of just over 31. Social science, natural science, and engineering theses all fell between 15 and 20, reflecting a more technical, structured writing style.
The University’s academic policies might partially explain why the thesis has resisted major AI-driven changes. Rights, Rules, and Responsibilities states that any misrepresentation of AI content as original work is an academic integrity violation. At the departmental level, guidelines vary widely. The Department of Economics requires students to disclose AI usage, including logs or records from the models they use, while the Department of French & Italian treats any AI usage as plagiarism.
Humanities theses have the highest burstiness
The ‘Prince’ also measured burstiness, or how tightly key terms cluster within a document, and found that humanities theses had the highest burstiness, at 0.65 on a scale of 0 to 1. This is likely because humanities theses often build around key ideas, authors, or texts, concentrating references and terminology in particular passages instead of spreading them evenly throughout the document. Other fields may follow a more uniform structure, with methods, results, and discussion flowing predictably rather than clustering in specific places.
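The article does not spell out its burstiness formula, and its reported 0-to-1 scale suggests a variant of what is shown here. One widely used definition, from Goh and Barabási, looks at the gaps between successive occurrences of a term: with mean gap μ and standard deviation σ, burstiness B = (σ − μ) / (σ + μ). B is −1 for perfectly even spacing, near 0 for random spacing, and approaches 1 for heavily clustered usage. The Python sketch below applies that definition to a single term in a tokenized document; the function name and the minimum of three occurrences are assumptions for illustration.

```python
import statistics
from typing import Optional


def burstiness(tokens: list[str], term: str) -> Optional[float]:
    """Goh-Barabasi burstiness of `term`: (sigma - mu) / (sigma + mu) over gaps.

    Returns None when the term appears fewer than three times, since at
    least two gaps are needed for a meaningful spread.
    """
    positions = [i for i, token in enumerate(tokens) if token == term]
    if len(positions) < 3:
        return None
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    # Every gap is at least 1, so mu + sigma is always positive.
    return (sigma - mu) / (sigma + mu)


# Toy documents: "Dante" every fifth word vs. bunched into two passages.
even = ["Dante" if i % 5 == 0 else "filler" for i in range(50)]
clustered = ["filler"] * 110
for i in (0, 1, 2, 100, 101, 102):
    clustered[i] = "Dante"

print(burstiness(even, "Dante"))           # -1.0
print(burstiness(clustered, "Dante") > 0)  # True
```

A document-level score would then combine B across a set of key terms; which terms count as “key” is another choice the article does not specify.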
The most common proper nouns are countries: China, Russia, and Japan
Using an NLP technique called named-entity recognition, the ‘Prince’ identified the most frequently mentioned proper nouns in theses.
Most of the top 10 were names of countries or nationalities. The main exception was “Jews,” which came in sixth, just ahead of “Israel” in seventh.
In 2014, no machine-learning terms cracked the top-10 list for computer science. By 2024, four did: large language model (LLM), convolutional neural network (CNN), generative pre-trained transformer (GPT), and bidirectional encoder representations from transformers (BERT). This reflects the field’s tilt toward AI over time.
The School of Public and International Affairs’ top-10 entities in 2014 were mostly countries or regions, such as Europe, China, and France. By 2024, more domestic terms had filtered into the top 10, such as Democrat, Republican, Congress, and the Supreme Court, hinting at a shift toward U.S.-focused topics.
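Once a named-entity recognition model (spaCy’s pretrained pipelines are one common option) has pulled entity strings out of each thesis, building top-10 lists like these is a counting exercise. The Python sketch below assumes that extraction has already happened and that each thesis is represented as a hypothetical (department, year, entities) tuple. It makes one design choice explicit: counting each entity at most once per thesis keeps a single thesis that names “China” hundreds of times from dominating the list, whereas counting raw mentions rewards heavy repetition. The article does not say which approach the ‘Prince’ took.

```python
from collections import Counter, defaultdict


def top_entities(theses, k=10, per_thesis=True):
    """Return the k most common entities for each (department, year) group.

    `theses` is an iterable of (department, year, entities) tuples, where
    `entities` is a list of entity strings already extracted by an NER model.
    With per_thesis=True, each entity counts at most once per thesis.
    """
    tallies = defaultdict(Counter)
    for department, year, entities in theses:
        items = set(entities) if per_thesis else entities
        tallies[(department, year)].update(items)
    return {group: counts.most_common(k) for group, counts in tallies.items()}


# Hypothetical, already-extracted entities for three theses.
sample = [
    ("SPIA", 2014, ["China", "Europe", "China"]),
    ("SPIA", 2014, ["China", "France"]),
    ("SPIA", 2024, ["Congress", "Congress", "Republican"]),
]

print(top_entities(sample, k=1)[("SPIA", 2014)])                    # [('China', 2)]
print(top_entities(sample, k=1, per_thesis=False)[("SPIA", 2024)])  # [('Congress', 2)]
```

Grouping by (department, year) supports both the overall ranking and the 2014-versus-2024 comparisons above; dropping the year from the key would give a single ranking per department.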
NES theses are most similar to one another
The ‘Prince’ measured the overlap in thesis content within and across majors using the Jaccard index: the number of named entities two theses share, divided by the total number of unique entities across both. Scores range from zero, for theses with no entities in common, to one, for theses with identical sets of entities.
For intra-major similarity, Near Eastern Studies theses top the list, with an average Jaccard index of around 0.03, trailed by Slavic Languages and Literatures and East Asian Studies. Tightly focused regional or linguistic projects may drive higher entity reuse, while fields with broader topic sets, like Chemistry and Music, naturally diverge more.
When we extend the analysis across majors, the strongest cross-major pairing is Politics and SPIA, with a Jaccard index near 0.07, followed by Sociology and Anthropology at 0.05. Both are well above the overall average cross-major similarity of 0.02. This suggests that while certain fields naturally share common ground, like governance and human behavior, most departments remain distinct in the topics and entities their seniors explore.
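The Jaccard index compares two sets: J(A, B) = |A ∩ B| / |A ∪ B|. If one thesis mentions {China, Russia, Japan} and another mentions {China, Japan, Israel, France}, they share two entities out of five unique ones, so J = 2/5 = 0.4. The article does not describe exactly how it averaged these scores within and across majors; the Python sketch below assumes a simple approach that averages J over every pair of theses, either within one major or with one thesis drawn from each of two majors. Treating two empty entity sets as 0.0 is also an assumption.

```python
from itertools import combinations
from statistics import mean


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared entities divided by all unique entities across both sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets count as having nothing in common
    return len(a & b) / len(union)


def intra_major_similarity(theses: list[set[str]]) -> float:
    """Average Jaccard index over every pair of theses within one major.

    Raises statistics.StatisticsError if the major has fewer than two theses.
    """
    return mean(jaccard(a, b) for a, b in combinations(theses, 2))


def cross_major_similarity(major_a: list[set[str]], major_b: list[set[str]]) -> float:
    """Average Jaccard index over every (thesis in A, thesis in B) pair."""
    return mean(jaccard(a, b) for a in major_a for b in major_b)


print(jaccard({"China", "Russia", "Japan"}, {"China", "Japan", "Israel", "France"}))  # 0.4
```

Because every pair is compared, the cost grows with the square of the number of theses, which is manageable for a few thousand documents but worth keeping in mind for larger archives.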
The Class of 2025 deadline for theses is April 28, with many departments requiring submission in the weeks prior. Theses should be uploaded to Thesis Central by May 6.
Jasin Cekinmez is a staff Data writer and an associate Puzzles editor for the ‘Prince.’
Nathan Beck is a former head Copy editor for the ‘Prince.’
Please send any corrections to corrections[at]dailyprincetonian.com.