Undoing the Language of Soviet Repression: Indigenous Peoples of Crimea in the KGB Files (1944–91)
Pursuing a new collaborative initiative in fulfillment of its engaged scholarship mission, CIUS won two major research grants in 2025: a Partnership Development Grant from Canada’s Social Sciences and Humanities Research Council (SSHRC) and a Research Cluster Grant from the University of Alberta’s Kule Institute for Advanced Study (KIAS). Over the next three years, this funding will enable CIUS and the Sectoral State Archive of the Security Service of Ukraine (HDA-SBU), together with seven other academic organizations in four countries, to develop and launch a suite of AI-based interpretive tools and research protocols aimed at tracking, analyzing, and exposing the repressive language in KGB files that was used to erase Crimean Tatars as an Indigenous people of Crimea from historical maps and the social memory of Soviet society, both within and outside of Ukraine (1944–91).
THE ISSUE
Like many Indigenous peoples living under colonial rule around the world, the Crimean Tatars in the Soviet Union (1921–91) were subject to systemic injustice and persecution at the hands of the Soviet government. The lowest point of this injustice was the forced deportation in May 1944 of the entire nation, some 230,000 persons, from their indigenous land to remote locations in Siberia, Kazakhstan, and Uzbekistan. Nearly half of the entire people died within a few years as a result of this brutal ethnic cleansing operation. The Crimean Tatars’ fight for the right to return to Crimea lasted half a century; permission was granted in 1989, as the USSR was on the verge of collapsing. Yet their return to Crimea was marred by prejudice and misunderstanding; by that time, Crimean Tatars were little known, and even feared, in Soviet society.
RESEARCH QUESTION
What were the motivations and processes that led to the successful Soviet erasure of the Indigenous people of Crimea, within and outside of the peninsula? The answers to this question are to be sought in the archives of the Committee for State Security (KGB), the infamous Soviet security agency that carried out the surveillance and persecution of the Crimean Tatars.
The project posits that, to succeed in the systematic erasure of the Crimean Tatars, the KGB employed its own language to frame the Indigenous nation as “enemies of the people.” In doing so, the KGB engaged in deliberate “othering” and marginalization of Crimean Tatars. Because this language was perpetuated across all Soviet state media channels, this intentional and well-resourced framing had a lasting impact on the population, well beyond the Soviet period.
KEY CHALLENGES
Paradoxically, the language of KGB files has also proved to be an obstacle to understanding the history of the problem, posing a profound epistemic challenge to archive users. The complex and layered phrasing in KGB files is not merely the well-established institutional language of a large bureaucracy; it is also a deliberate construct employed to conceal and confuse. Its structure, logic, and patterns prevent users, especially those who are not well-versed in KGB parlance, from accessing the underlying meanings encoded in the files.
Without understanding the ideological layering of this language in their own research archive, users risk perpetuating the ideological biases that were built into KGB documents. Undoing these layers requires additional resources, time, and training that the interested public or international academic users of the archives may not possess.
Other challenges exist as well. Unlike Soviet-era documents in other archival collections, the KGB files in Ukraine include key details of KGB operations that were implemented specifically to discredit Crimean Tatar civic activism and leadership. Yet in the current setup of these archives in Kyiv, documents that refer to Crimean Tatars are hard to access: (1) HDA-SBU lacks an electronic management system that would allow for effective searching of the thousands of documents that contain such references; (2) the language of these documents itself remains under-studied; and (3) the reading room of the KGB archives is closed to in-person visits by researchers due to the Russo-Ukrainian war.
A new, modernized approach is needed—one that takes advantage of innovative technologies, diverse expertise, and effective interdisciplinary collaboration.
OVERALL GOAL & OBJECTIVES
The primary goal of the project “Undoing the Language of Soviet Repression” is to establish and develop a well-coordinated research partnership that will fulfill the project objectives and remain viable and productive past the grant tenure, allowing continued collaboration thereafter. Using ethically and socially responsible best practices, research expertise and resources shall be directed toward a project outcome that tangibly and effectively supports Ukraine’s endangered Indigenous people, the Crimean Tatars. Specifically, the driving project objective is to design, test, and refine comprehensive AI-informed and related research protocols that will enable identification and deciphering of key structuring principles in the language of relevant KGB files that framed the Crimean Tatars as “enemies of the state.”
PARTNERSHIP
The project brings together nine academic institutions from four countries, representing the research fields of Indigenous studies, philosophy, history, anthropology, archival studies, computer science, digital humanities, and Ukrainian studies. Coordinated at CIUS and in collaboration with HDA-SBU, the partnership engages institutions and researchers including the University of Alberta, Harvard University, the University of West Bohemia, and Ukraine’s Ukrainian Catholic University.
More information will follow.
CORE PARTNERS
Dr. Natalia Khanenko-Friesen, Principal Investigator
Dr. Khanenko-Friesen is director of the Canadian Institute of Ukrainian Studies (CIUS) at the University of Alberta and interim director of its Contemporary Ukraine Studies Program. Her expertise in cultural anthropology, community-based research, and 20th-century Ukrainian history, along with her long-standing collaboration with the Sectoral State Archive of the Security Service of Ukraine (HDA-SBU), is a key asset to the project’s success. As Principal Investigator, she is responsible for the overall coordination and strategic direction of the initiative. She coordinates the activities of all working groups involved in the partnership, supervises the development of project deliverables, and leads the analysis of collected data.

Dr. Geoffrey Rockwell, Project Partner
A professor of philosophy and leading expert in artificial intelligence and ethics, Dr. Rockwell brings extensive experience in digital research infrastructure and interdisciplinary collaboration. He previously served as associate director of the Artificial Intelligence for Society (AI-4-Society) Signature Area at the University of Alberta and was a key contributor to the GRAND Network of Centres of Excellence project, funded in 2009. On this project, Dr. Rockwell will lead the development of ethics protocols related to the collection, use, and dissemination of research data.

Dr. Andriy Kohut, Project Partner
As director of the Sectoral State Archive of the Security Service of Ukraine (HDA-SBU), Dr. Kohut brings essential expertise in archival access, digital infrastructure, and historical documentation. He is the architect behind the Digital Archive of the Ukrainian Liberation Movement and has led multiple international collaborations across Eastern Europe. On this project, Dr. Kohut will coordinate all research activities conducted at the HDA-SBU and the Ukrainian Catholic University (UCU). He will oversee the preparation of archival files for inclusion in the AI-driven research workflow, ensuring their compatibility with project tools and protocols.

Dr. Denilson Barbosa, Project Partner
A professor of computer science at the University of Alberta, Dr. Barbosa brings advanced expertise in AI, data science, and knowledge graph technologies. He has worked extensively with Dr. Geoffrey Rockwell on a range of digital humanities initiatives, including the Linked Infrastructure for Networked Cultural Scholarship (LINCS), a national consortium focused on building open knowledge graphs for cultural and heritage data. He also served as co-principal investigator and data quality lead in an NSERC Strategic Network on Data Science. Dr. Barbosa has previously collaborated with CIUS on AI tools for detecting bias and propaganda in Russian-language media related to the war in Ukraine. On this project, Dr. Barbosa will direct the creation of natural language processing (NLP) tools to extract structured information from archival texts, which will directly support research into the identification, surveillance, and treatment of Crimean Tatars in Soviet records.

Dr. Serhii Plokhy, Project Partner
Dr. Plokhy holds the Mykhailo S. Hrushevskyi Chair in Ukrainian History at Harvard University. A leading expert on Eastern European history, he has authored numerous influential works on the region’s political, cultural, and geopolitical transformations. As part of this project, the Ukrainian Research Institute at Harvard University (HURI) will contribute the intellectual and technical capacity of its “MAPA: Digital Atlas of Ukraine” program, a digital platform for analyzing the historical and political geography of Ukraine. Dr. Plokhy will lead a team of GIS specialists that will integrate project data, drawn from Soviet archival records, into the MAPA platform. This collaboration will enable the visualization of geographic patterns in the surveillance and repression of Indigenous peoples and in their resistance, offering new insights into the spatial dimensions of state violence.

Dr. Pavel Ircing, Project Partner
An associate professor in the Department of Cybernetics at the Faculty of Applied Sciences, University of West Bohemia (UWB), Dr. Ircing specializes in natural language processing (NLP), with a focus on automatic speech recognition (ASR) and optical character recognition (OCR). Dr. Ircing will oversee the development of advanced OCR and image-cleaning methods for typed Cyrillic documents. His contributions are critical to transforming scanned archival pages into machine-readable and analyzable texts.

Adam Hradilek, MA, Project Partner
A researcher with long-standing experience studying the KGB archives of the former Soviet Union, Hradilek has focused on the persecution of Czechoslovak citizens in the USSR and the integration of Ukrainian archival materials into the Online Digital Archive of NKVD/KGB Files. On this project, he will advise on the interpretation of KGB terms and guide the preparation of historical records for integration with tools such as MAPA. Working with computer scientists, he will contribute to the deconstruction and computational analysis of archival documents.

Dr. Oleh Turiy, Project Partner
As vice-rector for strategic cooperation, director of the Institute of Church History, and associate professor of church history at the Ukrainian Catholic University (UCU), Dr. Turiy brings expertise in church history, historical memory, and archival studies. Dr. Turiy will lead efforts at UCU to improve the recognition of marginalia and non-standard visual elements in archival materials, including handwritten notes, stamps, and special formatting.

Dr. Olexii Ignatenko, Project Partner
An associate professor at the Ukrainian Catholic University (Lviv), Dr. Ignatenko will contribute specialized expertise in OCR technology and computational linguistics, with a focus on processing Cyrillic-script data. His work is central to the development of AI models for text recognition, processing, and annotation, which are crucial components for decoding both handwritten and typed archival records.

Dr. Larysa Bilous, Project Coordinator
Dr. Bilous is a research associate at CIUS.
