Frédéric Kaplan

Director of the College of Humanities (CDH)

frederic.kaplan@epfl.ch +41 21 693 02 53 http://cdh.epfl.ch

https://orcid.org/0000-0002-6991-5730

EPFL > CDH > CDH-DIR > CDH-DI

EPFL CDH DHI DHLAB
INN 141 (INN Building)
Station 14
1015 Lausanne

+41 21 693 02 53
+41 21 693 19 01
Office: INN 141
EPFL > CDH > DHI > DHLAB

Website: https://dhlab.epfl.ch

EPFL IC-DO
INN 141 (INN Building)
Station 14
1015 Lausanne

+41 21 693 02 53
Office: INN 141
EPFL > IC > IC-DEC > IC-DO

Website: https://ic.epfl.ch/page8797.html

EPFL > ENAC > ENAC-SAR > SAR-ENS

EPFL > CDH > CDH-SODH > SODH-ENS

EPFL CDH DHI-GE
INN 137 (INN Building)
Station 14
1015 Lausanne

+41 21 693 02 53
Office: INN 137
EPFL > CDH > DHI > DHI-GE

EPFL > VPA > VPA-FAC > ASC

EPFL > CDH > CDH-DIR > CF-CDH

Short biography

Professor Frédéric Kaplan heads the College of Humanities at the École polytechnique fédérale de Lausanne (EPFL). He also holds the Chair of Digital Humanities and is president of the Time Machine Organisation, a non-profit entity bringing together more than 600 institutions. He is the author of about ten books, translated into several languages, and of more than a hundred scientific publications. His work has also been exhibited in several major museums, including the Venice Architecture Biennale, the Grand Palais and the Centre Pompidou in Paris, and the Museum of Modern Art in New York.

Links

Twitter
Google Scholar
Talk on the Radio (Avis d

Publications

Infoscience publications

2024

[147] Post-correction of Historical Text Transcripts with Large Language Models: An Exploratory Study

E. Boros; M. Ehrmann; M. Romanello; S. Najem-Meyer; F. Kaplan

The quality of automatic transcription of heritage documents, whether from printed, manuscript or audio sources, has a decisive impact on the ability to search and process historical texts. Although significant progress has been made in text recognition (OCR, HTR, ASR), textual materials derived from library and archive collections remain largely erroneous and noisy. Effective post-transcription correction methods are therefore necessary and have been intensively researched for many years. As large language models (LLMs) have recently shown exceptional performance in a variety of text-related tasks, we investigate their ability to amend poor historical transcriptions. We evaluate fourteen foundation language models against various post-correction benchmarks comprising different languages, time periods and document types, as well as different transcription quality and origins. We compare the performance of different model sizes and different prompts of increasing complexity in zero- and few-shot settings. Our evaluation shows that LLMs are anything but efficient at this task. Quantitative and qualitative analyses of results allow us to share valuable insights for future work on post-correcting historical texts with LLMs.

2024-02-18. The 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, St Julian's, Malta, March 22, 2024. p. 133-159.

2023

[146] Where Did the News Come From? Detection of News Agency Releases in Historical Newspapers

[145] From Archival Sources to Structured Historical Information: Annotating and Exploring the "Accordi dei Garzoni"

[144] Machine-Learning-Enhanced Procedural Modeling for 4D Historical Cities Reconstruction

[143] Ce que les machines ont vu et que nous ne savons pas encore

2022

[142] Automatic table detection and classification in large-scale newspaper archives

[141] Opacité et transparence dans le design d'un dispositif de surveillance urbain : le cas de l'IMSI catcher

2021

[140] Aux portes du monde miroir

[139] Une approche computationnelle du cadastre napoléonien de Venise

[138] Les vingt premières années du capitalisme linguistique : Enjeux globaux de la médiation algorithmique des langues

[137] Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

2020

[136] I sistemi di immagini nell’archivio digitale di Vico Magistretti

[135] Swiss in motion : Analyser et visualiser les rythmes quotidiens. Une première approche à partir du dispositif Time-Machine.

[134] A digital reconstruction of the 1630–1631 large plague outbreak in Venice

[133] The Advent of the 4D Mirror World

[132] Neural networks for semantic segmentation of historical city maps: Cross-cultural performance and the impact of figurative diversity

[131] Historical Newspaper Content Mining: Revisiting the impresso Project's Challenges in Text and Image Processing, Design and Historical Scholarship

[130] Building a Mirror World for Venice

2019

[129] Transforming scholarship in the archives through handwritten text recognition: Transkribus as a case study

[128] A deep learning approach to Cadastral Computing

[127] Repopulating Paris: massive extraction of 4 Million addresses from city directories between 1839 and 1922

[126] Frederic Kaplan Isabella di Lenardo

2018

[125] dhSegment: A generic deep-learning approach for document segmentation

[124] Comparing human and machine performances in transcribing 18th century handwritten Venetian script

[123] The Scholar Index: Towards a Collaborative Citation Index for the Arts and Humanities

[122] Mapping Affinities in Academic Organizations

[121] New Techniques for the Digitization of Art Historical Photographic Archives - the Case of the Cini Foundation in Venice

[120] Extracting And Aligning Artist Names in Digitized Art Historical Archives

[119] Negentropic linguistic evolution: A comparison of seven languages

[118] dhSegment: A generic deep-learning approach for document segmentation

[117] Deep Learning for Logic Optimization Algorithms

[116] Making large art historical photo archives searchable

[115] The Intellectual Organisation of History

[114] Mapping affinities: visualizing academic practice through collaboration

2017

[113] Layout Analysis on Newspaper Archives

[112] Machine Vision Algorithms on Cadaster Plans

[111] Analyse multi-échelle de n-grammes sur 200 années d'archives de presse

[110] A Simple Set of Rules for Characters and Place Recognition in French Novels

[109] Big Data of the Past

[108] Narrative Recomposition in the Context of Digital Reading

[107] Optimized scripting in Massive Open Online Courses

[106] The references of references: a method to enrich humanities library catalogs with citation data

[105] Studying Linguistic Changes over 200 Years of Newspapers through Resilient Words Analysis

2016

[104] From Documents to Structured Data: First Milestones of the Garzoni Project

[103] Ancient administrative handwritten documents: virtual x-ray reading

[102] Rendre le passé présent

[101] La modélisation du temps dans les Digital Humanities

[100] L’Europe doit construire la première Time Machine

[99] Visual Link Retrieval in a Database of Paintings

[98] Diachronic Evaluation of NER Systems on Old Newspapers

[97] Wikipedia's Miracle

[96] Le miracle Wikipédia

[95] La culture internet des mèmes

[94] Visual Patterns Discovery in Large Databases of Paintings

[93] Visualizing Complex Organizations with Data

[92] Navigating through 200 years of historical newspapers

[91] Studying Linguistic Changes on 200 Years of Newspapers

[90] The References of References: Enriching Library Catalogs via Domain-Specific Reference Mining

2015

[89] S'affranchir des automatismes

[88] The Venice Time Machine

[87] Venice Time Machine: Recreating the density of the past

[86] On Mining Citations to Primary and Secondary Sources in Historiography

[85] Text Line Detection and Transcription Alignment: A Case Study on the Statuti del Doge Tiepolo

[84] Anatomy of a Drop-Off Reading Curve