data-futures

Data Futures is a not-for-profit that provides digital research workflows and data preservation solutions. The company undertakes data management and forensics contracts and operates services for partners including Basel, Cornell, ENS-Lyon, Heidelberg, Notre Dame, Oxford, Princeton and Westminster universities, and organizations such as DODIS, the Institute for German Language, MOM Jean Pouilloux and Fotomuseum Winterthur. Data Futures also manages the hasdai partnership under a CERN memorandum on behalf of these partners.

This website provides access both to research projects to which Data Futures contributes and to repositories. Use the buttons at the top to move between the two types of content. Many of the research projects are active and not all of them provide public access, though summaries are available. Repositories are public snapshots (research 'at rest') and support browsing and, in some cases, machine interfaces.

Commercial Records

This Invenio repository comprises a benchmark dataset based on the listings of foreign residents in the 1896, 1899, 1934 and 1937 volumes of the Asian Directories & Chronicles serial, which was published annually by The Hong Kong Daily Press between 1863 and 1941. Almost all of the volumes have been assembled by the Europa Institute at the University of Basel. In collaboration with Data Futures, high-resolution digitization of the pages and analysis of the OCR data have enabled automated detection of each person record in the foreign resident listings and the generation of more than 900,000 annotations. The OCR has been corrected and tokenized with the aid of surname and location dictionaries created from the corpus, producing searchable person 'instance' data. For this benchmark, that data can be found at https://doi.org/10.5281/zenodo.2580997 using the Zenodo annotation dataset type. In this repository the annotations form primary research records.
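The dictionary-aided correction step described above can be illustrated with a minimal sketch. It assumes a small surname dictionary and uses fuzzy string matching from Python's standard `difflib` module. The function name `correct_token`, the sample dictionary and the similarity cutoff are all hypothetical, and this is not necessarily the method used to build the benchmark.

```python
from difflib import get_close_matches

# Hypothetical surname dictionary; the real dictionaries were built from the corpus.
SURNAMES = ["Smith", "Jones", "Brown"]


def correct_token(token, dictionary, cutoff=0.6):
    """Return the closest dictionary entry for an OCR token.

    If no entry is at least `cutoff` similar, the token is returned unchanged.
    """
    matches = get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_line(line, dictionary):
    """Split an OCR line on whitespace and correct each token independently."""
    return [correct_token(tok, dictionary) for tok in line.split()]
```

A typical OCR confusion such as "rn" for "m" is resolved against the dictionary, while tokens with no close match pass through untouched, so the correction never invents a name that the dictionary cannot support.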

view repository

Ancient Manuscripts

The Book of Curiosities is an illustrated anonymous cosmography, compiled in Egypt during the first half of the 11th century. The treatise, which was acquired in June 2002 by the Bodleian Libraries, is extraordinarily important for the history of science, especially for astronomy and cartography, and contains an unparalleled series of diagrams of the heavens and maps of the earth. This Invenio repository has been developed under a Bodleian contract to Data Futures, after the annotation originally developed in 2007 became vulnerable. A freizo workflow enabled the original annotation to be converted to OADM and WADM and scaled against page images in the Bodleian's IIIF service, which are of higher preservation quality than those originally annotated and have a different aspect ratio. In this repository these annotations form primary research records.
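The scaling step can be sketched as follows. This is a minimal illustration rather than the freizo workflow itself: it assumes that both images show the same page area, so that independent horizontal and vertical scale factors are enough to absorb the change in aspect ratio. If margins or cropping differed, an offset would also be needed. The names `Box`, `rescale_box` and `to_web_annotation` are hypothetical, and the target URI is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangular region in pixel coordinates (top-left origin)."""
    x: float
    y: float
    w: float
    h: float


def rescale_box(box, src_size, dst_size):
    """Map a box from a source image to a target image.

    Sizes are (width, height) tuples. The axes are scaled independently,
    which handles a change in aspect ratio when both images cover the same area.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return Box(box.x * sx, box.y * sy, box.w * sx, box.h * sy)


def to_web_annotation(box, target_uri, text):
    """Build a minimal W3C Web Annotation (WADM) dict with an xywh fragment selector."""
    x, y, w, h = (round(v) for v in (box.x, box.y, box.w, box.h))
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": text},
        "target": {
            "source": target_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }
```

For example, a region annotated at (100, 50) with size 200 x 100 on a 1000 x 800 image maps to (400, 150) with size 800 x 300 on a 4000 x 2400 image, since the horizontal factor is 4 and the vertical factor is 3.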

view repository

Biodiversity

This is an experimental metadata enrichment and annotation workflow developed by Data Futures for the Zenodo ICEDIG herbaria sheet corpus. It was built automatically from individual Zenodo deposits and supports normalization of metadata, which currently varies between contributing institutions. Improved metadata consistency is important for the flexibility and reliability of the searches that Zenodo can support. The standards-based annotation implemented here enables new contributions by the scientific community and provides a browsable visual interface to the corpus. This project also supports research into the cataloguing of scientists, including collectors and identifiers, in order to develop tools for automating the management of contributing authors in biodiversity literature. ICEDIG is funded under EU grant agreement ID: 777483 and provides a foundation for the Distributed System of Scientific Collections (DiSSCo). Zenodo, implemented by CERN, is the global catch-all repository for scientific research and supports the European Commission Open Data policy.

view research workflow