Presentations
This page lists presentations about the reconciliation and matching framework. To date, these have mainly been aimed at audiences working in Biodiversity Informatics. Images are links to slide decks on SlideShare.
### Biodiversity Information Standards (TDWG) conference, September 2015, Nairobi, Kenya.
#### A simple model for large-scale data mobilization across a diverse organisation (Nicky Nicolson)
Abstract: The talk presents a simple model for mobilizing biodiversity data from a data-rich, diverse organisation, using open-source tools compatible with those taught in the Data Carpentry syllabus.
#### Strings to things: a user-friendly framework for data reconciliation (Nicky Nicolson)
(This talk follows on from the one above and adds technical detail.)
Abstract: The talk presents an open-source toolkit (https://github.com/RBGKew/Reconciliation-and-Matching-Framework; http://data1.kew.org/reconciliation/) for configuring an OpenRefine-compatible reconciliation service over any tabular file or structured database.
Reconciliation is the process of converting a text string representation of a thing into a usable identifier for that thing, e.g. to convert the text string “Tahina spectabilis” to “http://ipni.org/urn:lsid:ipni.org:names:77086615-1”.
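To make the string-to-identifier step concrete, here is a minimal Python sketch of a reconciliation service over a tabular file. It assumes a hypothetical CSV with `id` and `name` columns and answers batch queries in a shape broadly following the OpenRefine reconciliation API (query keys such as `q0`, each answered with a `result` list of candidates carrying a subset of the candidate fields). Matching is plain normalised exact matching, a simplification chosen for illustration; it is not the framework's own matching logic.

```python
import csv
import io
import json

# Hypothetical tabular source: one row per name, with its identifier.
DATA = """id,name
http://ipni.org/urn:lsid:ipni.org:names:77086615-1,Tahina spectabilis
"""


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def load_index(csv_text: str) -> dict:
    """Build a normalised-name -> list-of-rows index from CSV text."""
    index: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        index.setdefault(normalise(row["name"]), []).append(row)
    return index


def reconcile(queries: dict, index: dict) -> dict:
    """Answer a batch of queries keyed like {"q0": {"query": "..."}}."""
    response = {}
    for key, q in queries.items():
        candidates = index.get(normalise(q["query"]), [])
        limit = q.get("limit", 3)  # assumed default; callers may override
        response[key] = {
            "result": [
                {
                    "id": row["id"],
                    "name": row["name"],
                    "score": 100,  # exact match after normalisation
                    # Flag an automatic match only when exactly one record fits.
                    "match": len(candidates) == 1,
                }
                for row in candidates[:limit]
            ]
        }
    return response


queries = {"q0": {"query": "  tahina   Spectabilis "}, "q1": {"query": "Unknown name"}}
print(json.dumps(reconcile(queries, load_index(DATA)), indent=2))
```

Here `q0` resolves to the IPNI identifier despite the stray spacing and capitalisation, while `q1` returns an empty candidate list, leaving the decision to the user.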
Although the toolkit was first developed for scientific name reconciliation, it can be configured to reconcile any entity type (people, specimens, etc.). Micro-components of the tool for data transformation (https://github.com/RBGKew/String-Transformers) are available as drop-ins for the OpenRefine data cleaning package.
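The transformation micro-components can be pictured as small string-to-string functions, chained in a configurable order and applied to both the query and the reference data before comparison. The Python sketch below illustrates that idea only; the transformer names, the `REGISTRY` and `build_pipeline` are hypothetical and do not reproduce the String-Transformers API.

```python
import re
import unicodedata
from typing import Callable, Dict, List

Transformer = Callable[[str], str]


def strip_diacritics(s: str) -> str:
    """Decompose accented characters and drop the combining marks (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def strip_punctuation(s: str) -> str:
    """Replace anything that is not a word character or whitespace with a space."""
    return re.sub(r"[^\w\s]", " ", s)


def collapse_whitespace(s: str) -> str:
    """Trim the ends and reduce internal runs of whitespace to single spaces."""
    return " ".join(s.split())


# Hypothetical registry so a pipeline can be described in configuration by name.
REGISTRY: Dict[str, Transformer] = {
    "lowercase": str.lower,
    "strip_diacritics": strip_diacritics,
    "strip_punctuation": strip_punctuation,
    "collapse_whitespace": collapse_whitespace,
}


def build_pipeline(names: List[str]) -> Transformer:
    """Compose the named transformers, applying them left to right."""
    steps = [REGISTRY[n] for n in names]

    def run(s: str) -> str:
        for step in steps:
            s = step(s)
        return s

    return run


clean = build_pipeline(
    ["strip_diacritics", "strip_punctuation", "lowercase", "collapse_whitespace"]
)
print(clean("  Tahina   SPECTABILIS. "))
```

Keeping each step tiny and named means a curator can reorder or drop steps in configuration without touching code, which is the kind of low-barrier reuse the abstract describes.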
This approach is an alternative to existing service development, which has largely targeted technical users. The guiding principle is to open data services to a wider range of users by lowering the barrier to entry, so that hands-on scientists and data curators, who know their data best, can link it with external sources. Technical choices were made to fit the approaches taught by the Software Carpentry and Data Carpentry initiatives.
The toolkit aids progress towards Tim Berners-Lee's Linked Open Data principle #4 ("Refer to other things using their HTTP URI-based names when publishing data on the Web") and shows how we can build the foundations of the biodiversity knowledge graph.
### Research Data Alliance Plenary 5, March 2015, San Diego, USA.
The concept was presented in an environment-related plenary session, with representation from the Agriculture Data Interoperability, Biodiversity Data Integration, ELIXIR Bridging Force, Geospatial, Marine Data Harmonization and Metadata interest groups.
#### If you don't know the names, your knowledge gets lost (Nicky Nicolson)
Abstract: Scientific names are used in all domains as entry points into biodiversity datasets, but names are updated over time as we refine our understanding of species diversity. Resolution services are essential for the data integration efforts that build the linked open data researchers require: they allow navigation between old and new names as represented in different taxonomies (organising systems), and thus provide access to all content linked to name variants. Names services operating on high-quality, expert-curated, structured, linked data representing names and their inter-relationships allow static text to be turned into actionable data. These services should be usable by any domain of basic or applied science dealing with scientific names of organisms.
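Navigation between old and new names can be pictured as a small graph in which each non-accepted name points towards the name currently accepted for the taxon. The Python sketch below follows those links to the accepted name, then gathers every variant that resolves to it, so content filed under any variant can be found. The identifiers and names are placeholders, and a single outgoing link per name is a simplifying assumption; real taxonomies record richer relationships.

```python
from typing import Dict, Optional, Set, Tuple

# Placeholder data: name id -> (name string, id it points to, or None if accepted).
NAMES: Dict[str, Tuple[str, Optional[str]]] = {
    "n1": ("Genus alpha", None),     # currently accepted name
    "n2": ("Oldgenus alpha", "n1"),  # older combination, now a synonym
    "n3": ("Genus alphus", "n2"),    # variant recorded against the older name
}


def accepted_id(name_id: str) -> str:
    """Follow links until reaching a name that points nowhere (the accepted one)."""
    seen: Set[str] = set()
    while True:
        target = NAMES[name_id][1]
        if target is None:
            return name_id
        if name_id in seen:
            raise ValueError(f"cycle in name links at {name_id}")
        seen.add(name_id)
        name_id = target


def variants(name_id: str) -> Set[str]:
    """Return every name string that resolves to the same accepted name."""
    accepted = accepted_id(name_id)
    return {name for nid, (name, _) in NAMES.items() if accepted_id(nid) == accepted}


print(NAMES[accepted_id("n3")][0])
print(sorted(variants("n3")))
```

Starting from the oldest variant, the sketch reaches the accepted name in two hops and returns all three strings, which is the old-to-new (and new-to-old) navigation the abstract argues names services should provide.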
### pro-iBiosphere hackathon, February 2014, Leiden, Netherlands.
pro-iBiosphere was an EU-funded project aimed at defining the open Biodiversity Knowledge system. Towards the end of the funded period, a hackathon was held to try to integrate data, informed by use cases and tools proposed and defined during the project.
#### Kew at the pro-iBiosphere hackathon (Nicky Nicolson, Matt Blissett)
The participants prepared a publication on the outcomes of the hackathon:
R. Vos, J. Biserkov, B. Balech, N. Beard, M. Blissett, C. Brenninkmeijer, T. van Dooren, D. Eades, G. Gosline, Q. Groom, T. Hamann, H. Hettling, R. Hoehndorf, A. Holleman, P. Hovenkamp, P. Kelbert, D. King, D. Kirkup, Y. Lammers, T. DeMeulemeester, D. Mietchen, J. Miller, R. Mounce, N. Nicolson, R. Page, A. Pawlik, S. Pereira, L. Penev, K. Richards, G. Sautter, D. Shorthouse, M. Tähtinen, C. Weiland, A. Williams, and S. Sierra, “Enriched biodiversity data as a resource and service,” Biodiversity Data Journal, vol. 2, p. e1125, Jun. 2014. DOI: 10.3897/BDJ.2.e1125 URL: http://bdj.pensoft.net/articles.php?id=1125