RDA (Research Data Alliance) Plenary 13 in Philadelphia, Day 1
Under the motto “With Data comes Responsibility”, the 13th RDA plenary takes place in the beautiful city of Philadelphia.
Morning, 8:30-11:00
Very interesting opening keynote by Julia D. Stoyanovich on responsible data science and bias against legally protected communities in machine learning algorithms. Among many other causes, this bias stems from the lack of representation of certain groups in the data that powers these algorithms. Link to the documentation here.
Post-meeting
Ralph Mueller showed me the MASI system, which automatically extracts metadata from uploaded files according to a pre-defined set of metadata schemas. It has modular ingestion capabilities for multiple file types.
Another work in progress is an end-to-end data management solution being developed by the GeRDI project.
This is promising as a way to automatically extract metadata from resources deposited in our platform, Dendro.
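To make the idea concrete, here is a minimal sketch of modular, schema-driven metadata extraction. Everything below is illustrative and assumed: the extractor functions and registry are my own invention, not part of MASI or Dendro.

```python
# Hypothetical sketch of modular metadata extraction, loosely inspired by the
# MASI idea described above; none of these names come from MASI or Dendro.
from pathlib import Path

def extract_csv(path: Path) -> dict:
    # Illustrative only: report the number of lines as "metadata".
    rows = path.read_text(encoding="utf-8").splitlines()
    return {"dc:format": "text/csv", "rows": len(rows)}

def extract_plain_text(path: Path) -> dict:
    text = path.read_text(encoding="utf-8", errors="replace")
    return {"dc:format": "text/plain", "dcterms:extent": f"{len(text)} characters"}

# Registry of per-type extractors: supporting a new file type means adding an entry here.
EXTRACTORS = {
    ".csv": extract_csv,
    ".txt": extract_plain_text,
}

def extract_metadata(path: Path) -> dict:
    """Dispatch to the extractor registered for the file extension."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    metadata = {"dc:title": path.stem}  # minimal, schema-agnostic default
    if extractor is not None:
        metadata.update(extractor(path))
    return metadata

if __name__ == "__main__":
    sample = Path("example.csv")  # hypothetical sample file
    sample.write_text("a,b\n1,2\n", encoding="utf-8")
    print(extract_metadata(sample))
```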
Morning Breakout 1, 11:30-13:00
Collaborative Session Notes: here
The requirements were that users would:
- Provide the minimum possible input
- Import as much as possible from existing systems to help create maDMPs.
Goal: get estimations and recommendations.
Goal: update the DMP with real information by reusing (linking) info from other systems like GitHub; a rough sketch of this idea follows below.
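As a rough illustration of the "reuse info from other systems" goal, the sketch below pulls basic repository facts from the public GitHub REST API and maps them onto DMP-like fields. The field mapping is my own and only illustrative; it is not the Common Standard, and the example repository is just a placeholder.

```python
# Sketch: pre-fill DMP fields from an existing system (here, the public GitHub
# REST API). The mapping below is illustrative, not an official standard.
import json
import urllib.request

def github_repo_facts(owner: str, repo: str) -> dict:
    url = f"https://api.github.com/repos/{owner}/{repo}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def prefill_dmp_dataset(owner: str, repo: str) -> dict:
    facts = github_repo_facts(owner, repo)
    return {
        "title": facts.get("name"),
        "description": facts.get("description"),
        "issued": facts.get("created_at"),
        "license": (facts.get("license") or {}).get("spdx_id"),
        "access_url": facts.get("html_url"),
    }

if __name__ == "__main__":
    # Hypothetical example; any public owner/repo pair would do.
    print(json.dumps(prefill_dmp_dataset("RDA-DMP-Common", "RDA-DMP-Common-Standard"), indent=2))
```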
Processes help identify tasks performed by stakeholders:
- The ICT operator needs to provide the costs of the storage systems that need to be put in place as replacements.
- Is there a need for a repository?
- Costing service concepts need to be developed or agreed upon.
- A cost model for storage (a back-of-envelope sketch follows below).
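As a toy example of what a storage cost model feeding a maDMP could look like, here is a back-of-envelope calculator. All rates and parameters are invented for illustration; a real costing service would supply institution-specific prices and policies.

```python
# Toy storage cost model for a DMP, with invented rates.
def storage_cost_estimate(volume_gb: float,
                          retention_years: int,
                          replicas: int = 2,
                          price_per_gb_year: float = 0.05) -> dict:
    """Return a simple cost breakdown for keeping `volume_gb` for `retention_years`."""
    yearly = volume_gb * replicas * price_per_gb_year
    return {
        "volume_gb": volume_gb,
        "replicas": replicas,
        "retention_years": retention_years,
        "cost_per_year": round(yearly, 2),
        "total_cost": round(yearly * retention_years, 2),
    }

if __name__ == "__main__":
    # e.g. 500 GB kept for 10 years with 2 replicas at an assumed 0.05 per GB/year
    print(storage_cost_estimate(500, 10))
```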
Processes are useful in deploying maDMPs; they allow us to narrow down the focus of this WG.
The common model does not contain business logic; it is an information carrier.
The maDMP model diagram
The model diagram is available on GitHub. See it below:
The metadata section is not about capturing the full metadata of the dataset, but rather about linking to the standard in which the metadata is represented, because there are many such standards.
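To make the "information carrier" idea concrete, here is a minimal maDMP-like JSON document sketched in Python. The structure is loosely modelled on the RDA DMP Common Standard, but the exact field names should be checked against the official specification, and all identifiers are invented. Note how the dataset does not embed its full domain metadata, only a link to the standard in which that metadata is expressed.

```python
# Minimal maDMP-like document: plain data, no business logic. Field names are
# loosely modelled on the RDA DMP Common Standard and may not match it exactly.
import json

madmp = {
    "dmp": {
        "title": "Example project DMP",
        "modified": "2019-04-02T10:00:00",
        "dataset": [
            {
                "title": "Survey responses",
                "dataset_id": {"identifier": "https://doi.org/10.1234/example", "type": "doi"},
                # Do not embed the full domain metadata; just point to the
                # standard in which that metadata is expressed.
                "metadata": [
                    {
                        "description": "Dataset described with DDI",
                        "metadata_standard_id": {
                            "identifier": "https://ddialliance.org/",
                            "type": "url",
                        },
                    }
                ],
            }
        ],
    }
}

print(json.dumps(madmp, indent=2))
```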
A FAQ on the model is also available.
Use cases for adoption
DMPs and system integration: funders, authentication & administration, repositories, repository registries, IT services.
Further notes
- How to integrate a maDMP into a workflow where there is already data that needs to be added to the maDMP (reuse, build on existing work)?
- How to verify that data was actually produced according to a pre-specified maDMP? Pre-reserve PIDs for later automated verification of compliance? (A rough sketch of this idea follows below.)
- Integration between the Dendro dataset deposit interface and maDMP creation tools.
- Formalisation of the Common DMP Model as an OWL ontology is in progress.
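A rough sketch of the "pre-reserve PIDs for automated compliance verification" idea: compare the PIDs promised in a maDMP against the PIDs actually found in the repository. Both PID lists below are hypothetical, and a real check would of course pull them from the maDMP and the repository API.

```python
# Sketch: verify that the datasets promised in a maDMP (via pre-reserved PIDs)
# were actually deposited. Both PID lists below are hypothetical.
def check_compliance(planned_pids: set[str], deposited_pids: set[str]) -> dict:
    return {
        "fulfilled": sorted(planned_pids & deposited_pids),
        "missing": sorted(planned_pids - deposited_pids),
        "unplanned": sorted(deposited_pids - planned_pids),
    }

if __name__ == "__main__":
    planned = {"doi:10.1234/dataset-a", "doi:10.1234/dataset-b"}
    deposited = {"doi:10.1234/dataset-a", "doi:10.1234/dataset-c"}
    print(check_compliance(planned, deposited))
```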
Afternoon Breakout 2, 14:30
What is a common definition of a Commons? One issue is that there is no agreed definition. Many groups have their own repositories, but these become silos.
At the lowest layer, you have some sort of hardware: HPC, cloud, tape, etc. FAIR concerns sit at this layer: interoperability, identification, data citation, metadata.
Once the data sits on the hardware, you need to build APIs: the piping that connects the hardware to the upper layers. On top of that, you have identity management with authentication and authorization.
Next comes a transformation layer with a data connection switchboard, which allows you to convert data between formats. You then need queryability and findability; if you use Google, you are going to use BigTable.
The workspace layer is the notebook kind of thing, and on top of that you have the analytics modules to do the interesting stuff.
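A minimal sketch of what the "data connection switchboard" in the transformation layer could look like: a registry of converters keyed by (source format, target format). All formats, names and converters here are my own invention for illustration, not part of any specific commons implementation.

```python
# Toy "data connection switchboard": route data between formats through a
# registry of converters. All formats and converters here are invented.
import csv
import io
import json
from typing import Callable

CONVERTERS: dict[tuple[str, str], Callable[[str], str]] = {}

def converter(src: str, dst: str):
    """Register a function that converts a payload from `src` to `dst`."""
    def register(func):
        CONVERTERS[(src, dst)] = func
        return func
    return register

@converter("csv", "json")
def csv_to_json(payload: str) -> str:
    rows = list(csv.DictReader(io.StringIO(payload)))
    return json.dumps(rows)

def switchboard(payload: str, src: str, dst: str) -> str:
    if src == dst:
        return payload
    try:
        return CONVERTERS[(src, dst)](payload)
    except KeyError:
        raise ValueError(f"No converter registered for {src} -> {dst}")

if __name__ == "__main__":
    print(switchboard("a,b\n1,2\n", "csv", "json"))  # -> [{"a": "1", "b": "2"}]
```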