From Networks to Machine Learning
Instructors: Val Pentchev and Dr. Filipi Silva
Network science and data-driven approaches have enabled the study of science as a complex system, resulting in the emergence of a new field, the “science of science”. Studies in this nascent field rely on large open and proprietary bibliometric data sets such as Web of Science (WoS) and Microsoft Academic Graph (MAG). Yet the cost and expertise needed to host and service such data are significant access barriers for many researchers. Moreover, data use agreements often prohibit sharing data and algorithms, which hampers collaboration and reproducibility.
Collaborative Archive & Data Research Environment (CADRE) is a new, cloud-based science gateway, developed by the tutorial organizers, that overcomes these barriers. We will demonstrate how CADRE can help you build scholarly networks and share complex machine learning workflows. These workflows include code and datasets, whether written in personal Jupyter notebooks or uploaded from local environments. We will show how to use CADRE to encapsulate them using cloud-native containerized technologies and assign persistent DOI links. This makes research reproducible and research assets easy to share, cite and reuse, while increasing efficiency and reducing cost. We will also showcase the new USPTO data, powered by the latest graph database technologies, and a graphical interface for users without programming skills. By pooling resources to build a single shared instance, member institutions obtain a superior solution at a fraction of the cost they would pay to develop their own. CADRE’s open datasets and basic tools are free for public use from anywhere in the world. Building a community of users is central to achieving CADRE’s goals of breaking down disciplinary and geographic silos and building a cohesive, cross-disciplinary, international community of scientometric and informetric researchers. This tutorial therefore aims to generate interest in CADRE among the research community so that this vision can be realized.
The tutorial will focus on hands-on experience for an audience interested in using CADRE’s online query interface for science of science research. Attendees will be guided step by step through CADRE’s registration process and then through a series of hands-on exercises designed to familiarize new users with CADRE’s features and prepare them to use CADRE for their own research: 1) create a working network from citation data; 2) apply a machine learning method based on neural network embeddings (e.g., node2vec); 3) conduct their own CADRE queries and reproduce the analysis with them; 4) containerize the analysis and share it in the CADRE marketplace.
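As a rough illustration of what exercises 1 and 2 involve, the sketch below builds a small network from citation pairs and generates node2vec-style biased random walks over it. It is not CADRE's interface or the tutorial's actual notebook: the paper identifiers, the function names, and the parameter values are all hypothetical, and only the Python standard library is assumed. In a full node2vec pipeline, the resulting walks would then be fed to a skip-gram model to learn the embeddings, a step omitted here.

```python
import random
from collections import defaultdict


def build_citation_graph(citations):
    """Build an undirected adjacency map from (citing, cited) pairs."""
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)
        graph[cited].add(citing)
    return graph


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Generate one second-order biased walk in the style of node2vec.

    p controls the tendency to return to the previous node,
    q controls the tendency to move away from it.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)   # step back to where we came from
            elif candidate in graph[previous]:
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # explore further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy citation data: (citing paper, cited paper).
citations = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P4", "P3"), ("P4", "P5"), ("P5", "P6"),
]
graph = build_citation_graph(citations)

rng = random.Random(42)  # fixed seed so repeated runs give the same walks
walks = [
    node2vec_walk(graph, node, length=5, p=1.0, q=0.5, rng=rng)
    for node in sorted(graph)
]
for walk in walks:
    print(" -> ".join(walk))
```

Treating citations as undirected edges is a simplifying assumption made here so that every paper has at least one neighbor; a directed treatment is equally possible.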
To ensure accessibility for all attendees, all tutorial activities will be conducted on CADRE’s free tier using the open Microsoft Academic Graph and U.S. Patent and Trademark Office (USPTO) datasets, in accordance with the "GOTO" principle (Good, Open, Transparent and Objective, in terms of data, resources and materials). Instructions (in English) will be provided orally and in writing. Exercises will be conducted through a programming interface or, for those without programming skills, an intuitive graphical user interface. Attendees will be encouraged to walk through examples using both interfaces. Attendees will provide feedback via questionnaires. Real-time technical support for CADRE will be available during the session.
Prerequisites: a modern web browser.
This workshop will be held online, with a limited number of seats available in person. Both attendance options require registration. Click here to register.