C&RL News October 2017 482

The June 2017 ACRL/SPARC Forum at the ALA Annual Conference focused on
recent efforts to build cooperative programs to ensure persistent access
to open data, including science data provided by the U.S. federal
government. Data Rescue events, inaugurated at the University of
Pennsylvania, catalyzed librarians, scientists, technologists, and other
open data advocates to build a broad and resilient coalition to guard
against future data loss. Here, the three speakers from the forum
reflect upon their own experiences with Data Rescue events and how they
view opportunities for collective action going forward.

University of Pennsylvania
The Penn Libraries joined the Penn Program in Environmental Humanities
(PPEH) in late 2016 in an effort to safeguard federal environmental and
climate data as part of the Data Refuge project. The Data Refuge effort,
grounded in concerns from PPEH graduate fellows and from scientists,
journalists, and others, grew into a broad and open network supporting
more than 50 local events.

These Data Rescue events brought thousands of people together to support
ongoing web archiving efforts, as well as to download and harvest many
web-based datasets that are not accessible to web archiving. By the end
of spring, when most events were completed, many thousands of websites
had been archived in the Internet Archive, and terabytes of data had
been harvested and stored in datarefuge.org and within library
repositories. The events also included teach-ins, panels, and a wide
range of locally created opportunities for community learning.
Storytelling programs provided opportunities for people to reflect on
and learn about the vast role of federal datasets in the daily lives and
health of communities, and the act of harvesting, web archiving, and
identifying valuable datasets was eye-opening for many of us as we
learned just how deeply complex and fragile the digital information
landscape can be.

Working alongside PPEH and many others in supporting these events as
they gained momentum throughout the winter and into spring was deeply
inspiring for many of us at the Penn Libraries. Data Refuge allowed us
to learn from people across scientific communities and open data
communities, from talented government data stewards, digital
preservation experts, data storytellers, activists, and artists, and
from many librarians working at institutions who saw a role for
themselves in ensuring that multiple copies of vital data would remain
available for our communities into the future.

Laurie Allen, Claire Stewart, and Stephanie Wright

Strategic open data preservation
Roles and opportunities for broader engagement by librarians  
and the public

Laurie Allen is assistant director for digital scholarship at the
University of Pennsylvania, email: laallen@upenn.edu; Claire Stewart is
associate university librarian for Research and Learning at the
University of Minnesota, email: cstewart@umn.edu; and Stephanie Wright
is program lead for Mozilla Science Lab at the Mozilla Foundation,
email: stephanie@mozillafoundation.org
 
© 2017 Laurie Allen, Claire Stewart, and Stephanie Wright

scholarly communication


Moving across these communities helped us see a tremendous opportunity,
one that some libraries are already beginning to take advantage of: to
reach out to federal data publishers and stewards and to open data
advocates in forming new partnerships.

As spring advanced, we decided to retire the bucket brigade workflow
that we had collaborated in developing over the course of the winter and
turn our attention to absorbing and sharing the lessons of the project
in other ways. We joined with ARL and Mozilla to form the Libraries+
Network1 to support further collaboration across communities and hosted
a large meeting in Washington, D.C., in May to bring together leaders
from government, open data, open science, and library communities, among
others.

Across that meeting and many other activities, one lesson that
repeatedly emerged was the inseparable nature of data and stories. That
is, the more we listened to the stories from communities about their
environment, the more valuable the data became. And the more we dove
into the technology, politics, and logistics of copying and storing
data, the more it became clear that stories were our most vital tool for
understanding the creation and stewardship of federal information, as
well as its use.

To expand and enrich the effort to share stories about the uses of
federal data, Bethany Wiggin and PPEH continue collecting data stories
as part of the Three Stories in Our Town project with support from the
National Geographic Foundation. Within the libraries, we are now
pursuing and supporting new collaborations for safe and distributed
approaches to federal data replication and preservation, and broadening
our interest in local collaborations. We continue to look outward toward
the data needs of the wider world, and we are beginning to look with
fresh eyes at the data needs of the University of Pennsylvania
community, rethinking our role in caring for the data that is produced
locally and on campus, and for the data that is needed by members of our
community now and into the future.

The University of Minnesota Libraries
The University of Minnesota (UM) Libraries, in conjunction with
colleagues in Liberal Arts Technologies and Innovation Services, hosted
a Twin Cities Data Rescue event on February 24 and 25, 2017. Modeled on
events2 held around the country, the event attracted 150 participants
who took on roles such as selecting URLs to harvest, organizing and
describing datasets, and scraping unharvestable sites. Many of the
participants were new to working with data, metadata, and web
harvesters. It was an eye-opening and fulfilling experience, though the
15 GB harvested and 26 datasets “bagged” over the two days were nothing
more than a drop in the ocean of valuable data. The event attracted
local media interest, including a Minnesota Public Radio interview with
our Government Publications and Regional Depository librarian.
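
The article does not describe the tooling used to “bag” datasets. As a rough illustration only, and assuming the term refers to BagIt-style packaging in which a payload directory is accompanied by a manifest of checksums, the core step might be sketched as follows. The function names and directory layout are hypothetical and are not drawn from the Twin Cities event.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(payload_dir):
    """Build one 'checksum  relative/path' line per file under payload_dir.

    Paths are written relative to the parent of payload_dir, so a payload
    directory named 'data' yields entries such as 'data/readings.csv'.
    This layout is an assumption made for the sketch.
    """
    root = Path(payload_dir)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root.parent).as_posix()
        lines.append(f"{sha256_of(path)}  {relative}")
    return lines
```

Writing these lines to a manifest file next to the payload lets anyone who later receives a copy recompute the checksums and confirm that nothing was altered or lost in transit.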

As with many other events, one of the Twin Cities event’s greatest
benefits was creating opportunities to connect with the broader campus
and local community and to identify advocates for the cause of
well-preserved and accessible science data. But there were also some in
our community who wondered why the effort was needed and whether the
data were really at risk. This presented opportunities to share, and to
reflect internally, about the libraries’ existing research data program,
and to consider expanding its activities to include partnerships around
preserving public data.

Consistent with the university’s land grant mission, the UM Libraries
have offered research data services for many years. In fact, some of the
earliest efforts involved the preservation of critical government
information through our Regional Depository of U.S. Government
Documents. The libraries have been gradually increasing our investment
in services to support public access to information, including data. Our
regional program expanded to a three-state program in 2010. We offer
robust data management education to our local research community, which
has proved particularly popular with graduate students, who have
overfilled every research data management boot camp we’ve offered. In
2015, services expanded to include a Data Repository for the University
of Minnesota. We are developing a new strategic plan for research data,
exploring many new potential services.

The conversations sparked this year have also brought welcome attention
to the importance of multi-institution collaboration and alignment with
faculty initiatives. They have also highlighted opportunities for
libraries to expand their work with local agencies, including government
bodies. One such opportunity may arise through UM’s work with the Big
Ten Academic Alliance (BTAA) Geoportal,3 which provides advanced
discovery tools for more than 6,000 GIS data files and historical maps.
The project has begun to grapple with the possibility of expanding its
role to include primary preservation of these data. Some local and state
agencies, under increasing financial pressure, face difficult choices
between providing access to current versions of spatial information and
retaining older versions. These are truly public data at risk, and their
loss would be catastrophic to researchers and policymakers. Of course,
academic libraries are far from immune to financial pressures, but the
BTAA Geoportal project is demonstrating that a modest shared financial
commitment can yield significant returns. UM Libraries, like many
academic libraries, will be seeking to balance their sense of commitment
to broader data preservation with a pragmatic view of potential
financial impact and a strong interest in working collectively with
academic and nonacademic partners.

Mozilla Science 
While well known for the Firefox browser, Mozilla is less known for the
Foundation4 behind the browser, which has a mission to keep the Internet
healthy,5 open, and accessible to all. One of the ways it pursues this
mission is through the Mozilla Science Lab (MSL),6 a program within the
Foundation focused on providing support for open data and open
scholarship. The organization believes that building the capacity of
researchers, librarians, educators, and developers through training
programs, collaborative events, mentorships, and fellowships will lead
to better adoption rates for open research and mobilize communities to
advance open, data-driven science.

MSL became involved when one of our 
Fellows for Science,7 Danielle Robinson, heard 
about the Data Refuge events taking place 
around the country and jumped into action 
to host one at the Mozilla offices in Portland, 
Oregon, along with another Mozilla Network 
member in Portland, Max Ogden. The two had 
several conversations with the Data Refuge 
team at the University of Pennsylvania and, 
in the end, rather than spending the evening 
downloading new datasets, the Portland team 
decided to focus on creating metadata files for 
the datasets already downloaded by others. In 
preparation for that, they developed a tutorial 
on how to create a metadata file in JSON and 
shared it on the GitHub repository created for 
the event,8 where more experienced participants 
started. Those new to GitHub were given a 
quick overview of GitHub and how to use it for 
the project, then they moved on to the metadata 
creation with the others. 
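
The tutorial itself is in the event’s GitHub repository and is not reproduced here. As a minimal sketch of what creating a metadata file in JSON can look like, the following example builds a small descriptive record for one harvested dataset. Every field name, the example URL, and the helper functions are hypothetical illustrations, not the schema used in Portland or by Data Refuge.

```python
import json
from pathlib import Path


def build_metadata(title, source_url, harvested_on, files):
    """Return a dictionary describing one harvested dataset.

    All field names are hypothetical; they are not the Data Refuge schema.
    """
    return {
        "title": title,
        "source_url": source_url,
        "harvested_on": harvested_on,  # ISO 8601 date string
        "files": sorted(files),
    }


def write_metadata(record, path):
    """Write the record to disk as indented, human-readable JSON."""
    text = json.dumps(record, indent=2, sort_keys=True) + "\n"
    Path(path).write_text(text, encoding="utf-8")


# Placeholder values for illustration only.
record = build_metadata(
    title="Example air quality readings",
    source_url="https://example.gov/air-quality",
    harvested_on="2017-02-24",
    files=["readings.csv", "README.txt"],
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Keeping the record as plain, indented JSON means it can be reviewed and corrected in a GitHub pull request by participants who have never written code, which fits the mixed experience levels described above.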

The event took place one evening after “normal” work hours. Pizza,
drinks, and eventually coffee were provided to keep energy levels high.
This was a true community-building event, and participants of all levels
of experience were invited to join in. All that was required was a
laptop, interest, and a GitHub account. There were people from widely
varying backgrounds: beginners to laptop jockeys, academia to general
public. Contrary to expectations, they didn’t think metadata was boring.
They all showed passion and interest toward the cause of making sure
this data was available for all, long into the future. It was enough to
raise the (still unanswered) question: How much are we being limited in
what we can do collaboratively because we are held back by the
assumptions we make about how others perceive our work?

It was this question that led MSL to pursue 
a partnership with Penn Libraries and the 
Association of Research Libraries to form a 
broader community of experts across libraries, 
archives, government, lawyers, and more, as 
part of a longer-term plan around the issues 
of long-term open access to federal data. 



We wanted to make sure the open data community members we worked with
were a part of this larger community, called the Libraries+ Network,
because there was so much these individual communities could learn from
each other. The details are laid out in the recently released report9
from the workshop.

To quote Danielle Robinson, “Usually academia does not hack! We form
subcommittees.” In the open data community, one sees a lot of hacking,
for which we are using the Oxford English Dictionary definition of
“providing a quick or inelegant solution to a particular problem.”

Hacking has its benefits. These events were 
successful because event hosts could work 
quickly to put them together without having to 
wait for central coordination and committees to 
agree upon standards. It also has its drawbacks: 
a greater possibility of duplication of effort or 
needing to revisit work that was done to “clean 
it up.” Neither of these workflows is right or 
wrong, but each has valuable components that 
can help the other. Our next steps are to take 
the lessons and viewpoints learned from these 
events and the Libraries+ Network and think 
bigger about what we can accomplish together.

Notes
1. Libraries Plus Network, “Libraries Plus Network,” Libraries+ Network,
accessed August 13, 2017, https://libraries.network/.

2. University of Pennsylvania Program in the Environmental Humanities,
“DataRescue Events,” PPEH Lab, accessed August 13, 2017,
www.ppehlab.org/datarescue-events/ and www.ppehlab.org/datarefuge/.

3. Big Ten Academic Alliance, “Big Ten Academic Alliance Geoportal,”
accessed August 13, 2017, https://geo.btaa.org/.

4. Mozilla, “Mozilla Network,” Mozilla Network, accessed August 13,
2017, https://network.mofoprod.net/.

5. Mozilla, “The Internet Health Report v.0.1,” The Internet Health
Report, January 2017, https://internethealthreport.org/v01/.

6. Mozilla, “Mozilla Science Lab,” Mozilla Science, accessed August 13,
2017, https://science.mozilla.org.

7. Mozilla, “Mozilla Science Fellowships,” Mozilla Science, accessed
August 13, 2017,
https://science.mozilla.org/programs/fellowships/overview.

8. Danielle Robinson, Data-Rescue-PDX: Volunteer Guide, and Other
Materials for DATA RESCUE PDX, 2017,
https://github.com/daniellecrobinson/Data-Rescue-PDX.

9. The Grove, “Libraries+ Network Meeting Report,” May 8, 2017,
https://libraries.network/s/may-meeting-report.pdf.
