Michigan State University

Datasets for Digital Research

The datasets and data-finding tools listed below are not meant to be used as reading material. They are intended as data for text mining and other "non-consumptive" research, that is, research conducted by computational methods that does not reproduce significant portions of text for personal or public display.

Beyond the materials below, which were prepared by the MSU Libraries, be aware of the additional corpora available for linguistic research.

Text and metadata for analysis can often also be obtained from publishers or content vendors, through direct negotiation, an API, or a web interface. JSTOR, for example, offers access to ngram word counts via its Data for Research portal.
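For readers unfamiliar with ngram word counts, here is a minimal Python sketch of how such counts can be derived from plain text. It is an illustration only, not the method used by JSTOR or any vendor: the function name `ngram_counts`, the simple lowercase tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count word n-grams in a string.

    Hypothetical helper: tokenization is a simple lowercase match on
    letters, digits, and apostrophes, which is an assumption and will
    not suit every corpus.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of length n across the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample text, not drawn from any dataset listed on this page.
sample = "The grange met on Monday. The grange voted on Tuesday."
counts = ngram_counts(sample, n=2)
print(counts.most_common(3))
```

Only the counts are kept, not the running text, which is what makes this kind of output suitable for non-consumptive research. Note that this sketch ignores sentence boundaries, so a bigram such as "monday the" is counted across the period.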

Recommendations for acquisition of new datasets, requests for assistance gathering and preparing data, and questions about how to use data may be directed to the MSU Libraries Digital Scholarship Collaborative.

Text

Sunday School Books in Nineteenth Century America (open to non-MSU users)
Number of Works: 166 works
Years covered: 1809-1887
Size: 11.6 MB

The Grange Visitor (open to non-MSU users)
Number of Works: 429 issues
Years covered: 1875-1896
Size: 8.53 GB

Feeding America (open to non-MSU users)
Number of Works: 76 books
Years covered: late 18th to early 20th century
Size: 78+ MB

M.A.C/M.S.C Record Dataset (open to non-MSU users)
Number of Works: 2,694 works
Years covered: 1896-1955
Size: 24+ GB

U.S. Congressional Collection (open to on-campus users)
Number of Works: 17,000+ daily records
Years covered: 1789-1918
Size: 120+ GB

Google Books Dataset (open to on-campus users)
Number of Works: approx. 3,000,000
Years covered: 1500-2012
Size: 2.9 TB

MSU Libraries Catalog (in progress)
Number of Works:
Years covered:
Size:

Image Credits: Schoolhouse by Chris Cole, Newspaper by John Caserta, Book by Derrick Snider, Library designed by libberry, Congress by Martha Ormiston, Cooking by Rafael Farias Leao, UX Personas by Matt Wasser