Michigan State University

Datasets for Digital Research

The datasets and data-finding tools listed below are intended as data for text mining or other "non-consumptive" research, not as a source of reading material. Non-consumptive research is conducted by computational methods and does not reproduce significant portions of text for personal or public display.
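As an illustration of what non-consumptive analysis can look like, here is a minimal Python sketch that reduces texts to word counts rather than displaying them. It assumes the texts are already available as plain-text strings. The function name term_frequencies and the two sample sentences are hypothetical stand-ins, not part of any dataset listed here.

```python
from collections import Counter
import re


def term_frequencies(texts):
    """Count lowercase word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Keep only runs of letters and apostrophes; punctuation is dropped.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical stand-in documents; real use would read files
# from a downloaded dataset instead.
sample_docs = [
    "The grange met on Tuesday.",
    "The visitor spoke to the grange.",
]

freqs = term_frequencies(sample_docs)
print(freqs.most_common(3))
```

The output is an aggregate table of counts, so the analysis can be shared without reproducing the underlying text.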

Beyond the materials below, which were prepared by the MSU Libraries, additional corpora are available for linguistic research.

Recommendations for acquisition of new datasets, requests for assistance gathering and preparing data, and questions about how to use data may be directed to the MSU Libraries Digital Scholarship Collaborative.


Sunday School Books in Nineteenth Century America
(open to non-MSU users)
Number of Works: 166 works
Years covered: 1809-1887
Size: 11.6 MB
The Grange Visitor 
(open to non-MSU users)
Number of Works: 429 issues
Years covered: 1875-1896
Size: 8.53 GB
Feeding America 
(open to non-MSU users)
Number of Works: 76 books
Years covered: late 18th - early 20th century
Size: 78+ MB
M.A.C/M.S.C Record Dataset 
(open to non-MSU users)
Number of Works: 2,694 works
Years covered: 1896-1955
Size: 24+ GB
U.S. Congressional Collection 
(open to on-campus users)
Number of Works: 17,000+ daily records
Years covered: 1789-1918
Size: 120+ GB
Google Books Dataset
(open to on-campus users)
Number of Works: approximately 3,000,000
Years covered: 1500-2012
Size: 2.9 TB
MSU Libraries Catalog (in progress)
Number of Works:
Years Covered:

Image Credits: Schoolhouse by Chris Cole, Newspaper by John Caserta, Book by Derrick Snider, Library designed by libberry, Congress by Martha Ormiston, Cooking by Rafael Farias Leao, UX Personas by Matt Wasser