Wikistats: Pageview complete dumps
Maintained by the Wikimedia Analytics team
Pageview complete is our best-effort attempt to provide a comprehensive time series of per-article pageview data for Wikimedia projects.
The data spans December 2007 to the present, with a uniform format and compression scheme throughout.
Features of the dataset
- Unified, human-readable project names (e.g. es.wikisource instead of es.s)
- Page IDs (from 2015 onwards) that allow this dataset to be joined with Wikidata and MediaWiki History
- Correction of a one-hour skew in the hourly data
- Agent type and access method dimensions, where available
- From 2020 onward, counts for agents detected by our bot-detection filters
- KNOWN ISSUE: rows without Page IDs have only 5 columns, while rows with Page IDs have 6. A fix is in progress and will take some time to complete.
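Because of the known issue above, a reader that assumes a fixed column count will fail on some rows. The following is a minimal sketch of one way to tolerate both layouts. It assumes whitespace-separated fields (the separator is not stated in this document) and reads the fields it can locate reliably from both ends of the row: wiki code and title from the front, daily total and hourly counts from the back. Treating the third field of a 6-column row as the page id, and keeping whatever else sits in the middle as an opaque `other_fields` list, are assumptions for illustration; `PageviewRow` and `parse_row` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageviewRow:
    wiki_code: str              # e.g. "es.wikisource"
    title: str
    page_id: Optional[int]      # None when the row has no (numeric) page id
    other_fields: List[str]     # hypothetical: remaining middle field(s), kept as raw strings
    daily_total: int
    hourly_counts: str          # still encoded, e.g. "A1C3"


def parse_row(line: str) -> PageviewRow:
    """Parse one row, accepting both the 5- and the 6-column layout.

    Assumes whitespace-separated fields. Wiki code and title are read from the
    front of the row, daily total and hourly counts from the back, so those
    four stay correct whichever layout the row uses.
    """
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 columns, got {len(fields)}: {line!r}")
    middle = fields[2:-2]  # 1 field in 5-column rows, 2 in 6-column rows
    page_id: Optional[int] = None
    if len(fields) == 6:
        # Assumption: in 6-column rows the first middle field is the page id.
        candidate = middle.pop(0)
        page_id = int(candidate) if candidate.isdigit() else None
    return PageviewRow(
        wiki_code=fields[0],
        title=fields[1],
        page_id=page_id,
        other_fields=middle,
        daily_total=int(fields[-2]),
        hourly_counts=fields[-1],
    )
```

Anchoring on both ends of the row means the daily total and hourly counts are parsed the same way for every row, so only the page id depends on which layout a given row uses.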
Details on data segments
Each set of daily files is derived from the best data available at the time.
Data format
The compression is similar to that of pagecounts-ez: bzip2-compressed files with the hourly data embedded in each row, using the following fields:
- wiki code (subproject.project)
- article title
- page id
- daily total
- hourly counts
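Rather than decompressing a whole daily file first, you can stream it line by line. The following is a minimal sketch using Python's standard `bz2` module. It assumes whitespace-separated fields with the daily total in the second-to-last column, which holds for both the 5- and 6-column layouts described in the known issue. `totals_by_wiki` is a hypothetical helper name.

```python
import bz2
from collections import Counter


def totals_by_wiki(path: str) -> Counter:
    """Sum the daily-total column per wiki code in one bzip2-compressed file.

    The file is decompressed incrementally and read line by line, so it is
    never held in memory as a whole. Assumes whitespace-separated fields with
    the daily total in the second-to-last column.
    """
    totals: Counter = Counter()
    # errors="replace" keeps one undecodable byte sequence from aborting the run.
    with bz2.open(path, mode="rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            totals[fields[0]] += int(fields[-2])
    return totals
```

For example, calling `.most_common(10)` on the result lists the ten wiki codes with the highest daily totals in that file.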
The hourly counts are encoded as follows:
- Hour: from 0 to 23, written as a letter: 0 = A, 1 = B, ..., 22 = W, 23 = X
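The description above gives only the hour-to-letter mapping. The decoder below is a sketch that also makes an assumption based on pagecounts-ez, which the data format section says this dataset resembles: each hour letter is immediately followed by that hour's view count, and hours with zero views are omitted. Under that assumption, `A1C3` would mean 1 view in hour 0 and 3 views in hour 2. `decode_hourly` is a hypothetical helper name.

```python
import re
from typing import List

# Assumption: each hour letter (A = hour 0 ... X = hour 23) is immediately
# followed by that hour's view count; hours with zero views are omitted.
_HOUR_COUNT = re.compile(r"([A-X])(\d+)")


def decode_hourly(encoded: str) -> List[int]:
    """Expand an encoded hourly-counts string into 24 per-hour view counts."""
    hours = [0] * 24
    pos = 0
    for match in _HOUR_COUNT.finditer(encoded):
        if match.start() != pos:
            # Something between the previous pair and this one did not match.
            raise ValueError(f"unexpected characters in {encoded!r} at index {pos}")
        hour = ord(match.group(1)) - ord("A")  # 'A' -> 0, 'X' -> 23
        hours[hour] += int(match.group(2))
        pos = match.end()
    if pos != len(encoded):
        raise ValueError(f"trailing characters in {encoded!r} at index {pos}")
    return hours
```

As a sanity check, you can compare the sum of the 24 decoded values with the daily-total column of the same row.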
All Analytics datasets are available under the Creative Commons CC0 dedication.