This partial mirror of Wikimedia Enterprise HTML dumps is an experimental service.
Dumps are produced for a specific set of namespaces and wikis, and then made available for public download. Each dump output file is a tar.gz archive which, when uncompressed and untarred, contains a single file with one line per article, in JSON format. For the full list of attributes defined for each article, please see the data dictionary for Wikimedia Enterprise APIs.
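Given that layout (one tar.gz holding a single newline-delimited JSON file), a downloaded dump can be read without fully extracting it to disk. The sketch below is one possible approach using only the Python standard library; the file and attribute names are not prescribed by this service, so treat them as placeholders.

```python
import json
import tarfile

def iter_articles(dump_path):
    """Yield one parsed JSON object per article from a dump archive.

    Assumes the archive contains a single newline-delimited JSON file,
    as described above; streams line by line to avoid loading the whole
    (potentially very large) file into memory.
    """
    with tarfile.open(dump_path, "r:gz") as tar:
        for member in tar:
            fh = tar.extractfile(member)
            if fh is None:  # skip directories or special entries
                continue
            for line in fh:
                yield json.loads(line)
```

Iterating lazily like this matters in practice: a single wiki's dump can be many gigabytes uncompressed.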
Accompanying each tar.gz file is a small metadata file, also in JSON format, containing the md5sum of the archive and the date the dump was produced.
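That checksum can be used to verify a download before processing it. The following is a minimal sketch; the metadata key name `md5sum` is an assumption here, so check the actual metadata file for the exact field name.

```python
import hashlib
import json

def verify_dump(dump_path, metadata_path):
    """Return True if the archive's MD5 digest matches the metadata file.

    The "md5sum" key is assumed, not guaranteed; inspect the metadata
    JSON that ships with the dump for the real field name.
    """
    with open(metadata_path) as f:
        expected = json.load(f)["md5sum"]
    digest = hashlib.md5()
    with open(dump_path, "rb") as f:
        # read in 1 MiB chunks so large archives do not exhaust memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```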
All files for a dump run are placed in a single directory named for the run date, in YYYYMMDD format. The dumps will eventually become available on a regular schedule, around the 2nd and 21st of each month.
Initially, we hope to keep three runs available for public download, covering roughly a six-week period.
If you need more frequent runs, dumps of article change updates only, or the ability to query individual articles from the dumps, visit Wikimedia Enterprise to sign up for a free trial.
View the directories here: other/enterprise_html/runs