Wikipedia:The hidden encyclopedia that resides in the article histories

This essay is about the value of the knowledge that is hidden in the revision history of Wikipedia articles. The value of an article's history is not just about authorship attribution to comply with CC BY-SA, or about studying Wikipedia's own evolution: it's much more than that.

Introduction

The amount of information that we can easily read at first glance in Wikipedia is immense. A full human life would not give enough time to read Wikipedia in full. But sometimes you find that some important information is missing. In many cases, the solution is just to add it. But in many other cases, the missing information is already there, and you just can't see it at first glance. It's hidden in the article's history.

How to use a previous revision of an article

Just click on the View history link at the top of the article. There, you will see the full list of revisions the article has. The navigation is difficult, particularly for articles that have a lot of revisions. But, with a bit of work, you can find and read any past revision of any article (unless that particular revision was deleted for a good reason, such as copyright infringement).
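The same revision list can also be requested programmatically. The following is a minimal sketch, not an official client: it only builds a request URL for the MediaWiki Action API and a permanent link to an old revision. The helper names, the English Wikipedia endpoints and the revision ID 12345 are illustrative assumptions.

```python
from urllib.parse import urlencode

# Assumed endpoints for the English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
INDEX_ENDPOINT = "https://en.wikipedia.org/w/index.php"


def revision_list_url(title, limit=50):
    """Build an Action API URL that lists the most recent revisions of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def old_revision_url(title, revision_id):
    """Build a permanent link to one specific past revision."""
    return INDEX_ENDPOINT + "?" + urlencode({"title": title, "oldid": revision_id})


# 12345 is a placeholder, not a real revision ID of this article.
print(revision_list_url("Sun Microsystems", limit=5))
print(old_revision_url("Sun Microsystems", 12345))
```

Opening the first URL in a browser returns JSON with revision IDs and timestamps; any of those IDs can then be passed to the second helper to read that revision.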

When using past revisions of articles, be careful with vandalism, unsourced statements, original research and similar problems. Reverted edits and short-lived contributions should generally not be trusted. The more revisions a text is present in, the more reliable it is, particularly if it is supported by citations.
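The rule of thumb about persistence can be approximated in code. This sketch assumes that the text of each revision has already been fetched into a list of strings; the function name and the sample texts are hypothetical, and exact substring matching is a deliberate simplification.

```python
def persistence(snippet, revisions):
    """Return the fraction of revisions whose text contains the snippet."""
    if not revisions:
        return 0.0
    hits = sum(1 for text in revisions if snippet in text)
    return hits / len(revisions)


# Made-up revision texts: the second one contains a short-lived addition.
sample_revisions = [
    "The town has 900 residents.",
    "The town has 900 residents. It is on the moon.",
    "The town has 900 residents.",
]

long_lived = persistence("The town has 900 residents.", sample_revisions)
short_lived = persistence("It is on the moon.", sample_revisions)
```

A statement that survives in most revisions scores close to 1, while a quickly reverted addition scores close to 0. A high score is only a hint: citations still matter more than longevity.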

Beware of some updated content that may be seen in some previous article revisions

Please note that not all content in a past revision will necessarily look exactly as it did when that revision was the current one. Some images (whether stored in Wikipedia itself or in Wikimedia Commons) may have been deleted. More confusingly, they may have been updated with new content, especially in the case of maps, graphics or charts that are updated when new information becomes available. A similar situation may arise with data from Wikidata or from the Data namespace, or when Wikipedia templates are used.

In summary, you can view the rendered wiki text of past revisions of articles, but not the articles exactly as they looked on the date of those revisions. The vast majority of the content should remain the same, but please be careful with images and with data that may be rendered from sources external to the Wikipedia article.

In Wikipedia dumps

Advanced users may want to deploy a full Wikipedia replica using MediaWiki software, to perform data analytics or machine learning, or simply to store a full copy of all the information in Wikipedia, following the LOCKSS principle. In all of these cases, it's possible to also include all non-deleted revisions of all non-deleted articles. This is very useful for making the knowledge base for analytics or machine learning as large as possible, and it is crucial for LOCKSS and digital preservation: why should just the current versions of articles be more worthy of preservation than any previous revision?

Suppose, for example, that an article was vandalized just before the dump was taken. If you limit yourself to the current versions only, you will analyze, use for machine learning, or preserve a version that has no value at all, while ignoring the earlier ones that were highly valuable.

To download a Wikipedia dump that includes the full revision history of all articles, just choose the files that include "pages-meta-history" in their name, listed under the heading All pages with complete edit history.
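The file selection described above can be sketched as a simple name filter. The function name is hypothetical, and the file names below are made-up examples patterned on a dump listing, not a real listing.

```python
def history_dump_files(file_names):
    """Keep only the dump files that carry the complete edit history."""
    return [name for name in file_names if "pages-meta-history" in name]


# Illustrative names only; a real listing is much longer.
listing = [
    "enwiki-20250101-pages-articles.xml.bz2",
    "enwiki-20250101-pages-meta-current.xml.bz2",
    "enwiki-20250101-pages-meta-history1.xml-p1p857.7z",
]

selected = history_dump_files(listing)
```

Only the third name is kept, because the other two contain current revisions only.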

Please note that such dumps are really big (tens of terabytes when uncompressed). Of course, the dumps containing only the latest versions of articles need to be used if resources don't allow otherwise. The recommendations here only apply when the full history is feasible, as should always be true when storing a copy of Wikipedia just for LOCKSS (the set of files containing a compressed copy of all English Wikipedia with full version history is only several hundred gigabytes in size, as of 2025).
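Because of that size, a history dump is better read as a stream than loaded into memory. The following is a minimal sketch using Python's standard library, run here on a tiny hand-written sample. The function names are hypothetical, and the namespace URI in the sample is an assumption about the export format (the code ignores it either way).

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(xml_stream):
    """Yield (page_title, revision_id, timestamp) one revision at a time."""
    title = None
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            rev_id = timestamp = None
            # Look at direct children only, so a contributor's id is not
            # mistaken for the revision id.
            for child in elem:
                child_name = local_name(child.tag)
                if child_name == "id":
                    rev_id = child.text
                elif child_name == "timestamp":
                    timestamp = child.text
            yield title, rev_id, timestamp
            elem.clear()  # drop the bulky revision text once reported
        elif name == "page":
            elem.clear()


# Hand-written sample, not an excerpt from a real dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">
  <page>
    <title>Example</title>
    <revision><id>1</id><timestamp>2009-01-01T00:00:00Z</timestamp><text>old</text></revision>
    <revision><id>2</id><timestamp>2010-01-01T00:00:00Z</timestamp><text>new</text></revision>
  </page>
</mediawiki>"""

revisions = list(iter_revisions(io.BytesIO(SAMPLE)))
```

For a real dump, the stream would come from a decompressor instead of an in-memory buffer; for a .bz2 file, for instance, the standard `bz2.open` function can supply one.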

Why do I need to browse an article's history to search for information?

While Wikipedia doesn't have the space limitations that paper encyclopedias have, there must be a limit to the number of existing articles, and to the size of those articles.

In many articles about big cities or countries, and about their demographics, their past populations at different dates can be easily found. That changes a lot when the article is about a small town or a village: when new census data became available, it's probable that a Wikipedia editor just replaced the old number with the new one, while updating the reference year. If, in addition to the total number of residents, you want to know details about the composition of that population, the matter worsens: only the latest information is likely to be available. There's just no space for more, or the article would grow too much.

There are many different cases that are similar to this: while there are articles about historic data of many things, it's just not possible to keep all the data for all the topics. When it's about prose descriptions and not numeric data, it's even worse: there is just no way at all to keep them. Things change over time, and the history sections, or articles about a topic's history, can only describe a tiny fraction of what the full Wikipedia article said about that topic at a past date.

The Sun Microsystems article describes in great detail what the company was like in 2010, when it was acquired by Oracle Corporation. We can see its situation in 2010 so easily only because it ceased to exist at that time. To read about Oracle's situation at the same date, with a similar level of detail, there is no other option than browsing through its article's history and finding a revision from around that date.

In other cases, an article (or part of it) was rewritten with better content. Yes, the old content was probably inferior to the newer one, but maybe some valuable detail ceased to be included in the article: now it's only in a past revision, which can still be read without a problem.

The solution is the same in all cases: just look at the article's version history, and some past revision will include the data you were looking for.
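Finding the revision that was current on a given date can be sketched as a small search. This example assumes that a list of revision timestamps and IDs has already been obtained and sorted from oldest to newest; the function name and the revision IDs are hypothetical.

```python
from bisect import bisect_right


def latest_revision_at_or_before(revisions, target):
    """Return the newest (timestamp, revision_id) pair not later than target.

    `revisions` must be sorted by timestamp, oldest first. Timestamps are
    ISO 8601 UTC strings in one fixed format such as '2010-01-01T00:00:00Z',
    which sort chronologically when compared as plain strings.
    """
    timestamps = [timestamp for timestamp, _ in revisions]
    index = bisect_right(timestamps, target)
    if index == 0:
        return None  # the article did not exist yet at that date
    return revisions[index - 1]


# Made-up history; the IDs are placeholders.
history = [
    ("2009-06-01T12:00:00Z", 101),
    ("2010-01-15T08:30:00Z", 102),
    ("2011-03-02T10:00:00Z", 103),
]

found = latest_revision_at_or_before(history, "2010-12-31T23:59:59Z")
```

Here `found` is the revision saved in January 2010, since that was the text readers saw at the end of 2010. It is still worth checking that the chosen revision is not a vandalized one.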

What about articles that no longer exist?

There are several reasons an article may cease to exist. When an article is completely deleted from Wikipedia, there should be a good reason for it: for example, the article was created for self-promotion or for defamation, or it was about a topic that was not notable enough to merit an encyclopedia article. A deleted article (and therefore its past revisions) cannot be read by the public, but only by Wikipedia administrators. Since we are talking about the encyclopedic usefulness of past revisions, there is no point in considering articles that were deleted for being unencyclopedic, so we can just set them aside.

Other articles are not deleted, but merged into other articles. Unlike deleted articles, for merged articles, the full history is still visible. Merged articles must never be deleted, so their contents are not lost, but it's a bit difficult to reach them, when compared to ordinary articles.

For example, the Agnes Skinner article was merged into the List of recurring The Simpsons characters article, so the link redirects to it. But, if you click that link and go to the top of the page, you will see a text that reads (Redirected from Agnes Skinner). If you click on the link that is included in that text, you will go to a different page. That page is the article that was merged: if you click on View history, you will see all the revisions of an article that no longer exists. It was a good, encyclopedic article, but someone considered it better to include it as part of a bigger one instead of keeping it as an independent article, and that was probably the right decision. There was nothing wrong with the existence of the article, so it was merged and not deleted. This way, its valuable past revisions remain publicly available, even if the article no longer exists as such. Only for articles that have no place in Wikipedia should past revisions be removed from public view, as is the case when articles are truly deleted.
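The clicks described above can be skipped by building the URLs directly. This is a minimal sketch with hypothetical helper names, assuming the English Wikipedia endpoint and the standard MediaWiki `redirect=no` and `action=history` URL parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint for the English Wikipedia.
INDEX_ENDPOINT = "https://en.wikipedia.org/w/index.php"


def redirect_page_url(title):
    """URL of the redirect page itself, without following the redirect."""
    params = {"title": title.replace(" ", "_"), "redirect": "no"}
    return INDEX_ENDPOINT + "?" + urlencode(params)


def redirect_history_url(title):
    """URL of the revision history kept under the merged article's title."""
    params = {"title": title.replace(" ", "_"), "action": "history"}
    return INDEX_ENDPOINT + "?" + urlencode(params)


page_url = redirect_page_url("Agnes Skinner")
history_url = redirect_history_url("Agnes Skinner")
```

The second URL leads straight to the list of revisions of the merged article.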

Other ways of navigating (parts of) an article's history

The current MediaWiki interface for article history is not easy to navigate, especially for articles with a really high number of revisions. If you need a quick way to get a version from a certain date, using Internet Archive's Wayback Machine will be much easier. Not all versions of all articles are there, but it should be enough to get some version for any article in any year. Other archive sites, such as archive.today, also have some versions of many articles, from different years.
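A lookup by date in the Wayback Machine can be sketched as follows. The helper names are hypothetical and the chosen date is only an example; the code builds the URLs without contacting the Internet Archive, and it assumes the publicly documented availability endpoint and the usual `web.archive.org/web/<timestamp>/<url>` address form.

```python
from urllib.parse import urlencode


def wayback_availability_url(page_url, timestamp):
    """Build a query asking for the capture closest to a YYYYMMDD timestamp."""
    params = {"url": page_url, "timestamp": timestamp}
    return "https://archive.org/wayback/available?" + urlencode(params)


def wayback_snapshot_url(page_url, timestamp):
    """Build an address that the Wayback Machine resolves to a nearby capture."""
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


article = "https://en.wikipedia.org/wiki/Oracle_Corporation"
query_url = wayback_availability_url(article, "20100101")
snapshot_url = wayback_snapshot_url(article, "20100101")
```

The capture actually served may be from a different day than the one requested, so the date shown by the Wayback Machine should be checked before relying on the content.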

While such external tools may be easier and quicker to use, it's important to note that their existence in no way diminishes the importance of Wikipedia offering the full history of all articles, both through the View history link and through the All pages with complete edit history XML dumps. The drawbacks of such external tools are:

  • archive.today and other similar archive sites provide no information about their owners, so there is absolutely no guarantee of their continuity.
  • As of 2025, Internet Archive has been through a series of lawsuits that have cost it unknown amounts of money. Its budget was already quite low for storing as much as 175 petabytes of data (as of October 2025, according to its website). As of 2016, it was able to store only 2 copies of its data, both of them in the San Francisco Bay Area, where the risk of earthquakes is high. While there is no updated information on this, it seems unlikely that it can host many copies of its data, given the exponential growth of the total size of that data over recent years, combined with the lawsuits and associated financial problems it was experiencing at the same time.
  • Some of the content offered by Internet Archive in the Wayback Machine comes from Common Crawl. Common Crawl includes content from many Wikipedia articles, although not necessarily all. While Common Crawl data is kept independently of Internet Archive, its mission statement makes no explicit mention of the perpetual preservation of its datasets, while the Wikimedia Foundation's does: "The Foundation will make and keep useful information from its projects available on the internet free of charge, in perpetuity."

In summary, while external tools may be the best solution for reading past revisions of articles in many cases, it's very important not to neglect Wikipedia's own capacity to offer past versions of its articles to the public. It has far better resources (both financial and infrastructural) than any other organization currently doing so, it does so in a far more comprehensive way (all non-deleted versions of all articles), and it is committed to doing so in perpetuity.