Internet Memory Foundation

Search help

The UK Government Web Archive Search Facility is a beta release, and forms part of ongoing work to promote the contents of the web archive and make the collection more accessible to a wide range of users.

Quick search

  • Search results contain at least one occurrence of all the search words entered. The search terms can occur in any order in the page, and any number of times.
  • Search results will not return parts of search terms, e.g. a search for "expenditures" will not return results that contain only "expend".
  • Search allows the use of ? as a single character wildcard, e.g. m?seum will return pages containing museum.
  • Search does not support AND, OR, NOT operators.
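The quick search rules above can be illustrated with a short Python sketch. It is not the archive's actual implementation: the function names are hypothetical, and case-insensitive matching and treating ? as "any one word character" are assumptions made for the example.

```python
import re


def term_to_regex(term: str) -> re.Pattern:
    """Build a whole-word pattern for one search term (hypothetical helper)."""
    # "?" stands for exactly one character; everything else is taken literally.
    body = "".join(r"\w" if ch == "?" else re.escape(ch) for ch in term)
    # \b on both sides means a term never matches part of a longer word.
    # Case-insensitivity is an assumption, not stated in the help text.
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def quick_match(query: str, page_text: str) -> bool:
    """True if every term occurs at least once, in any order."""
    terms = query.split()
    return bool(terms) and all(term_to_regex(t).search(page_text) for t in terms)


print(quick_match("m?seum", "Visit the museum today"))   # wildcard match
print(quick_match("expenditures", "We expend effort"))   # no partial match
print(quick_match("archive web", "The web archive"))     # order is irrelevant
```

Because there are no AND, OR or NOT operators, the sketch treats every whitespace-separated token as a required term.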

Advanced search

The advanced search page provides greater control over the results returned.

  • All of these words. Using this filter without any of the other advanced search options is equivalent to using the "Quick Search" page.
  • None of these words. Returns results that exclude the search terms entered.
  • This exact word or phrase. Returns results containing at least one match on the exact word or phrase.
  • Categories. All Government websites in the collection are organised into one or more categories. Select one or more categories (up to a maximum of four) in order to narrow the scope of your search. If you do not select any categories, the search takes place across the entire collection.
  • File Types. Search can be limited to include only HTML files (ending with either .html or .htm) or Portable Document Format files (ending with .pdf).
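As a rough illustration of how the advanced filters combine, the following Python sketch applies them to a single page. The class and function names are hypothetical, and the sketch assumes that a page qualifies if it belongs to any one of the selected categories; the help text does not say how multiple categories are combined.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AdvancedQuery:
    """Hypothetical container for the advanced search options."""
    all_words: list[str] = field(default_factory=list)
    none_words: list[str] = field(default_factory=list)
    exact_phrase: str = ""
    categories: set[str] = field(default_factory=set)   # empty = whole collection
    file_types: set[str] = field(default_factory=set)   # e.g. {".html", ".htm"} or {".pdf"}

    def __post_init__(self) -> None:
        # The help text allows at most four categories per search.
        if len(self.categories) > 4:
            raise ValueError("select at most four categories")


def has_term(term: str, text: str) -> bool:
    """Whole-word (or whole-phrase) match; case-insensitivity is an assumption."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def matches(q: AdvancedQuery, url: str, text: str, page_categories: set[str]) -> bool:
    if q.file_types and not url.lower().endswith(tuple(q.file_types)):
        return False
    if q.categories and not (q.categories & page_categories):
        return False
    if any(has_term(w, text) for w in q.none_words):
        return False
    if q.exact_phrase and not has_term(q.exact_phrase, text):
        return False
    # With no other option set, this is equivalent to the quick search rule.
    return all(has_term(w, text) for w in q.all_words)
```

For example, a query with all_words=["web", "archive"], exact_phrase="annual report" and file_types={".pdf"} would accept a PDF whose text contains "The web archive annual report" and reject the same text served from an .html address.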

Search results - general

For most searches, the results will be drawn from several domains or websites. The initial result set returned from a query is, however, limited to two results per domain or website (except when using categories with "Advanced Search"). Figure 1 shows an example extract from the initial results page following a search, from either the "Quick Search" or "Advanced search" page.

Figure 1: Extract from general results page

Each search result contains the following:

  • Page title – Taken from the title of the page at the time it was captured, i.e. the HTML <title> element. Forms a hyperlink to the archived page.
  • Summary snippet – A summary drawn from the contents of the page. The user's search term(s) will not always appear in the summary but, where they do, they are marked with background highlighting.
  • Organisation information – the name of the department or other government body from which the page originated. (This may sometimes be missing if, for example, the page comes from a site not in the main collection but crawled via a link from another site.)
  • More results from this domain – this link (which appears only where additional results for a domain have been retrieved but not displayed) opens a further set of search results limited to the domain from which the current result originates. See Figure 2 and the explanation below for more information.
  • Original URL – the original location of the archived page.
  • Archived at – the date the page was archived. This may be some time after the page was originally placed on the web site.
  • All versions of this page by date – Links to an index page listing all versions of the page held in the web archive. The other versions will not necessarily contain the search term(s).

If the result set was generated from an Advanced Search, any filters applied (i.e. any search option other than "All of these words") are listed at the top of each page of the results. A "Remove" link is available next to each filter; clicking it re-runs the search without that filter.

Results pages may also contain a "Show hidden results" link. This takes the user to an expanded result set without the two results per domain limit, and this can potentially contain a very large number of hits. Each page in this result set shows an estimated result count and the current position within the results, e.g. Results 1 – 10 of 4500 results.
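The two-results-per-domain limit and the position label can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, results are grouped by the host name of the original URL, and the page size of 10 is inferred from the example label above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def collapse_by_domain(urls, per_domain=2):
    """Split ranked result URLs into shown results and hidden ones per domain."""
    shown = []
    hidden = defaultdict(list)
    counts = defaultdict(int)
    for url in urls:
        domain = urlsplit(url).hostname or ""
        if counts[domain] < per_domain:
            counts[domain] += 1
            shown.append(url)
        else:
            # Domains with hidden results are the ones that would get a
            # "More results from this domain" link.
            hidden[domain].append(url)
    return shown, dict(hidden)


def position_label(page, page_size, estimated_total):
    """Format the 'current position' text for an expanded result set."""
    first = (page - 1) * page_size + 1
    last = min(page * page_size, estimated_total)
    return f"Results {first} – {last} of {estimated_total} results"


urls = [
    "http://a.example.gov.uk/1",
    "http://a.example.gov.uk/2",
    "http://a.example.gov.uk/3",
    "http://b.example.gov.uk/1",
]
shown, hidden = collapse_by_domain(urls)
print(shown)
print(hidden)
print(position_label(1, 10, 4500))
```

In this sketch, following "Show hidden results" corresponds to listing every URL without the per-domain cap.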

Figure 2: Extract from domain specific results page

Search - questions and answers

Can I search for a specific URL?

Yes. If you know the full URL, enter it in the Quick Search box and select the "URLs" option in the drop-down menu. If you do not know the full URL, type in as many characters as you know and select "the collection" in the drop-down menu. Where the domain consists of two or more words joined together, you will normally obtain better results by entering the joined term, e.g. "britishmuseum" rather than "British Museum".

Does the search include hidden page content?

Yes, the search covers all page content that can be read and indexed by the search engine. This includes content in HTML <meta> tags, but not image content (unless it is duplicated in readable text, e.g. via an alt attribute).

Can I search for results within only one domain or department?

The simplest way to do this is to start with a general search using one or more search terms likely to bring back results for that domain. You can then locate a result from the target domain in the result set, and click the "More results from this domain" link.

Why is the same page listed more than once in the results?

This can occur because the archive often holds multiple versions of a web page, harvested on different dates. In the initial summary results, only two hits from each domain are presented. Normally, these will be two different pages but in some cases they can be two dated instances of the same page. If you click the "Show hidden results" link, you may see more instances of the same page listed.

You can use the "All versions of this page by date" link next to each result to view a listing of all versions of that page captured in the web archive. You can then navigate from there to a specific version.

I can't see an estimate of the total number of results from my search. How many results are there?

Due to the size of the web archive, it is not always technically possible to provide a reliable estimate of the number of results in the initial result set. If you click the "Show hidden results" or "More results from this domain" link, the result set that opens includes an estimate of the total number of hits.

How comprehensive are these results? I've come across a page that matches the search term but it's not returned in the results.

The initial search results from "Quick Search" (and "Advanced Search" if no category restriction is applied) include a maximum of two hits per domain, and a limited set of results overall. If the page is not listed in the initial result set, it may be returned by following either the "Show hidden results" or the "More results from this domain" link. Furthermore, the current search index does not include pages later than January 2009.

Why isn’t the site I'm looking for higher up the search results, given the importance of the search term to that site?

The search takes in over 1,000 different domains with varying numbers of pages, and the search term(s) may occur in many of these. The search has no means of ranking sites according to the relative importance of search terms on a site-specific basis.

Why do the summary snippet and/or result contain strange characters?

This can arise occasionally due to the character formats and encoding used on the original web pages at the time of capture.
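One common way such characters appear is when text saved in one encoding is later read using another. The Python snippet below illustrates this general effect; it assumes a page written in UTF-8 being decoded as Latin-1, which may or may not be what has happened for any particular archived page.

```python
original = "café £5"

# The page's bytes as they might have been captured, encoded as UTF-8.
raw_bytes = original.encode("utf-8")

# Reading those bytes with the wrong encoding turns each accented or
# special character into two unrelated characters.
garbled = raw_bytes.decode("latin-1")
print(garbled)    # cafÃ© Â£5

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)   # café £5
```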
