
GFDL text corpus

From the Quicksilver Metaweb.


The GFDL text corpus is the body of all text licensed under the GNU Free Documentation License (GFDL). It includes all text on all Wikimedia projects, including Wikipedia in every language; the Disinfopedia text (though this is hardly neutral); the Internet Encyclopedia (a much more credible encyclopedia structure, but so far only in English); and the Consumerium and Metaweb texts. Because the GFDL has no parameters and no options, text from any of these projects can be copied to any other without restriction. Contrary to popular belief, one need not attribute the text to its specific origin; one need only note that it is GFDL-licensed and indicate where the source text can be retrieved. That location need not be the place where the text originated.

Please note that this corpus is not the same as the body of text managed with MediaWiki or by Wikimedia. It does not include projects licensed under Creative Commons BY-SA, such as Wikitravel, which use MediaWiki but not the GFDL.
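The membership rule described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hand-written table of the projects named in this article; the names `PROJECT_LICENSE`, `in_gfdl_corpus`, and `can_copy` are hypothetical and do not belong to MediaWiki or any other software.

```python
# Illustrative sketch: the table lists only projects named in this article.
# All identifiers here are hypothetical, not part of any real tool.
GFDL = "GFDL"
CC_BY_SA = "CC BY-SA"

PROJECT_LICENSE = {
    "Wikipedia": GFDL,
    "Disinfopedia": GFDL,
    "Internet Encyclopedia": GFDL,
    "Consumerium": GFDL,
    "Metaweb": GFDL,
    "Wikitravel": CC_BY_SA,  # uses MediaWiki, but is not GFDL-licensed
}


def in_gfdl_corpus(project: str) -> bool:
    """Return True if the project's text is licensed under the GFDL."""
    return PROJECT_LICENSE.get(project) == GFDL


def can_copy(source: str, target: str) -> bool:
    """Text moves freely only when both projects are in the GFDL corpus."""
    return in_gfdl_corpus(source) and in_gfdl_corpus(target)
```

Under these assumptions, `can_copy("Wikipedia", "Metaweb")` is true, while `can_copy("Wikitravel", "Wikipedia")` is false. The check depends on the license alone, not on which wiki software a project runs.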

Note also that MediaWiki itself imposes a de facto wikitext standard on this corpus, but there is no common intermediate page format, and thus no obvious translation to a semantic web.

See Metaweb:Projects for some suggestions as to how to better manage the GFDL text corpus and remove some of the idiosyncratic wiki management problems that arise due to biases of Wikipedia and Disinfopedia.

More on these issues at Consumerium: GFDL text corpus.

Wikinfo: GNU Free Documentation License Text Corpus

The GFDL text corpus is the body of all texts licensed under the GNU Free Documentation License. At present it consists of at least hundreds of thousands of general-interest articles in dozens of languages. This is far more than the text under all other free documentation licenses combined, which makes the corpus a theoretical target for any wikitext standard or markup management based on MediaWiki or GetWiki.

The corpus is not coterminous with the texts and content presented by Wikimedia. Some web services use other software, such as GetWiki, while others plan to produce entirely different packages. There is disagreement about how to manage the corpus, especially given that different point-of-view rules are evolving. Text corpus management, web services management, nonprofit governance, wiki management, and wikitext standard issues are separate but, because of technological dependencies, very often intertwined.

Legal issues with the GFDL, for example, are poorly defined: some consider its use nearly equivalent to placing work in the public domain. Text in the corpus cannot easily be mixed with text from any other corpus, such as text under Creative Commons licenses. Some, often those who believe there must be a central organization with legal competence, think Creative Commons should become the overall corpus. Wiki sites such as Wikipedia, Disinfopedia, Metaweb and Wikinfo have established standards for attribution, reference checking, and neutrality, all key components of managing the corpus on the internet.