FFII Document Metadata Structure
-> [ KWiki | RSS | Portal | Aktiv | Knecht | MLHT | DB | BibTeX | Project News ]
By "document metadata" we refer to data such as document ID, title, description, keywords, author, date, language etc which are specific to one document and which need to be made available to internal databases as well as to bibliography generators, search indexes and other programs, so as to make the vast amount of information that we collect more easily accessible. Document metadata also provide a layer of abstraction that allows us to weave different content managment systems together. Below we try to keep track of such efforts, as made at FFII.
News & Chronology
2004-12-12 parskwikipag script generates html metadata for kwiki
2004-10-29 RSS newsfeeds generated from Kwiki pages, document metadata markup and parsing becomes a high priority issue
2004-08-17 database tables dokprop and traprop become primary sources for document metadata, including URL/filename generation rules
2004-07-25 this page started in view of portal project IRC meeting at 22.00
2004-07-20 some specifications of document metadata laid down on KWiki page
2003-06 plans for new FFII Portal: ideas for interoperation with other ffii sites include metadata
- 2003 mlht metadata stored in ffii database
2001 mlht metadata stored in text files such as http://swpat.ffii.org/nodes.txt
Outline
Soon the top section of each Kwiki page should no longer be edited by !KWiki but generated from the database and editable via all kinds of frontends, including knecht, aktiv, mlht and a kwiki metadata editing form (possibly based on aktiv).
Items included in Metadata
- title, description and other stuff that is stored as metadata in html header (and thereby improves google ranking)
- position of the static page in the directory hierarchy (from which the URL can be calculated)
- version
- document class, e.g. newsitem, project, event-related project, information collection project, action project etc
- languages in which the document is available
ffii user id of authors, translators, maintainers, watchers and other persons somehow in charge of or related to the page and dates related thereto (e.g. day of translation)
- documents on which this document belongs into the top links section
- in case of news items
- information from which RSS Newsfeed can be generated
documents on which this item belongs into the "news & chronology" section
Metadata Syntax in FFII Database
- tables
- langtxts: translatable strings in (multilinugal) documents
- dok: document
- lang: language
- mlid: identifying tag, 'title' and 'descr' corresponding to title and description of HTML header
- mltxt: text string
- dokprop: document properties that allow derivation of URL, file location etc
- dok: document
- doktmp: time-stamp
- dokvers: version, e.g. 3001017 for 3.1.17
- dokdirp: whether the document is a folder/directory
- doknom: default document stem (language-specific stems specified in table doknoms)
- supdok: upper level document (redundant, to be phased out, only pretyp+predok needed)
- pretyp: type of precedence: 0: none, 1: seq: prior in sequence, 2: grp: upper in grouping, 3: sub: upper in directory structure
- predok: preceding document (note: pretyp+predok are intended to replace supdok, not used in mlht yet)
- doknoms: language-specific document names
- dok: document
- lang: language
- doknom: language-specific document name (only for directories, realisable as a link to default name directory)
- langtxts: translatable strings in (multilinugal) documents
Note that the database itself is self-explanatory. Explanations of tables and fields can be obtained as follows
ffii> select fldval from flddes where fld = 'fld' and fldkey = 'dok' and lang = 'en' ; document ffii> select fldval from flddes where fld = 'tab' and fldkey = 'doknoms' and lang = 'en' ; language specific document names
Metadata Syntax in nodes.txt files
Example http://swpat.ffii.org/nodes.txt
- In addition to the database information above, these contain extra information, some of which is specific to mlht and not to be considered basic document metadata that needs to be shared between web applications
- documentation to be cont'd