Documenting the database that contains the text strings for the multinlingual hypertext (mlht) system as well as, on top of it, most other ffii-related data.
News & Chronology
- 2004-07-17 this page established
The MLHT database is the set of relational database tables used for describing multilingual documents. It is applied by the MLHT system but is designed to be used by others as well.
Examples of MLHT database tables are
- dokprop (properties relevant to each document)
traprop (properties relevant to each translation variant of a document)
- mlhtdok (small primary database in which document ids are defined)
- langtxts (very large, contains all text strings in mlht documents)
Some tables contain database metadata, i.e. they define how the actually used tables behave. Such tables include
- klesi : classes of tables
- deftab : tables (feklee: "See my comments concerning the table deffld, below.")
- deffld : fields (each field name is unique across tables: same names mean that 'natural join' can be used) (feklee: "IMO, this table should be removed or converted into a view. 1. It doesn't seem contain any information that's not already in !PostgreSQL's native meta data. 2. It's hard to keep the information in this table up to date. 3. The table doesn't seem to be documented.")
- tabflds: positions of fields in tables
- flddes : multilingual descriptions of fields, tables and many other kinds of entities, used for documentation
Metadata in PostgreSQL System Catalogs
feklee: !PostgreSQL already maintains meta data in the form of System Catalogs. These catalogs could be used to replace some, if not all, of the above meta data tables. The advantage when it comes e.g. to documentation is that tools such as psql or phpPgAdmin offer facilities to easily view and modify comments of tables, fields, etc..
phm: We need a script that synchronises metadata between the !PostgreSQL System Catalogs and the flddes tables. It seems to me that there are certain things that each of these does that the other can't replace. Since the !PostgreSQL idocs don't provide examples that allow me to immediately and use the !PostgreSQL metadata, I may be underestimating their power. It would be nice to have some examples in the section below.
feklee: Using a script for synchronizing metadata is not a good solution IMO. Some reasons:
- Data likely gets out of sync in between runs of the script.
- It probably is hard to get up running reliably and it is probably hard to maintain.
Also, having two sources of meta data is problematic since it introduces redundant data. Thus I propose that the use of views generating MLHT metadata from !PostgreSQL metadata should be investigated.
Example of meta documentation using FLDDES and PostgreSQL System Catalogs
Finding out what the table 'ven' and the field 'ven' mean using MLHT's meta data:
- psql ffii
ffii> select * from flddes where fld = 'tab' and fldkey = 'ven' ; ffii> select * from flddes where fld = 'fld' and fldkey = 'ven' ;
Setting and retrieving documentation from system catalogs is shown in the example below. More information on comments can be found in the chapter Database Management in Practical !PostgreSQL and in the section on COMMENT in the !PostgreSQL documentation.
- psql ffii
ffii> COMMENT ON TABLE ven IS 'My Participation in Events'; ffii> COMMENT ON COLUMN ven.ven IS 'Event ID'; ffii> \dd ven ffii> \d+ ven ffii> SELECT description, relname FROM pg_description, pg_class WHERE pg_description.objoid = pg_class.oid AND relname = 'ven';
Opinions by feklee:
- A disadvantage of system catalogs is that they only allow documentation in one language. But
00 the database comments are not browsed by end users but by system administrators/programmers/... that are used to having code/data documented in english. 00 If multilingual comments are desired, then we could make up a system that encodes comments in multiple languages in a single string. Such a system could also be used when documenting code e.g. in Perl.
A disadvantage of the MLHT system is that column names must be unique in the whole database. While some may like such a convention, I think it's bad: One has to embed the table name or a (mostly unreadable) shortcut in the field name which makes code harder to read and field names less "self-documenting", IMHO. One may of course argue that this reduces the amount of documentation, but who sais that every field name has to be documented? Often it's for example sufficient to document just primary keys and leave away documentation for foreign keys.
An important advantage of storing documentation in system catalogs is that user interfaces such as [=psql] and php!PgAdmin allow easy access to system catalog documentation. People are already using these facilities and some probably don't update [=flddes] since that's non-obvious or since that's duplicate work.