lang plugin by Matthias Watermann
This plugin allows for adding markup to indicate other languages.
Last updated on 2007-08-15. Provides Syntax.
Compatible with DokuWiki 2005-07-13+.
Sometimes there arises the need to use words, phrases or even whole sentences or paragraphs in a language different from the document's main language1). To support the readers2) of such a document using several languages, it is advisable to explicitly mark up all language changes.
This plugin allows for adding markup to indicate such language changes.
It is implemented – technically speaking – by adding appropriate span tags around the text in question.
To actually make use of this plugin, enclose the text written in a language other than that of the rest of the document in lang tags:
<lang code> ... </lang>
The code part is usually the two-letter language code as defined by ISO standard 639, "Code for the representation of names of languages"; the details of its use are explained in RFC 3066, "Tags for the Identification of Languages".
See the HTML specs as well for further details.
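As a rough illustration of the tag syntax: section 2 of RFC 3066 defines Language-Tag = Primary-subtag *( "-" Subtag ), where the primary subtag is 1 to 8 letters and every further subtag is 1 to 8 letters or digits. The following PHP sketch checks a code against that grammar only. It does not check whether a code is actually registered, and the function name is hypothetical, not part of the plugin.

```php
<?php
// Hypothetical helper (not part of the lang plugin): checks only the
// RFC 3066 syntax  Language-Tag = Primary-subtag *( "-" Subtag ),
// with Primary-subtag = 1*8ALPHA and Subtag = 1*8(ALPHA / DIGIT).
// Whether the code is registered in ISO 639 or with IANA is NOT checked.
function is_rfc3066_syntax($tag)
{
    // Case-insensitive ("de-DE" equals "de-de"); \z rejects a trailing
    // newline, which a plain $ would let through.
    return 1 === preg_match('/^[a-z]{1,8}(?:-[a-z0-9]{1,8})*\z/i', $tag);
}

foreach (array('de', 'de-DE', 'x-klingon', 'de-', 'toolongprimary') as $tag) {
    printf("%-15s %s\n", $tag, is_rfc3066_syntax($tag) ? 'ok' : 'invalid');
}
```

Note that `de-` fails this strict check, whereas the plugin itself deliberately accepts it as a convenience and renders it as `de` (see example 4 below).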
Please note that this is so-called inline markup, meaning it is to be used inside block elements3).
The lang tag (as well as its HTML equivalent span) does not constitute a text block but is part of one.
Consequently, if you want to mark up a whole paragraph, you have to open a new block (by inserting an empty line), as can be seen in the following examples.
Suppose a document is written in plain English, but some sentences are to be given in another language. Those “foreign” parts are then marked up as in the following example:
**1**

This is an __English__ sentence. <lang de>Dies ist ein //deutscher// Satz.</lang> This is a second __English__ sentence.

**2**

This is an __English__ sentence. <lang de-DE>Dies ist ein //deutscher// Satz.</lang> This is a second __English__ sentence.

**3**

This is an __English__ sentence. <lang de> Dies ist ein //deutscher// Satz. </lang> This is a second __English__ sentence.

**4**

This is an __English__ paragraph.

<lang de-> Dies ist ein //deutscher// Absatz. </lang>

This is a second __English__ paragraph.

**5**

This is an __English__ paragraph.

<lang x-klingon>Well, I, er ... dunno how to, hmmm ... write klingon.</lang>

This is a second __English__ paragraph.
As can be seen, the formatting4) follows the usual rules for inline markup.
In sections one to three, the text portion in a different language5) is just part of a paragraph (here: a sentence between other sentences).
In sections four and five, however, there are empty lines before and after the lang markup, which turns that part into a paragraph of its own between the other paragraphs.
The resulting HTML, btw, looks as follows:
<p><strong>1</strong></p>
<p>This is an <u>English</u> sentence. <span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Satz.</span> This is a second <u>English</u> sentence.</p>
<p><strong>2</strong></p>
<p>This is an <u>English</u> sentence. <span lang="de-DE" xml:lang="de-DE">Dies ist ein <em>deutscher</em> Satz.</span> This is a second <u>English</u> sentence.</p>
<p><strong>3</strong></p>
<p>This is an <u>English</u> sentence. <span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Satz. </span> This is a second <u>English</u> sentence.</p>
<p><strong>4</strong></p>
<p>This is an <u>English</u> paragraph.</p>
<p><span lang="de" xml:lang="de">Dies ist ein <em>deutscher</em> Absatz. </span></p>
<p>This is a second <u>English</u> paragraph.</p>
<p><strong>5</strong></p>
<p>This is an <u>English</u> paragraph.</p>
<p><span lang="x-klingon" xml:lang="x-klingon">Well, I, er ... dunno how to, hmmm ... write klingon.</span></p>
<p>This is a second <u>English</u> paragraph.</p>
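The transformation itself is small: for a valid code the plugin emits a span carrying both the HTML lang and the XHTML xml:lang attribute, and escapes the enclosed plain text. The following standalone PHP sketch mimics that output for a single piece of plain text. The function name is hypothetical; it uses htmlspecialchars() with ENT_NOQUOTES in place of the plugin's str_replace() call (both escape only &, < and >), and it does not handle nested DokuWiki formatting such as //...// or __...__.

```php
<?php
// Hypothetical stand-alone helper (not the plugin's API): wraps plain
// text in the span markup the lang plugin produces for a valid code.
function lang_span($code, $text)
{
    // The plugin only lets validated codes through; escaping the
    // attribute value here is just a safety net for this sketch.
    $attr = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');
    // Like the plugin, escape only &, < and > in the text content.
    $body = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');
    return '<span lang="' . $attr . '" xml:lang="' . $attr . '">'
        . $body . '</span>';
}

echo lang_span('de', 'Dies ist ein deutscher Satz.'), "\n";
echo lang_span('x-klingon', 'Q&A <maybe>'), "\n";
```

In the real plugin, the text between the tags is still parsed by DokuWiki (accepts() admits every mode except plugin_lang itself), which is why //deutscher// shows up as <em>deutscher</em> inside the span above.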
It's quite easy to integrate this plugin with your DokuWiki:
Unpack the plugin archive in your {dokuwiki}/lib/plugins directory (make sure included subdirectories are unpacked correctly); this will create the directory {dokuwiki}/lib/plugins/lang.
Make sure the web server owns the new files, e.g.:
chown apache:apache dokuwiki/lib/plugins/* -Rc
Alternatively, you can use the plugin manager to install or update this plugin.
Here comes the GPLed PHP source6) for those who'd like to scan it before actually installing it:
<?php
if (! class_exists('syntax_plugin_lang')) {
  if (! defined('DOKU_PLUGIN')) {
    if (! defined('DOKU_INC')) {
      define('DOKU_INC', realpath(dirname(__FILE__) . '/../../') . '/');
    } // if
    define('DOKU_PLUGIN', DOKU_INC . 'lib/plugins/');
  } // if
  // include parent class
  require_once(DOKU_PLUGIN . 'syntax.php');

/**
 * <tt>syntax_plugin_lang.php </tt>- A PHP4 class that implements
 * a <tt>DokuWiki</tt> plugin to specify an area using a different
 * language than the remaining document.
 *
 * <p>
 * Markup a section of text to be using a different language,
 * <tt>lang 2-letter-lang-code</tt>
 * </p><pre>
 *  Copyright (C) 2005, 2007 DFG/M.Watermann, D-10247 Berlin, FRG
 *      All rights reserved
 *    EMail : <support@mwat.de>
 * </pre>
 * <div class="disclaimer">
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either
 * <a href="http://www.gnu.org/licenses/gpl.html">version 3</a> of the
 * License, or (at your option) any later version.<br>
 * This software is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 * </div>
 * @author <a href="mailto:support@mwat.de">Matthias Watermann</a>
 * @version <tt>$Id: syntax_plugin_lang.php,v 1.4 2007/08/15 12:36:19 matthias Exp $</tt>
 * @since created 1-Sep-2005
 */
class syntax_plugin_lang extends DokuWiki_Syntax_Plugin {

  /**
   * @publicsection
   */
  //@{

  /**
   * Tell the parser whether the plugin accepts syntax mode
   * <tt>$aMode</tt> within its own markup.
   *
   * @param $aMode String The requested syntaxmode.
   * @return Boolean <tt>TRUE</tt> unless <tt>$aMode</tt> is
   * <tt>plugin_lang</tt> (which would result in a
   * <tt>FALSE</tt> method result).
   * @public
   * @see getAllowedTypes()
   * @static
   */
  function accepts($aMode) {
    return ('plugin_lang' != $aMode);
  } // accepts()

  /**
   * Connect lookup pattern to lexer.
   *
   * @param $aMode String The desired rendermode.
   * @public
   * @see render()
   */
  function connectTo($aMode) {
    // See http://www.w3.org/TR/html401/struct/dirlang.html#h-8.1.1;
    // better (specialized) REs are used in 'handle()' method.
    $this->Lexer->addEntryPattern(
      '\x3Clang(?:\s+[a-z\-A-Z0-9]{2,})?\s*\x3E\s*(?=(?s).*?\x3C\x2Flang\x3E)',
      $aMode, 'plugin_lang');
  } // connectTo()

  /**
   * Get an associative array with plugin info.
   *
   * <p>
   * The returned array holds the following fields:
   * <dl>
   * <dt>author</dt><dd>Author of the plugin</dd>
   * <dt>email</dt><dd>Email address to contact the author</dd>
   * <dt>date</dt><dd>Last modified date of the plugin in
   * <tt>YYYY-MM-DD</tt> format</dd>
   * <dt>name</dt><dd>Name of the plugin</dd>
   * <dt>desc</dt><dd>Short description of the plugin (Text only)</dd>
   * <dt>url</dt><dd>Website with more information on the plugin
   * (eg. syntax description)</dd>
   * </dl>
   * @return Array Information about this plugin class.
   * @public
   * @static
   */
  function getInfo() {
    return array(
      'author' => 'Matthias Watermann',
      'email' => 'support@mwat.de',
      'date' => '2007-08-15',
      'name' => 'LANGuage Syntax Plugin',
      'desc' => 'Markup a text area using another language',
      'url' => 'http://www.dokuwiki.org/plugin:lang');
  } // getInfo()

  /**
   * Where to sort in?
   *
   * @return Integer <tt>498</tt> (doesn't really matter).
   * @public
   * @static
   */
  function getSort() {
    return 498;
  } // getSort()

  /**
   * Get the type of syntax this plugin defines.
   *
   * @return String <tt>'formatting'</tt>.
   * @public
   * @static
   */
  function getType() {
    return 'formatting';
  } // getType()

  /**
   * Handler to prepare matched data for the rendering process.
   *
   * <p>
   * The <tt>$aState</tt> parameter gives the type of pattern
   * which triggered the call to this method:
   * </p>
   * <dl>
   * <dt>DOKU_LEXER_ENTER</dt>
   * <dd>a pattern set by <tt>addEntryPattern()</tt></dd>
   * <dt>DOKU_LEXER_MATCHED</dt>
   * <dd>a pattern set by <tt>addPattern()</tt></dd>
   * <dt>DOKU_LEXER_EXIT</dt>
   * <dd> a pattern set by <tt>addExitPattern()</tt></dd>
   * <dt>DOKU_LEXER_SPECIAL</dt>
   * <dd>a pattern set by <tt>addSpecialPattern()</tt></dd>
   * <dt>DOKU_LEXER_UNMATCHED</dt>
   * <dd>ordinary text encountered within the plugin's syntax mode
   * which doesn't match any pattern.</dd>
   * </dl>
   * @param $aMatch String The text matched by the patterns.
   * @param $aState Integer The lexer state for the match.
   * @param $aPos Integer The character position of the matched text.
   * @param $aHandler Object Reference to the Doku_Handler object.
   * @return Array Index <tt>[0]</tt> holds the current
   * <tt>$aState</tt>, index <tt>[1]</tt> the match prepared for
   * the <tt>render()</tt> method.
   * @public
   * @see render()
   * @static
   */
  function handle($aMatch, $aState, $aPos, &$aHandler) {
    if (DOKU_LEXER_ENTER == $aState) {
      $hits = array();
      // RFC 3066, "2. The Language tag", p. 2f.
      // Language-Tag = Primary-subtag *( "-" Subtag )
      if (preg_match('|\s+([a-z]{2,3})\s*>|i', $aMatch, $hits)) {
        // primary _only_ (most likely to be used)
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([a-z]{2,3}\-[a-z0-9]{2,})\s*>|i', $aMatch, $hits)) {
        // primary _and_ subtag
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([ix]\-[a-z0-9]{2,})\s*>|i', $aMatch, $hits)) {
        // 1-letter primary _and_ subtag
        return array($aState, $hits[1]);
      } // if
      if (preg_match('|\s+([a-z]{2,3})\-.*\s*>|i', $aMatch, $hits)) {
        // convenience: accept primary with empty subtag
        return array($aState, $hits[1]);
      } // if
      // invalid language specification
      return array($aState, FALSE);
    } // if
    return array($aState, $aMatch);
  } // handle()

  /**
   * Add exit pattern to lexer.
   *
   * @public
   */
  function postConnect() {
    $this->Lexer->addExitPattern('\x3C\x2Flang\x3E', 'plugin_lang');
  } // postConnect()

  /**
   * Handle the actual output creation.
   *
   * <p>
   * The method checks for the given <tt>$aFormat</tt> and returns
   * <tt>FALSE</tt> when a format isn't supported. <tt>$aRenderer</tt>
   * contains a reference to the renderer object which is currently
   * handling the rendering. The contents of <tt>$aData</tt> is the
   * return value of the <tt>handle()</tt> method.
   * </p>
   * @param $aFormat String The output format to generate.
   * @param $aRenderer Object A reference to the renderer object.
   * @param $aData Array The data created by the <tt>handle()</tt>
   * method.
   * @return Boolean <tt>TRUE</tt> if rendered successfully, or
   * <tt>FALSE</tt> otherwise.
   * @public
   * @see handle()
   *
   */
  function render($aFormat, &$aRenderer, &$aData) {
    if ('xhtml' != $aFormat) {
      return FALSE;
    } // if
    static $VALID = TRUE;  // flag to notice invalid markup
    switch ($aData[0]) {
      case DOKU_LEXER_ENTER:
        if ($aData[1]) {
          $aRenderer->doc .= '<span lang="' . $aData[1]
            . '" xml:lang="' . $aData[1] . '">';
        } else {
          $VALID = FALSE;
        } // if
        return TRUE;
      case DOKU_LEXER_UNMATCHED:
        // escape the HTML special characters of the enclosed text
        $aRenderer->doc .= str_replace(array('&', '<', '>'),
          array('&amp;', '&lt;', '&gt;'), $aData[1]);
        return TRUE;
      case DOKU_LEXER_EXIT:
        if ($VALID) {
          $aRenderer->doc .= '</span>';
        } else {
          $VALID = TRUE;
        } // if
        // falls through to return TRUE
      default:
        return TRUE;
    } // switch
  } // render()

  //@}
} // class syntax_plugin_lang
} // if

//Setup VIM: ex: et ts=2 enc=utf-8 :
?>
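To experiment with the tag recognition outside DokuWiki, the following sketch feeds the entry pattern from connectTo() and the four regular expressions from handle() to plain preg_match() calls. The function names are hypothetical, and anchoring the pattern with \A and "/" delimiters only approximates how the DokuWiki lexer applies it (the lexer combines the patterns of all active modes).

```php
<?php
// Entry pattern from connectTo(); the slash is spelled \x2F, so "/" is
// safe as a delimiter, and (?s) lets the lookahead span newlines.
const LANG_ENTRY = '\x3Clang(?:\s+[a-z\-A-Z0-9]{2,})?\s*\x3E\s*(?=(?s).*?\x3C\x2Flang\x3E)';

// Hypothetical helper: returns what the entry pattern consumes at the
// start of $text, or null if it does not match there (for instance
// because no closing </lang> follows).
function lang_entry_match($text)
{
    if (preg_match('/\A' . LANG_ENTRY . '/', $text, $m)) {
        return $m[0];
    }
    return null;
}

// Hypothetical stand-alone copy of the cascade in handle(): returns the
// language code to render, or false for an invalid specification.
function lang_code_from_entry($match)
{
    $hits = array();
    if (preg_match('|\s+([a-z]{2,3})\s*>|i', $match, $hits)) {
        return $hits[1];  // primary subtag only, e.g. "de"
    }
    if (preg_match('|\s+([a-z]{2,3}\-[a-z0-9]{2,})\s*>|i', $match, $hits)) {
        return $hits[1];  // primary and subtag, e.g. "de-DE"
    }
    if (preg_match('|\s+([ix]\-[a-z0-9]{2,})\s*>|i', $match, $hits)) {
        return $hits[1];  // "i-" or "x-" prefix, e.g. "x-klingon"
    }
    if (preg_match('|\s+([a-z]{2,3})\-.*\s*>|i', $match, $hits)) {
        return $hits[1];  // convenience: "de-" becomes "de"
    }
    return false;         // render() then omits both span tags
}

foreach (array('<lang de>', '<lang de-DE>', '<lang de->', '<lang x-klingon>', '<lang>') as $entry) {
    var_dump(lang_code_from_entry($entry));
}
```

Run against the example entries, the cascade yields de, de-DE, de and x-klingon, matching the lang attributes in the generated HTML, and false for a bare <lang>. Because the entry pattern also swallows whitespace after the closing >, a newline directly after <lang de> does not end up inside the span.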
2007-08-15:
* added GPL link and fixed some doc problems;
2007-01-05:
* minor internal changes (added comments, date updated);
2005-09-04:
+ initial release;
Matthias Watermann 2007-08-15
Hints, comments, suggestions …
Doesn't seem to work too well in Internet Explorer.
Word 2003 has an option to manually insert phonetics above specified words… I was wondering if it was possible to create a module or plugin for DokuWiki that does the following for Koine-Greek: a) allows the user to upload a two column wordlist; first column source text, second column phonetic text. b) specify the fonts for the source and phonetic text. c) Have DokuWiki automatically recognise the words from the source text on any text [as one types] and auto-insert and center the phonetic text ABOVE each (tagged) occurrence…
An optional button to insert tags on selected text would be great also, not to mention Unicode capability for the source text column, and the option to configure both language and fonts as per source text and phonetic output, if necessary. Thanx a million…
Please contact keith (at) pm-intl (.) org
Keith
See http://www.dokuwiki.org/wiki:bounties for such requests.
Suggestion: Add dir="rtl" to the span tag for RTL languages. It could possibly be determined from $lang['direction'] in that language's lang.php.
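A minimal sketch of that suggestion follows. Instead of reading $lang['direction'] from DokuWiki's language files, which requires a running DokuWiki, it uses a small hard-coded set of right-to-left primary subtags; both that list and the function names are assumptions for illustration, not part of the plugin.

```php
<?php
// Hypothetical stand-in for DokuWiki's per-language $lang['direction']:
// a few primary subtags of languages written right to left.
function lang_is_rtl($code)
{
    static $rtl = array('ar', 'fa', 'he', 'ur', 'yi');
    // Only the primary subtag matters: "ar-EG" is as RTL as "ar".
    $parts = explode('-', $code, 2);
    return in_array(strtolower($parts[0]), $rtl, true);
}

// Builds the opening span, adding dir="rtl" only where it applies.
// Assumes $code was already validated (as handle() does in the plugin).
function lang_open_tag($code)
{
    $tag = '<span lang="' . $code . '" xml:lang="' . $code . '"';
    if (lang_is_rtl($code)) {
        $tag .= ' dir="rtl"';
    }
    return $tag . '>';
}

echo lang_open_tag('de'), "\n";
echo lang_open_tag('he'), "\n";
```

In the plugin, such a check would belong in the DOKU_LEXER_ENTER branch of render(), where the opening span is written.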
<lang ...> markup in regard to the surrounding text and newlines
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported