====== DokuWiki UTF8 conversion ======
[[DokuWiki]] has stored its data in UTF-8 encoding since release 2005-02-06. This allows you to use all kinds of languages in the same wiki installation. It also means that if you upgrade from an older version, you need to re-encode your data files.
**If you are installing DokuWiki for the first time, you don't need to do anything** - DokuWiki will work out of the box.
You can either recode all existing pages yourself, e.g. using [[man>iconv]] or [[man>recode]], or use the "UTF-8 conversion helper" described below.
If you do the conversion yourself, please note that DokuWiki stores filenames [[phpfn>rawurlencode|urlencoded]] so you may have to rename your files, too.
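As a rough sketch of the renaming step, assuming the old filenames hold urlencoded Latin-1 bytes and the new ones should hold urlencoded UTF-8 bytes: the helper name ''recode_name'' and the sample filename are made up for illustration, and the set of characters left unescaped is an assumption that should be compared with what your DokuWiki version actually writes. It needs Bash 4 or newer.

```bash
#!/bin/bash
# Sketch only: recode_name is a hypothetical helper, not part of DokuWiki.
FROM=ISO-8859-1
TO=UTF-8

# Print the urlencoded name $1 with its escaped bytes converted from $FROM to $TO.
recode_name() {
  local hex out=""
  # Turn %XX into raw bytes, convert them, then dump the result as hex bytes.
  for hex in $(printf '%b' "${1//%/\\x}" | iconv -f "$FROM" -t "$TO" | od -An -v -tx1); do
    case $hex in
      # Assumption: letters, digits, "-", "." and "_" stay unescaped.
      2d|2e|5f|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        out+=$(printf "\\x$hex") ;;
      *)
        out+="%${hex^^}" ;;
    esac
  done
  printf '%s\n' "$out"
}

recode_name 'm%FCller.txt'   # prints m%C3%BCller.txt
```

The printed name could then be passed to ''mv'' to rename the file; names that contain only ASCII characters come out unchanged.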
If the search function becomes very slow after upgrading to a UTF-8 version of DokuWiki, even to the point where search results fail to show, check that PHP has been compiled with //--enable-mbstring// (PHP 4.3.0+).
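One quick way to check is to look for ''mbstring'' in PHP's module list. The sketch below assumes a command-line ''php'' binary is on the PATH; the helper name ''has_mbstring'' is made up here, and the command-line binary may be a different build from the one your web server uses, so treat the result as a hint only.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if a module list on stdin contains "mbstring".
has_mbstring() {
  grep -qix 'mbstring'
}

# "php -m" prints one module name per line.
if command -v php >/dev/null 2>&1; then
  if php -m | has_mbstring; then
    echo "mbstring is available"
  else
    echo "mbstring is missing"
  fi
else
  echo "no command-line php found; check the phpinfo() output instead"
fi
```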
===== UTF-8 conversion helper =====
:!: **This script has not been updated for a long time and is not compatible with newer DokuWiki releases**, so it will no longer work out of the box. Have a look at the Bash script below for an alternative way to upgrade old data files.
-- up to what version is it working?
The simplest way to upgrade your datafiles to UTF8 is to use the "dokuwiki-convert" script available at
http://www.splitbrain.org/Programming/PHP/DokuWiki/dokuwiki-convert.tgz.
The script will walk through your data directory and reencode all the files for you.
==== Usage ====
  - Recommended: Deny write access for all users of your wiki using the [[ACL]] feature or a [[http://httpd.apache.org/docs/howto/htaccess.html|.htaccess]] file
  - Create a backup of all your files :!:
  - Upgrade your DokuWiki to the newest version [[install|as usual]]
  - Install dokuwiki-convert somewhere on your webserver ((you can put it as an additional directory in your DokuWiki directory if you like))
  - Edit the ''dokuwiki-convert/index.php'' file
    * You need to set the full filesystem path to your DokuWiki at the very top, e.g. ''/var/www/dokuwiki/''
  - Point your web browser to the dokuwiki-convert script
  - Choose your current file encoding
  - Hit the ''Do the conversion'' button
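The backup step can be sketched with ''tar''. The helper name ''backup_wiki'' and the paths in the example call are placeholders, not something the conversion script provides.

```bash
#!/bin/bash
# Hypothetical helper: archive the wiki directory $1 into the directory $2
# and print the name of the archive that was written.
backup_wiki() {
  local wiki=${1%/} dest=$2
  local archive="$dest/dokuwiki-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches to the parent directory so the archive holds relative paths.
  tar -czf "$archive" -C "$(dirname "$wiki")" "$(basename "$wiki")" &&
    printf '%s\n' "$archive"
}

# Example call (paths are placeholders):
# backup_wiki /var/www/dokuwiki/ /root/backups
```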
==== Additional Notes ====
  * The script __does not__ convert your old revisions.
    * You need to delete them or convert them yourself.
  * The script __does not__ convert your changes.log.
    * You need to delete it or convert it yourself.
  * The script may time out when running in safe mode.
    * Just rerun it until it says it has finished.
    * If it does not work for you, you need to do the conversion yourself.
  * For English wikis the script will skip a lot of files.
    * US-ASCII is a subset of UTF-8, so there is no need to convert these files.
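To see which files would actually change in a conversion, you can test for bytes outside the 7-bit ASCII range. This is only a sketch: the helper name ''is_ascii'' is made up, and the ''pages/'' directory is assumed to be in the current directory.

```bash
#!/bin/bash
# Hypothetical helper: succeeds if file $1 contains only 7-bit (US-ASCII) bytes.
is_ascii() {
  local high
  # Delete every byte in the range 0-127 and count what is left.
  high=$(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c)
  [ "$((high))" -eq 0 ]
}

# List the page files that contain non-ASCII bytes.
if [ -d pages ]; then
  find pages/ -type f -name "*.txt" | while IFS= read -r fn; do
    is_ascii "$fn" || printf 'needs conversion: %s\n' "$fn"
  done
fi
```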
===== Sample Bash script for conversion with iconv =====
> The following code might be helpful in doing the conversion yourself with iconv. Besides converting the data dir, this script __does__ convert changes.log and the old revisions. Run this script from the data directory.
  #!/bin/bash
  FROM=latin1
  TO=utf8
  ICONV="iconv -f $FROM -t $TO"
  # Convert changes.log (the .bak copy is kept if the conversion fails)
  cp changes.log changes.log.bak
  $ICONV < changes.log.bak > changes.log && rm changes.log.bak
  # Convert pages/ subdir
  find pages/ -type f -name "*.txt" | while IFS= read -r fn; do
    cp "${fn}" "${fn}.bak"
    $ICONV < "${fn}.bak" > "${fn}" && rm "${fn}.bak"
  done
  # Convert attic/ subdir (where the script assumes gzip compression)
  find attic/ -type f -name "*.txt.gz" | while IFS= read -r fn; do
    cp "${fn}" "${fn}.bak"
    { gzip -cd | $ICONV | gzip -c; } < "${fn}.bak" > "${fn}" && rm "${fn}.bak"
  done
> To use this script on Windows XP Pro (or Windows 2000 Pro) with Cygwin, for ISO8859-15 (pt_PT), I had to change the first lines of the script to:
  #!/bin/bash
  FROM=ISO8859-15
  TO=UTF-8
> Everything else remains the same, and the script ran successfully. I was able to convert two entire DokuWiki-enabled sites in less than 5 minutes. I found the correct encoding names by issuing the following command at a Cygwin Bash prompt:
  iconv -l
> I have modified the script to keep the timestamps for the files in ''data/'' --- //[[andrea@gualano.net|Andrea]] 2005-11-04 11:57//
  # Convert data/ subdir, keeping the original timestamps
  find data/ -type f -name "*.txt" | while IFS= read -r fn; do
    cp -p "${fn}" "${fn}.bak"
    $ICONV < "${fn}.bak" > "${fn}"
    touch -r "${fn}.bak" "${fn}"
    rm "${fn}.bak"
  done
> I have modified it again to keep the timestamps of unmodified files, for easier use with CVS. It looks for *.java and *.jsp files in the current directory and its subdirectories --- //[[fbotelho@stj.gov.br|Flavio]] 2008-01-29//
  #!/bin/bash
  FROM=cp1252
  TO=utf8
  ICONV="iconv -f $FROM -t $TO"
  # Group the -name tests so that -type f applies to both patterns
  find . -type f \( -name "*.java" -or -name "*.jsp" \) | while IFS= read -r fn; do
    cp "${fn}" "${fn}.bak"
    touch -r "${fn}" "${fn}.bak"
    $ICONV < "${fn}.bak" > "${fn}"
    # cmp -s is silent and succeeds only if both files are identical
    if cmp -s "${fn}" "${fn}.bak"; then
      touch -r "${fn}.bak" "${fn}"
    else
      echo "MODIFIED - ${fn}"
    fi
    rm "${fn}.bak"
  done
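Running any of the scripts above a second time would convert files that are already UTF-8 once more and garble their non-ASCII characters. A possible guard, sketched here with the made-up helper names ''is_utf8'' and ''convert_once'', is to skip every file that already passes as valid UTF-8. It relies on the assumption that text in a legacy 8-bit encoding with non-ASCII characters is rarely valid UTF-8 by accident.

```bash
#!/bin/bash
FROM=ISO-8859-1
TO=UTF-8

# Hypothetical helper: succeeds if file $1 is already valid UTF-8
# (pure ASCII files count as valid UTF-8 too).
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Hypothetical helper: convert file $1 in place unless it is already UTF-8.
convert_once() {
  if is_utf8 "$1"; then
    echo "skipping (already UTF-8): $1"
  else
    cp -p "$1" "$1.bak"
    # The .bak copy is removed only if the conversion succeeded.
    iconv -f "$FROM" -t "$TO" < "$1.bak" > "$1" && rm "$1.bak"
  fi
}
```

Inside the ''while'' loops above, ''convert_once "${fn}"'' would take the place of the ''cp'', ''iconv'' and ''rm'' lines.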
===== Manual conversion with EditPad Lite =====
As I couldn't get the above scripts working, I converted my pages manually, using the ANSI to UTF-8 converter from [[http://www.editpadpro.com/editpadlite.html|EditPad Lite]].