Keywords: UTF-8, romanize, cyrillic, latin, convert, filename
When upgrading from previous versions that did not yet have the “romanize” function, you will encounter a completely 'unreadable' directory structure.
For example, %D0%BA%D1%8B%D1%80%D0%B3%D1%8B%D0%B7%D1%81%D1%82%D0%B0%D0%BD.txt is the urlencoded form of кыргызстан.txt
This is because UTF-8 filenames have been urlencoded.
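To see the relationship between the two names, here is a minimal sketch using only PHP's built-in urldecode() and rawurlencode(). It is an illustration of the encoding, not part of the conversion script; the variable names are my own.

```php
<?php
// The encoded name from the example above, as it appears on disk.
$encoded = '%D0%BA%D1%8B%D1%80%D0%B3%D1%8B%D0%B7%D1%81%D1%82%D0%B0%D0%BD.txt';

// urldecode() turns every %XX escape back into the raw UTF-8 byte.
$decoded = urldecode($encoded);
echo $decoded, "\n";               // кыргызстан.txt

// Encoding the decoded name again reproduces the on-disk form.
echo rawurlencode($decoded), "\n"; // %D0%BA%D1%8B...%D0%BD.txt
```

Each Cyrillic letter takes two bytes in UTF-8, which is why every letter shows up as two %XX escapes.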
Later versions added the “romanization” option to avoid this problem.
The script below will convert this unreadable directory structure to “romanized” filenames.
You will have to include the utf8.php file, which is part of the DokuWiki installation.
Please note: this script is not error free. For example, some Cyrillic characters will leave a trailing ' at the end of your filename. Please check your page structure after conversion for invalid filenames.
I hope this will help someone. Any improvements welcome.
<?php
include("utf8.php");

/**
 * Copy a file, or recursively copy a folder and its contents, and clean up
 * the filenames according to the dokuwiki UTF-8
 *
 * @original_author Aidan Lister <aidan@php.net>
 * @link            http://aidanlister.com/repos/v/function.copyr.php
 * @param string $source Source path
 * @param string $dest   Destination path
 * @return bool Returns TRUE on success, FALSE on failure
 */
function copyr($source, $dest)
{
    // Romanized form of the destination path
    $dest2 = cleanID($dest);
    echo $source."->".$dest." ->$dest2<br/>";

    // Simple copy for a file
    if (is_file($source)) {
        return copy($source, $dest2);
    }

    // Make destination directory (test the cleaned path, since that is the one created)
    if (!is_dir($dest2)) {
        mkdir($dest2);
    }

    // Loop through the folder
    $dir = dir($source);
    while (false !== ($entry = $dir->read())) {
        // Skip pointers
        if ($entry == '.' || $entry == '..') {
            continue;
        }

        // Deep copy directories
        if ($dest !== "$source/$entry") {
            copyr("$source/$entry", "$dest/$entry");
        }
    }

    // Clean up
    $dir->close();
    return true;
}

copyr("/dokuwiki/data/pages/","/dokuwiki/data/pagesnew/");

function cleanID($id,$ascii=false){
    $id = trim(urldecode($id));
    $id = utf8_strtolower($id);
    $id = utf8_romanize($id);
    // utf8_deaccent() returns the converted string, so keep the result
    $id = utf8_deaccent($id,-1);
    // Replace runs of apostrophes with an underscore
    $id = preg_replace('#\'+#','_',$id);
    return($id);
}
?>
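Since the page structure should be checked for invalid filenames after conversion, a small helper along these lines might save some manual work. It is only a sketch: the function name findSuspectNames() is made up for this example, and the whitelist (lowercase ASCII letters, digits, underscore, dot and hyphen) is my assumption about what a clean romanized name looks like, so adjust it to your setup.

```php
<?php
/**
 * List every file and directory below $root whose name contains a
 * character outside a conservative whitelist.
 * Hypothetical helper; the whitelist is an assumption, not DokuWiki's rule.
 */
function findSuspectNames($root)
{
    $suspect = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::SELF_FIRST // report directories as well as files
    );
    foreach ($iterator as $item) {
        // Test only the last path component, not the whole path
        if (!preg_match('/^[a-z0-9_.\-]+$/', $item->getFilename())) {
            $suspect[] = $item->getPathname();
        }
    }
    sort($suspect);
    return $suspect;
}

// Usage: scan the destination directory used by the script above, if present.
$target = "/dokuwiki/data/pagesnew";
if (is_dir($target)) {
    foreach (findSuspectNames($target) as $path) {
        echo $path, "\n";
    }
}
```

Anything it prints (for example a name that still contains ' or a % escape) is worth renaming by hand.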