Is it just me, or is the regexp used to detect Windows share links quite complicated and not working, especially when used outside of link syntax?
I find it useful that the string "\\server\share" (outside "[ [ ] ]") is automatically translated into a Windows share link (like this: \\server\share ): it is a convenient "direct text-to-link translation".
But in this mode (direct text-to-link), it does not detect:
The regular expression used is defined in the new parser.php file, within the Doku_Parser_Mode_WindowsShareLink class:
function preConnect() {
    $ltrs = '\w\$\s';
    $gunk = '/\#~:+=&%@!\-';
    $punc = ':?\-;,\\\\';
    $host = $ltrs.$punc;
    $any = $ltrs.$gunk.$punc;
    $this->pattern = "[$gunk$punc\s]\\\\\\\\[$host]+?\\\\[$any]+?[$punc]*[^$any]";
}
Simplify the regexp:
function preConnect() {
    $end = '\s\*';
    $this->pattern = "\\\\\\\\[^\\\\]+\\\\[^$end]+";
}
This regexp seems to work fine on my wiki website. (Warning: this regexp is incomplete: see the next section, 'Complete Regexp'.)
What do you think? Could this regexp match anything other than Windows share links?
– Daniel Chaffiol 30/05/2005
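For anyone who wants to try the simplified pattern outside the wiki, here is a minimal sketch using plain preg_match(). It assumes the pattern is wrapped in '/' delimiters by hand; the helper name first_share_match and the sample sentences are made up for illustration. The DokuWiki lexer combines all mode patterns into one compound regexp, so results inside the wiki may differ.

```php
<?php
// Simplified pattern from the proposal above, built exactly as in preConnect().
$end = '\s\*';
$pattern = "\\\\\\\\[^\\\\]+\\\\[^$end]+";

// Hypothetical helper (not part of DokuWiki): first match, or null if none.
function first_share_match($pattern, $text) {
    return preg_match('/' . $pattern . '/', $text, $m) ? $m[0] : null;
}

// Illustrative sample sentences (single-quoted, so '\\' is one backslash).
$samples = array(
    'see \\\\server\\share for details',    // prints \\server\share
    'see \\\\server\\share, then continue', // prints \\server\share, (comma included)
    'see \\\\server\\my docs\\file',        // prints \\server\my (stops at the space)
    'no share here',                        // prints (no match)
);

foreach ($samples as $text) {
    $match = first_share_match($pattern, $text);
    echo ($match === null ? '(no match)' : $match), "\n";
}
```

The second and third samples show the two open questions in this thread: trailing punctuation ends up inside the link, and a path containing a space is cut short.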
There was some discussion about completely removing this feature. But you are right, the regexp currently used is suboptimal, and I like your approach.
I think spaces shouldn't get recognized - this may cause more problems than it solves. Another thing is that you can currently use a link in a sentence without the following punctuation being added to it. E.g. in "www.example.com, ..." the link does not contain the comma. This should be true for server links as well, so the regexp needs to be a little bit more complicated… A darcs patch would be welcome. – Andreas Gohr 2005-06-01 12:22
Warning
The regexp "\\\\\\\\[^\\\\]+\\\\[^$end]+" only detects paths such as '\\server\somechar'; it stops at the first space, so it cannot capture '\\server\dir1\dir2\dir3 with space\…'
Why ?
Because the '[^\\\\]+\\\\' part of the regexp should be applied as many times as possible to detect the full length of a Windows share link.
That is called a mask or a group (here a 'non-capturing group'): (?:[^\\\\]+\\\\)+
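As a rough illustration of what the repeated non-capturing group does, the sketch below runs it directly through preg_match(), outside the DokuWiki lexer. The sample path is invented, and the pattern is only the "server plus directories" part, so the final file name is deliberately not matched here.

```php
<?php
// Two backslashes, then one or more "segment followed by a backslash" groups.
// Single-quoted PHP string: each '\\' is one literal backslash in the regexp source.
$regex = '/\\\\\\\\(?:[^\\\\]+\\\\)+/';

// Invented sample path with a space in one directory name.
$text = '\\\\server\\dir1\\dir2\\dir3 with space\\file.txt';

preg_match($regex, $text, $m);

echo $m[0], "\n";     // prints \\server\dir1\dir2\dir3 with space\
echo count($m), "\n"; // prints 1: (?:...) groups but does not capture
```

The group repeats once per directory level, which is why the match runs through every segment that ends in a backslash, spaces included.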
[ Bug 368 added ]
The function _getCompoundedRegex in Lexer.php builds the huge regexp which parses the DokuWiki text.
That function automatically escapes '(' and ')' (into '\(' and '\)')!
That means one cannot add a mask or group '(xxx)+', hoping to match 'xxx' as many times as possible, without the lexer actually trying to match the literal text '(xxx)'!
In order to be able to introduce a non-capturing group in the pattern of Doku_Parser_Mode_WindowsShareLink (in parser.php), I changed Lexer.php, replacing:
$pattern = str_replace( array('/', '(', ')'), array('\/', '\(', '\)'), $pattern );
with
$pattern = str_replace( array('/', '(', ')', '<{<','>}>'), array('\/', '\(', '\)', '(?:', ')'), $pattern );
"\\\\\\\\<{<[^\\\\\"]+\\\\[^\\\\\"]>}>+[^$end]+"
The "<{<[^\\\\\"]+\\\\[^\\\\\"]>}>+" part will be translated by the _getCompoundedRegex function of Lexer.php into '(?:[^\\\\\"]+\\\\[^\\\\\"])+', meaning:
try to match as many times as possible:
one or more characters other than '\' and '"', followed by a '\', followed by one character other than '\' and '"'
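The placeholder trick can be checked in isolation with a small sketch. The function escape_for_lexer below is a hypothetical stand-in that only reproduces the str_replace call quoted above, not the rest of _getCompoundedRegex; the value of $end is assumed to be the one from the simplified pattern, and the sample sentence is invented.

```php
<?php
// Hypothetical stand-in for the escaping step (only the str_replace call).
// str_replace applies the pairs in order, so '(' and ')' are escaped first
// and the parentheses produced by '<{<' and '>}>' are left untouched.
function escape_for_lexer($pattern) {
    return str_replace(
        array('/', '(', ')', '<{<', '>}>'),
        array('\/', '\(', '\)', '(?:', ')'),
        $pattern
    );
}

$end = '\s\*'; // assumption: same terminators as the simplified pattern
$raw = "\\\\\\\\<{<[^\\\\\"]+\\\\[^\\\\\"]>}>+[^$end]+";
$escaped = escape_for_lexer($raw);

echo $escaped, "\n"; // prints \\\\(?:[^\\"]+\\[^\\"])+[^\s\*]+

// Invented sample sentence; the pattern contains no '/', so '/' is a safe delimiter.
$text = 'open \\\\server\\dir one\\dir two\\file.txt and more';
preg_match('/' . $escaped . '/', $text, $m);
echo $m[0], "\n";    // prints \\server\dir one\dir two\file.txt
```

Directory names may contain spaces because they sit inside the repeated group, while the last path element still ends at the first character listed in $end.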
The complete regexp is able to capture any Windows share link thrown at it:
$end = "\s\*;,!\\";
\\(?:[^\"]+\\[^\"])+[^\s\*;,!\\\"]+
or
\\(?:[^\"]+\\[^\"])+[^$end]+
See for yourself:

[Picture: sample wiki text showing the Windows share paths detected by the regexp]
Update, 12 June 2005
Andi has just closed bug 368 and has fixed the windowssharelink regexp with the following:
function preConnect() {
    $this->pattern = "\\\\\\\\\w+?(?:\\\\[\w$]+)+";
}
That is much simpler than my proposal… and… it almost works: it only misses the spaces within a UNC Windows share path.
It only takes a little extra '\s' ([\w\s$] instead of [\w$]) to be perfect:
function preConnect() {
    $this->pattern = "\\\\\\\\\w+?(?:\\\\[\w\s$]+)+";
}
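To make the difference between the two variants concrete, here is a small comparison run with preg_match() outside the lexer. The helper name share_match and the sample sentences are invented for illustration, and behaviour inside the compound lexer regexp may differ.

```php
<?php
// Both variants, written single-quoted and wrapped in '/' delimiters.
$without_spaces = '/\\\\\\\\\w+?(?:\\\\[\w$]+)+/';   // [\w$]
$with_spaces    = '/\\\\\\\\\w+?(?:\\\\[\w\s$]+)+/'; // [\w\s$]

// Hypothetical helper (not part of DokuWiki): first match, or null if none.
function share_match($regex, $text) {
    return preg_match($regex, $text, $m) ? $m[0] : null;
}

// Path followed by a full stop: the \s variant keeps the whole path.
$a = 'open \\\\server\\my docs\\report.';
echo share_match($without_spaces, $a), "\n"; // prints \\server\my
echo share_match($with_spaces, $a), "\n";    // prints \\server\my docs\report

// Path followed by more words: the \s variant also takes the words after it.
$b = 'open \\\\server\\my docs\\report now';
echo share_match($without_spaces, $b), "\n"; // prints \\server\my
echo share_match($with_spaces, $b), "\n";    // prints \\server\my docs\report now
```

Note that \s also matches newlines, so with this variant a link stops only at a character that is neither a word character, whitespace, nor '$'.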
Adding spaces isn't a good idea. This will catch too much in most cases. If you want to link to a share with spaces, use the square bracket link syntax instead. If nobody disagrees, I'd like to remove this page. – Andreas Gohr 2005-06-13 19:05
Adding spaces will not catch too much, and will give exactly the same results as the ones illustrated by the picture above (where the paths are followed by a "." or a "**" or a newline or … and so on). Spaces in Windows paths are a reality that must be taken into account, not ignored. As for the suggestion to use the square bracket link syntax … well, this whole page is about refining an existing mechanism that avoids the use of that syntax! So: no square brackets, spaces added please.
As for this page, I would like to have it around, since it can provide interesting illustrations of both an internal mechanism of the new DokuWiki lexer and some regexp samples. But it is your web site, so if you disagree with the instructive value of this page, feel free to delete it at any time. – Daniel Chaffiol 2005-06-14 09:05