====== Security Overview ======
In order to motivate a well-structured discussion on the issue of DokuWiki security, we wish to introduce a standard set of security related terms, a list of threats, and an analysis of risk and costs of each threat. Chris Lee is by no means a security expert and would encourage improvements upon this topic.
> I added some comments how a few of the mentioned issues are handled in DokuWiki. --- //[[andi@splitbrain.org|Andreas Gohr]] 2005-03-11 11:01//
> Thanks so much for such a rapid reply. I'll continue to fill out the text as originally planned, and I will try to address your comments as I update the corresponding sections. Most of my writing deals with general security systems and not specifically the current implementation of DokuWiki. This is done so that we can guide //future// implementation of the wiki. --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-11 16:48//
> Further reading is available at http://www.owasp.org , http://www.cgisecurity.com, and http://www.webappsec.org
===== Introduction =====
The internet is a wonderfully free yet hazardous place. An arms race between attackers and defenders is under way, and the defenders are currently losing because they underestimate how profitable attacks are for the attackers. It is imperative that system designers become aware of security threats and raise the cost for attackers. End users should not have to be security conscious beyond perhaps some very simple basics. System architects should be doing the heavy lifting in the security department, but they need help. This document is a rallying cry to the software community to create an open-source, web-based authentication system that works and protects users' privacy((no, this is not a slam to Microsoft's Passport system... really!)).
===== Terms and Definitions =====
Security is generally expressed in terms of confidentiality, authenticity, integrity, and availability.
* **Confidentiality**: the concealment of information
* **Authenticity**: the identification and assurance of the origin of information
* **Integrity**: refers to the trustworthiness of data or resources in terms of preventing improper and unauthorized changes
* **Availability**: refers to the ability to use the information or resource desired.
These attributes cover the needs of most security systems, but there is one more dimension that would be helpful to enumerate among these that is closely related to availability, //Pertinence//, which directly addresses the issue of WikiSpam. **Pertinence** is the attribute of being able to obtain the information desired without having to sort through unrelated/unwanted information.
A **threat** is a //potential// violation of security and an **attack** is any action that violates a security goal. The classes of attacks are as follows:
* Confidentiality => eavesdropping
* Integrity => message tampering
* Authenticity => fabrication
* Availability => DoS, server crash, deletion
* Pertinence => SPAM
To form an effective security implementation, first a **policy** needs to be stated that accurately describes the "who, what, when, how, and where" of accessing and modifying system objects (documents, messages, logs, and HTML pages). Then a set of effective mechanisms needs to be employed to enforce the policy. Lastly, monitoring and response tools need to be enlisted to observe attacks and to recover from possible damage.
===== Threat Models =====
Web-based systems are under constant attack and tend to have some of the weakest security measures of any software systems. The attacks are numerous because of:
* traditional network attacks (bucket brigade, IP spoofing, DoS, replay attacks, password guessing, eavesdropping),
* improper checking of user inputs on web pages (buffer overflows, cross-site scripting, SQL injections, phishing),
* improper web-style security measures (cookie domain stealing, cookie hijacking, plain-text passwords),
* and the general verbosity of web servers and the HTTP 1.1 protocol that reveal too much configuration information (robots.txt, directory listings, version numbers, OS versions).
In this document, we intend to address the issue of secure authentication and access privileges.
> Most of the mentioned attacks are against the webserver(-configuration) or the network infrastructure itself and thus are out of scope for the security issues within DokuWiki.
> Security measures taken at the application layer (in this case the wiki) can avoid or diminish the effectiveness of most of these attacks. There are also PHP commands that can check the configuration of the web server and in turn alert the administrator to possible security threats. One such example is ini_get('magic_quotes_gpc'). Hashing, if done properly, can relieve the problem of bucket brigade and replay attacks. Checks on user inputs can prevent HTML injection that leads people to type their login into a box that actually posts to a different webserver. Granted, there are many issues that //should// have been handled at lower layers of the protocol stack and that make security next to impossible at the application layer, but a concentrated effort should be made to provide security as needed by the policy in spite of the lower-level shortcomings --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-11 17:05//
==== Authentication Threats ====
Wiki-style collaborative web sites, such as the wonderful implementation of DokuWiki, need to have high availability for user input without impediments of strong authentication (digital certificates, smart cards, trusted PGP keys). This emphasis on openness and convenience often leads to abuse by spammers, hackers, and other undesirables (such as professors). To find the correct balance between security and availability requires evaluating risks and costs of different security levels. We will discuss the different authentication mechanisms commonly used and address their effectiveness, the cost for implementation, and the cost borne by the attacker to circumvent the security measure.
=== Username and Password ===
The most effective security measure for the costs involved is the username and password. However, password guessing techniques prove very, very effective, cracking above ninety percent of passwords (even those of people who believe they are so 31337). Additional pitfalls include accidental divulging of passphrases (sticky-note attack), reuse of passwords on other sites (oh, so you are loser@yahoo.com and your passphrase is 's1mp1is7ic' on my site? Let me try logging into Yahoo! Mail as you with the same password you gave me), reading of email requests for passwords, and reading of passwords locally on the user's machine.
Another common point of attack is the server-side user database. If passwords are stored in plain-text, then any administrator can view the stored password and use the //password reuse// attack, and if the database becomes compromised then all of the passwords are immediately compromised.
Hashes of passwords, such as MD5 and SHA1, are much harder for attackers to turn back into the password, but they are useless if used improperly. If the hash itself is accepted as the password, then it is no different from a plaintext password; it actually harms the legitimate user more because of the repeated calculation of the hash of their password. Also, if passwords are hashed on the server side, so that the user sends their plaintext password, then (ignoring eavesdropping and bucket brigade attacks) reverse-dictionary-mapping attacks can be used: the attacker has already created a database of plaintext-to-MD5 (for example) mappings indexed by the MD5 sum, takes the MD5 sums from the website's user database, and looks up the plaintext for each one. The cost of the reverse-dictionary-mapping attack is on the order of a couple hundred dollars (i.e. very cheap in comparison to multi-billion dollar attacks on Rivest, Shamir, Adleman (RSA)). This attack can be made dramatically more expensive, at relatively low expense to the implementor, by adding a long seed value (a salt) that is prepended to the passphrase before hashing. This makes the attacker go through 2<sup>64</sup> operations for MD5 and 2<sup>80</sup> for SHA1 to find a hashing collision (unless they use the compressor vulnerability, which saves them about 20% of the guessing space). This would still take multiple millions of dollars and several weeks per password.
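The difference a seed value makes against a reverse-dictionary-mapping attack can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list, the function names, and the 16-byte salt length are invented for the example, and MD5 is used purely because the text above names it (a real deployment would more likely use a deliberately slow password hash).

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a string (MD5 only because the text names it)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# The attacker's precomputed reverse dictionary: digest -> plaintext.
# The word list is a made-up stand-in for a real cracking dictionary.
COMMON_PASSWORDS = ["password", "letmein", "s1mp1is7ic", "qwerty"]
REVERSE_TABLE = {md5_hex(word): word for word in COMMON_PASSWORDS}


def store_unsalted(password: str) -> str:
    """What a naive server would write to its user database."""
    return md5_hex(password)


def store_salted(password: str) -> Tuple[str, str]:
    """Prepend a long random seed value before hashing; store seed and digest."""
    salt = secrets.token_hex(16)  # assumption: 16 random bytes per user
    return salt, md5_hex(salt + password)


def verify_salted(password: str, salt: str, digest: str) -> bool:
    """Recompute the salted digest and compare in constant time."""
    return hmac.compare_digest(md5_hex(salt + password), digest)


# Simulate a leaked user database and the attacker's table lookup.
leaked_unsalted = store_unsalted("s1mp1is7ic")
salt, leaked_salted = store_salted("s1mp1is7ic")

cracked_unsalted = REVERSE_TABLE.get(leaked_unsalted)  # found in the table
cracked_salted = REVERSE_TABLE.get(leaked_salted)      # not in the table
```

The precomputed table stops being useful because the attacker would need a separate table for every seed value; the legitimate server only pays for one extra string concatenation per login.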
To resolve the eavesdropping (and thus replay) and bucket-brigade attacks, the webserver can offer a challenge (à la Yahoo! Mail's login page), have JavaScript compute a hash over it, and have the answer submitted back to the server. As long as the challenge never repeats, there is no possibility of replay. However, the attacker can continue to reload the page until the challenge repeats, unless the server persists in generating the same challenge until it is successfully used for login.
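The challenge scheme described above might look roughly like the following Python sketch. All names (''ChallengeServer'', ''client_response'') are hypothetical, and the choice of HMAC-SHA256 over a stored SHA-256 of the password is an assumption made for the example, not a description of any existing login page. In a browser the client side would be written in JavaScript; Python is used here for both sides so the exchange can be run end to end.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def password_secret(password: str) -> str:
    """Shared secret derived from the password (assumption: plain SHA-256)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def client_response(password: str, challenge: str) -> str:
    """What the client-side script would compute and submit instead of the password."""
    key = password_secret(password).encode("utf-8")
    return hmac.new(key, challenge.encode("utf-8"), hashlib.sha256).hexdigest()


class ChallengeServer:
    def __init__(self, user_secrets: Dict[str, str]) -> None:
        # username -> password_secret(password). Note that this stored value
        # is password-equivalent, so the user database must still be protected.
        self._user_secrets = user_secrets
        self._pending: Dict[str, str] = {}  # username -> outstanding challenge

    def issue_challenge(self, username: str) -> str:
        # Keep handing out the same challenge until it is used successfully,
        # so reloading the login page does not cycle through challenges.
        if username not in self._pending:
            self._pending[username] = secrets.token_hex(16)
        return self._pending[username]

    def verify(self, username: str, response: str) -> bool:
        challenge = self._pending.get(username)
        secret = self._user_secrets.get(username)
        if challenge is None or secret is None:
            return False
        expected = hmac.new(secret.encode("utf-8"), challenge.encode("utf-8"),
                            hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, response):
            del self._pending[username]  # a challenge is only ever accepted once
            return True
        return False
```

An eavesdropper who records a successful response cannot replay it, because the challenge it answers has already been discarded by the time the login succeeds.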
Save for administrative accounts, accidental password exposure has fairly low risk for the security of a collaborative web site, but it is common for people to use tools in ways they were never meant to be deployed. It is wise to integrate strong security measures as long as they do not prohibit people from utilizing the service.
=== Cookies ===
Since HTTP is **sessionless**, i.e. it does not maintain a connection that can be used for two-way communication, state must be kept in one of two ways: either the information is stored at the client and resent with each request, or it is stored on the server and indexed by a **[[wp>HTTP_cookie]]** presented by the client. In general, it is a bad idea to let the client maintain and convey the state information, for both technical implementation and security reasons.
Cookies are commonly used to maintain **HTTP Sessions**, henceforth called //sessions//. With the widespread use of PHP, many websites are creating sessions and issuing cookies even when there is no need to maintain state. This in turn has increased the use of cookie blockers, which often block necessary cookies. In quite a few cases, users can never figure out which domains to allow in order to use the service (as in MSN Hotmail), and the "please turn on your cookies" page never specifies which domains to allow (probably because they are in bed with the advertisers). That being said, cookies are a wonderful technology if used properly.
Cookies are set in two ways: by JavaScript and by the server (in the HTTP headers). Creating cookies and sessions has been greatly simplified in recent years and can be done in only a few lines of code. The cookie contains a domain, an expiration, and up to 4kB of data. The cookie is transmitted by the client in each request it makes to the server. This is better than basic HTTP authentication in that if someone eavesdrops on the cookie data, they can only hijack that specific session (even if it lasts an entire year). On the other hand, if they overheard the basic authentication, they would be able to log in repeatedly until the password was changed.
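To illustrate the "few lines of code" claim, here is a small Python sketch that builds the server-side ''Set-Cookie'' header with the standard ''http.cookies'' module. The cookie name ''session'', the one-hour lifetime, and the function name are assumptions chosen for the example; DokuWiki itself is written in PHP and sets its cookies differently.

```python
import secrets
from http.cookies import SimpleCookie


def build_session_cookie(domain: str, max_age_seconds: int) -> str:
    """Return a Set-Cookie header line carrying a random session key."""
    cookie = SimpleCookie()
    cookie["session"] = secrets.token_urlsafe(32)    # the data: a session key
    cookie["session"]["domain"] = domain             # which hosts get it back
    cookie["session"]["max-age"] = max_age_seconds   # the expiration
    cookie["session"]["path"] = "/"
    return cookie.output()


def read_session_key(cookie_header_value: str) -> str:
    """Parse the Cookie header value the client sends back on each request."""
    parsed = SimpleCookie()
    parsed.load(cookie_header_value)
    return parsed["session"].value


header = build_session_cookie("foo.bar.org", 3600)
```

Only the key travels with each request; whatever state it indexes stays on the server.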
An alternative to cookies is to embed a challenge inside the webpage that the user would use in the next page request to the server. This has a high implementation cost, but if used with other mechanisms, it can provide a resilient authentication protocol for plaintext communication.
== Cookie Problems ==
In general a cookie can only be set by a member of the domain that the cookie is generated for. This means that ''foo.bar.org'' can read and write cookies with the domain set to ''foo.bar.org'' or ''bar.org'', but not for ''aol.com''. If the domain was set to ''bar.org'' and the user surfed to ''cheers.bar.org'', then ''cheers.bar.org'' can read the cookie and hijack the user's session with ''foo.bar.org''.
> :!: This has to be checked. DokuWiki should limit the cookie to the BaseURL - I'm not sure if this is done already.
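The domain rule described above can be written down as a short Python predicate. This is a simplified sketch of cookie domain matching (it ignores paths, ports, IP-address hosts, and public-suffix rules), and the function name is made up for the example.

```python
def cookie_is_sent_to(cookie_domain: str, request_host: str) -> bool:
    """Simplified check: would a browser send a cookie scoped to
    cookie_domain along with a request to request_host?"""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    # Exact match, or request_host is a subdomain of cookie_domain.
    return (request_host == cookie_domain
            or request_host.endswith("." + cookie_domain))


# A cookie scoped to bar.org leaks to every sibling host under bar.org ...
leaks_to_sibling = cookie_is_sent_to("bar.org", "cheers.bar.org")
# ... while one scoped to foo.bar.org stays with foo.bar.org.
stays_private = not cookie_is_sent_to("foo.bar.org", "cheers.bar.org")
```

Scoping the cookie to the narrowest host that needs it is therefore the cheap defence against the ''cheers.bar.org'' scenario.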
Cookies also expire or could be deleted requiring the user to reauthenticate with the server. In general this is a great idea, but repeated authentications, even with some security mechanisms in place, can cause information leakage for an attacker to use against the user. This issue is easily solved in various ways.
> DokuWiki cookies use session lifetime by default, but can be made permanent (one year) by clicking the "remember me" checkbox.
> This actually makes the system security weaker since the user is now vulnerable to local attacks (someone walking up to their computer) and to remote attacks (session hijacking for the duration of the session). The cookie could be encrypted with a different challenge each time it is sent to prevent hijacking, but reestablishment after the user closes their browser becomes technically difficult. --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-11 19:13//
The data inside the cookie is a key to identify the session. If the key is easily guessable, an attacker can sequentially iterate through the guesses until a session with the desired privileges is found. If the key is both long and random, then guessing becomes very costly for the attacker.
> PHP session hijacking or forgery does not give you any higher privileges in DokuWiki. (You could see someone else's [[wiki:breadcrumbs]])
> But if the //right// session is stolen, especially if administration is done through the same interface and not through the shell, then an incredible amount of damage can occur before the attacker is caught. --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-11 19:24//
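The point about guessable versus long, random session keys can be made concrete with a small Python sketch. The in-memory dictionaries, the function names, and the 256-bit key length are assumptions for illustration only; they do not describe how PHP or DokuWiki generate session identifiers.

```python
import secrets
from typing import Dict, Optional


def new_session_key() -> str:
    """A long, random key: 32 bytes (256 bits) from the OS random source."""
    return secrets.token_hex(32)


def guess_sequential_keys(store: Dict[str, dict], attempts: int) -> Optional[str]:
    """The attacker's loop: try '0', '1', '2', ... until a session is hit."""
    for number in range(attempts):
        candidate = str(number)
        if candidate in store:
            return candidate
    return None


# A store keyed by a simple counter: trivially guessable.
weak_store = {str(n): {"user": "user%d" % n} for n in range(100, 103)}
weak_store["101"]["user"] = "admin"

# A store keyed by random keys: the same loop finds nothing.
strong_store = {new_session_key(): {"user": "admin"}}

hit_weak = guess_sequential_keys(weak_store, 10000)
hit_strong = guess_sequential_keys(strong_store, 10000)
```

With a 256-bit random key the attacker has no better strategy than blind guessing, which is exactly the cost increase the paragraph above asks for.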
Combining source IP address verification with a session can greatly aid in limiting the attack vectors that an attacker could exploit. However, many users have rapid IP address changes (DSL or wireless) and would have to initiate a new login on each change.
> Combining cookies with an IP isn't the best way either - This locks out users of common proxies (AOL). DokuWiki combines cookies with a pseudo UID generated from Browser Headers - not perfect but better than nothing. Cookies themselves are encrypted with Blowfish.
>> The script may check first 24 bits of IPv4 address only (xx.yy.zz.*). This should alleviate problems with proxies and dynamic IPs while retaining some reliance on IP security. This approach is used in phpBB.
> I'm not familiar with AOL proxies, but I would welcome a discussion on how to work with them. I would need to further research your implementation to comment on it. Could you describe how you set the passphrase for the cookie? The code looks quite interesting, and it is obvious that you've done a lot of work on trying to secure the cookie data. --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-11 19:24//
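The "first 24 bits" comparison suggested in the thread above can be sketched with Python's standard ''ipaddress'' module. The function name and the default prefix length are assumptions for the example; this is not taken from phpBB or DokuWiki code.

```python
import ipaddress


def same_ipv4_prefix(login_ip: str, request_ip: str, prefix_len: int = 24) -> bool:
    """True if request_ip falls in the same /prefix_len network as login_ip."""
    # strict=False lets us pass a host address and have the host bits masked off.
    network = ipaddress.ip_network("%s/%d" % (login_ip, prefix_len), strict=False)
    return ipaddress.ip_address(request_ip) in network


# Same xx.yy.zz.* block: the session survives a dynamic IP change.
still_valid = same_ipv4_prefix("192.0.2.17", "192.0.2.200")
# Different block: the session should be challenged again.
rejected = not same_ipv4_prefix("192.0.2.17", "192.0.3.17")
```

Loosening ''prefix_len'' trades security for convenience: a shorter prefix tolerates more address churn but also accepts more strangers.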
=== IP Address Verification ===
> Many of the following topics may not relate to the needs of DokuWiki, but are included for completeness, such that other systems can be designed around this work.
> Could someone check my "facts" on the following three paragraphs, please?
IP address-based verification suffers from rapidly changing IP addresses, clients multiplexed onto one IP address by either proxies or network address translation (NAT), and IP address spoofing. As mentioned above, IP address verification has issues with rapidly changing IP addresses and large aggregate web proxies. IP address ranges are purchased in bulk by internet service providers (ISPs). When a user signs into the ISP, the ISP will "lease" an IP address to the user for a limited time. Depending on the needs of the ISP, the lease may last from several minutes up to, usually, one day. Cable companies often continuously renew the same IP address each day. DSL companies often change the IP after a few short hours. Modem connections keep their assigned IP address throughout the connection, and are often given an IP address from a private address space (e.g. 192.168.0.0/16).
> Actually, I have a scenario where I would like to allow world-wide viewing, but editing only from my local subnet. Is there an easy solution for this? --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-21 01:30//
== Web Proxies ==
Web proxies receive requests for webpages from users and then download the webpage from the desired webserver on behalf of the user. The proxy then delivers the webpage back to the user. Proxies give two security benefits that are worth mentioning: first, they hide the original IP of the viewer of the website, and second, they allow firewalling of all inbound and outbound connections except those to the proxy itself. Additional "features" are logging, caching, and content-filtering. Proxies do suffer from performance issues in most instances and offer more frustration to the administrator and end users.
== Network Address Translation (NAT) ==
NAT is a useful tool for connecting one network to another through only one addressable IP address. It works by intercepting all outgoing IP packets, creating a mapping from the internal IP and port to an external IP and port if one doesn't exist, and then rewriting the packet with the external mapping. When a packet comes into the NAT from the external network, the NAT checks for a reverse mapping, and if one exists, it will rewrite the packet with the internal parameters and deliver it to the end client. This has much the same benefits as a web proxy, but tends to be easier for end clients to use (almost transparent for "normal" usage).
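The mapping logic described above can be modelled in a few lines of Python. This is a toy model under obvious assumptions: the class name, the starting port number, and the idea of representing a packet by just its (IP, port) pair are all invented for the example, and real NAT devices also track protocols, timeouts, and destinations.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class SimpleNat:
    def __init__(self, external_ip: str, first_port: int = 40000) -> None:
        self.external_ip = external_ip
        self._next_port = first_port
        self._outbound: Dict[Endpoint, Endpoint] = {}  # internal -> external
        self._inbound: Dict[Endpoint, Endpoint] = {}   # external -> internal

    def translate_outgoing(self, internal: Endpoint) -> Endpoint:
        """Create the mapping if it does not exist, then rewrite the source."""
        if internal not in self._outbound:
            external = (self.external_ip, self._next_port)
            self._next_port += 1
            self._outbound[internal] = external
            self._inbound[external] = internal
        return self._outbound[internal]

    def translate_incoming(self, external: Endpoint) -> Optional[Endpoint]:
        """Look up the reverse mapping; None means the packet is dropped."""
        return self._inbound.get(external)


nat = SimpleNat("203.0.113.5")
client_a = nat.translate_outgoing(("192.168.0.10", 51000))
client_b = nat.translate_outgoing(("192.168.0.11", 51000))
```

From the webserver's point of view both clients arrive from ''203.0.113.5'', which is why IP address verification cannot tell them apart.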
== DNS Poisoning ==
A domain name server (DNS) answers queries for mappings between host names (e.g. wiki.splitbrain.org) and IP addresses. When a user wishes to visit a website and types in the URL (e.g. %%http://wiki.splitbrain.org/doku.php%%), the web browser first tries to determine the IP address of wiki.splitbrain.org. It does this by checking its own memory, then, failing that, will contact the user's primary DNS server. From here, it becomes complex because there are two normal behaviors that can exist if the primary DNS server does not have a mapping for the host name: either it can work behind the scenes and ask other DNS servers, or it can tell the user's computer the IP address of another DNS server that should have a better clue on how to find the mapping. I will have to leave the rest of the details to the reader to investigate (try howstuffworks.com).
DNS poisoning is a much lesser problem than in previous years due to better security on the DNS servers (sometimes we do learn from our mistakes). Poisoning refers to changing the mappings in the DNS server's database so that it gives an incorrect answer to users' queries. For example, if an attacker can convince a DNS server that it is PsiberBank.com (fictional), then customers of that bank might be redirected to the attacker's webserver. The attacker can then mimic or proxy the real PsiberBank.com while viewing all the information that the users send and receive.
== ARP Poisoning ==
There are many types of networking technologies that support different networking topologies. A **hub** is a device that connects all networked devices together such that they can overhear each other (putting everyone in the same room within earshot). A **switch** acts more like a mail carrier in that it only delivers the packet to the recipient(s) of the local network. A **router** is typically a switch, but connects two networks: an internal (local) network and an external network.
When two IP-based devices want to communicate with each other on the same local network, they must find the mapping between their IP addresses and their Media Access Control (MAC) addresses. These mappings are formed by hosts broadcasting an ARP request with a specified IP, and either the router or the end host with the specified IP will reply with an ARP response that contains the end host's MAC address. (A MAC address is also known as an ethernet address because ethernet is the most common type of networking device. It is also sometimes called the hardware address.)
ARP poisoning is done by an attacker continuously broadcasting their MAC address as the mapping for the IP address(es) they want to spoof. Commonly the attacker will spoof the router's address in an attempt to overhear each packet before resending it to the actual router (and therefore keep the user ignorant of the eavesdropping). This is a highly effective attack if an attacker can gain access to a device on a network because most networks trust the devices on their network. The only way to avoid this attack is to hard-code the IP-to-MAC mapping of every device on the network into every device on the network and then ignore or drop all ARP replies (a highly inflexible solution).
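The difference between a trusting ARP cache and a hard-coded one can be shown with a toy Python model. The class and the addresses below are invented for illustration; real operating systems manage their ARP tables in the kernel and this sketch does not talk to the network at all.

```python
from typing import Dict, Optional


class ArpCache:
    def __init__(self, static_entries: Optional[Dict[str, str]] = None) -> None:
        self._static = dict(static_entries or {})  # hard-coded IP -> MAC
        self._dynamic: Dict[str, str] = {}         # learned from ARP replies

    def handle_reply(self, ip: str, mac: str) -> bool:
        """Process an ARP reply; return True if the cache was changed."""
        if ip in self._static:
            return False          # hard-coded entries ignore all replies
        self._dynamic[ip] = mac   # a trusting cache believes the latest reply
        return True

    def lookup(self, ip: str) -> Optional[str]:
        return self._static.get(ip, self._dynamic.get(ip))


ROUTER_IP = "192.168.0.1"
ROUTER_MAC = "00:11:22:33:44:55"
ATTACKER_MAC = "de:ad:be:ef:00:01"

trusting = ArpCache()
trusting.handle_reply(ROUTER_IP, ROUTER_MAC)
trusting.handle_reply(ROUTER_IP, ATTACKER_MAC)   # the poisoned reply wins

hardened = ArpCache({ROUTER_IP: ROUTER_MAC})
hardened.handle_reply(ROUTER_IP, ATTACKER_MAC)   # silently ignored
```

The inflexibility mentioned above shows up immediately: every new device or replaced network card means editing the static table on every host.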
=== Server Side SSL/TLS ===
Without going into details of how SSL works, it is sufficient to say that it is a good technology for allowing private (i.e. encrypted) communication through the internet. However, the best privacy is useless if you end up telling your secrets to the wrong person. The solution is to have the server you are contacting provide some form of proof that they are who they say they are. This is done through SSL certificates. The trick here is that there has to be someone whom you trust (a certificate authority) who can verify the server for you. A company with a webserver (e.g. WebCo) will contact a certificate authority (CA) (e.g. CertsRus) to obtain a certificate. CertsRus will then do a lot of paperwork and verification to make sure that WebCo is who they say they are, and that the certificate they want to use accurately reflects their webserver. At that point, CertsRus will digitally sign the certificate so that everyone who trusts CertsRus can verify WebCo's certificate. This process costs close to one thousand U.S. dollars because of the verification effort required and the maintaining of various services, and thus private individuals can rarely afford to obtain a certificate.
=== Client Side Digital Certificates ===
> This section needs some better wording and some expanding for completeness. I'm running out of steam in my writing. --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-21 02:22//
Client side certificates can be done cheaply if the server creates its own certificate authority (since it can trust its own CA) and then generates certificates for each user of the system. The user's certificate would have to reside on the user's computer (or memory stick, whatever) and then be presented to the webserver on each connection to establish the SSL connection. This is incredibly secure, but is equally difficult to administer. The server must keep track of each user to certificate mapping including which certificates have been compromised or lost. The server could issue each legitimate user the same user certificate and use username/passwords to authenticate each user, but if one certificate gets compromised, they ALL get compromised. This solution is usually only feasible in government installations and small hacker groups.
==== Confidentiality Threats ====
When using a Wiki for project coordination, it is sometimes desirable to hide your project ideas from outsiders until you get the patent, publish your research, or get that A+ in class. In these cases (among many others), it is important to have a flexible but solid framework that allows different users different levels of access to the system. It is also important to have a good way of testing the implementation of these access levels, which is a feature lacking in most security systems (including DokuWiki).
It is also desirable that if the webserver is compromised such that an attacker can read the data on the hard drive, they cannot read private information. Usually the costs of implementing this greatly outweigh the //risk// of compromise and the //damage// associated with the loss of confidentiality. This is a good research area for those who would like to find a "cheap" way of securing data in spite of server compromise. (Just encrypting the data on the hard drive and having the server decrypt it before sending it to the user is not sufficient.)
> DokuWiki supports hiding information through [[wiki:ACLs]]. So yes there may be private pages in a wiki. Possible attacks that could reveal such information could be:
* Attack the authentication eg. get higher permissions
* Circumvent the ACL logic eg. by providing bogus data and hope for a bug in the code
> Excellent points. I've been working with the ACLs on my wiki and am very happy with it. I would like to see the final kinks worked out and a better testing environment. I was thinking of simply listing each page of the system on the left-side of a matrix, with each user class on the top, and have each cell "spell" out the rights of each user class on each page. Once again, I would suggest a view only interface. --- //[[dokuwiki@chrislee.dhs.org|Chris Lee]] 2005-03-21 02:36//
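The view-only rights matrix suggested above could be prototyped along these lines in Python. Everything here is hypothetical: the page names, the user classes, the right names, and the flat ''(page, user class)'' dictionary are made up for the sketch and are not DokuWiki's actual ACL format.

```python
from typing import Dict, List, Sequence, Tuple

Acl = Dict[Tuple[str, str], str]  # (page, user class) -> right


def rights_matrix(pages: Sequence[str], user_classes: Sequence[str],
                  acl: Acl, default: str = "none") -> List[List[str]]:
    """One row per page; each cell spells out that user class's right."""
    return [[page] + [acl.get((page, uc), default) for uc in user_classes]
            for page in pages]


def format_matrix(user_classes: Sequence[str], rows: List[List[str]]) -> str:
    """Render the matrix as aligned plain text for a read-only admin view."""
    table = [["page"] + list(user_classes)] + rows
    widths = [max(len(row[i]) for row in table) for i in range(len(table[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )


# Hypothetical example data.
PAGES = ["start", "project:secret"]
CLASSES = ["anonymous", "user", "admin"]
ACL = {
    ("start", "anonymous"): "read",
    ("start", "user"): "edit",
    ("start", "admin"): "edit",
    ("project:secret", "admin"): "edit",
}

rows = rights_matrix(PAGES, CLASSES, ACL)
report = format_matrix(CLASSES, rows)
```

Because missing entries fall back to ''none'', an accidentally exposed page stands out as a cell that says more than expected.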
===== Conclusion =====
===== Proposed Architecture =====
Let us place the architecture on a separate page so that it can stand on its own.
* [[wiki:discussion:security:WSA|Web Security Architecture]]
===== Open Discussion =====
==== Captcha ====
Could DokuWiki have a setting 0/1 to enable captchas to make sure it isn't a script updating pages?
Or maybe at least at registr... Wait a minute, there is a [[plugin:captcha|plugin]] for that! --- //[[t@nospamtulpe-mediadot.de|Th.S]] 2007-10-07//
==== Installation ====
Would it be possible to separate the DokuWiki installation into a webroot and other directories? The 'default' installation is currently not secure, since some directories are exposed.
Although a .htaccess file is used, these will not work when IIS is used as webserver.\\ --- //charlieMOGUL 20-12-2006//
> see [[http://wiki.splitbrain.org/wiki:security]] --- //[[chris@jalakai.co.uk|Christopher Smith]] 2006-12-20 18:57//
Sorry, I wasn't making myself clear. I mean that the **default** installation should be secure. Now someone has to secure the installation, which probably not many administrators/users do. It would be better if the **default** installation was secure, so that the product as a whole will be more secure out-of-the-box \\
--- //charlieMOGUL 20-12-2006//
> Debian, for example, already does some desired directory layout changes. However the individual SysAdmin/Webmaster would still need to generate the OpenSSL certificates that make the https/443 secure login portion possible. --- //[[http://insecurity.org/|-Sx-]] Oct 8, 2007 2:15PM //
==== Force new password ====
To enhance password security it would be very good to force users to create new passwords in certain situations.
- New users who were created by the admin should have to provide a password of their own when they login for the first time.
- Users who got their password emailed on their own request should have to create a new password on their next login
- All users may be forced to create a new password
* after a certain time (like every year, every month, or something)
* after a certain amount of logins?
* on request of the admin (after he recognized attempts to hack the wiki for example)
- The admin may require a specific user to provide a new password because, being the BOFH, he recognized that they are using a weak or easily guessable password.
I would suggest a simple boolean value //force_new// associated with each user. This could be available via checkbox in the user-settings for the admin. Additionally it could be set when the password has been sent out via mail (see points 1. and 2.). Further options (see point 3.) could then be implemented via third party plugins, if needed at all.\\ --- //[[t@nospamtulpe-mediadot.de|Th.S]] 2007-10-07//
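The //force_new// proposal above could work roughly as in this Python sketch. Apart from the ''force_new'' flag itself, every name here (the ''User'' record, the helper functions, the step names) is hypothetical and only meant to show where the flag would be set and cleared; DokuWiki's own user handling is in PHP and looks different.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    password_hash: str
    force_new: bool = False  # the proposed per-user flag


def create_user_by_admin(name: str, initial_hash: str) -> User:
    """Point 1: accounts created by the admin must pick their own password."""
    return User(name=name, password_hash=initial_hash, force_new=True)


def mail_new_password(user: User, temporary_hash: str) -> None:
    """Point 2: a password sent out by mail is only good for one login."""
    user.password_hash = temporary_hash
    user.force_new = True


def next_step_after_login(user: User) -> str:
    """Decide where a successfully authenticated user is sent."""
    return "change_password" if user.force_new else "continue"


def change_password(user: User, new_hash: str) -> None:
    """Store the user's own choice and clear the flag."""
    user.password_hash = new_hash
    user.force_new = False
```

The admin checkbox from the proposal would simply set ''force_new'' to true; time-based or login-count rules (point 3) could flip the same flag from a plugin.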