CVE-2006-2783

Mozilla Firefox and Thunderbird before 1.5.0.4 strip Unicode Byte Order Mark (BOM) characters from a UTF-8 page before the page is passed to the parser. This allows remote attackers to conduct cross-site scripting (XSS) attacks by placing a BOM sequence in the middle of a dangerous tag such as SCRIPT.
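The Python sketch below models the bug in simplified form. It is not Mozilla's code. The server-side filter and the payload are hypothetical. The sketch only shows why removing BOMs from the middle of a page lets a tag that a filter did not recognize reach the parser intact, and how stripping only a leading BOM avoids this.

```python
# Simplified model of CVE-2006-2783; function names and payload are hypothetical.
BOM = "\ufeff"  # U+FEFF, encoded in UTF-8 as the bytes EF BB BF


def naive_filter_allows(page: str) -> bool:
    """Hypothetical server-side check that rejects any '<script' tag."""
    return "<script" not in page.lower()


def vulnerable_preprocess(page: str) -> str:
    """Models the flawed behavior: remove U+FEFF everywhere, not only at the start."""
    return page.replace(BOM, "")


def fixed_preprocess(page: str) -> str:
    """Strip only a single leading BOM, where it acts as an encoding signature."""
    return page[1:] if page.startswith(BOM) else page


payload = "<scr" + BOM + "ipt>alert(document.cookie)</scr" + BOM + "ipt>"

print(naive_filter_allows(payload))                 # True: filter sees no '<script'
print("<script" in vulnerable_preprocess(payload))  # True: parser now sees a SCRIPT tag
print("<script" in fixed_preprocess(payload))       # False: mid-tag BOM is left in place
```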

Weakness

The product does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users.

Affected Software

Name	Vendor	Start Version	End Version
Firefox	Mozilla	*	1.5.0.3 (inclusive)
Thunderbird	Mozilla	*	1.5.0.3 (inclusive)
Red Hat Enterprise Linux 3	RedHat	seamonkey-0:1.0.2-0.1.0.EL3	*
Red Hat Enterprise Linux 4	RedHat	devhelp-0:0.10-0.2.el4	*
Red Hat Enterprise Linux 4	RedHat	seamonkey-0:1.0.3-0.el4.1	*
Red Hat Enterprise Linux 4	RedHat	firefox-0:1.5.0.5-0.el4.1	*
Red Hat Enterprise Linux 4	RedHat	thunderbird-0:1.5.0.5-0.el4.1	*
Red Hat Enterprise Linux AS (Advanced Server) version 2.1	RedHat		*
Red Hat Enterprise Linux ES version 2.1	RedHat		*
Red Hat Enterprise Linux WS version 2.1	RedHat		*
Red Hat Linux Advanced Workstation 2.1	RedHat		*
Firefox	Ubuntu	dapper	*
Firefox-granparadiso	Ubuntu	devel	*
Lightning-sunbird	Ubuntu	devel	*
Midbrowser	Ubuntu	devel	*
Mozilla-thunderbird	Ubuntu	dapper	*
Mozilla-thunderbird	Ubuntu	edgy	*
Mozilla-thunderbird	Ubuntu	feisty	*
Xulrunner	Ubuntu	devel	*
Xulrunner	Ubuntu	edgy	*
Xulrunner	Ubuntu	feisty	*

Extended Description

There are many variants of cross-site scripting, characterized by a variety of terms or involving different attack topologies. However, they all indicate the same fundamental weakness: improper neutralization of dangerous input between the adversary and a victim.

Potential Mitigations

Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid [REF-1482].
Examples of libraries and frameworks that make it easier to generate properly encoded output include Microsoft’s Anti-XSS library, the OWASP ESAPI Encoding module, and Apache Wicket.
Understand the context in which your data will be used and the encoding that will be expected. This is especially important when transmitting data between different components, or when generating outputs that can contain multiple encodings at the same time, such as web pages or multi-part mail messages. Study all expected communication protocols and data representations to determine the required encoding strategies.
For any data that will be output to another web page, especially any data that was received from external inputs, use the appropriate encoding on all non-alphanumeric characters.
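Parts of the same output document may require different encodings, depending on whether the output is in the HTML body, element attributes (such as src="XYZ"), URIs, JavaScript sections, Cascading Style Sheets and style properties, etc. Note that HTML Entity Encoding is only appropriate for the HTML body.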
Consult the XSS Prevention Cheat Sheet [REF-724] for more details on the types of encoding and escaping that are needed.
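The following is a minimal Python sketch of context-specific encoding for one untrusted value. `html.escape` and `urllib.parse.quote` are real standard-library functions. `encode_for_html_attribute` and `encode_for_js_string` are hypothetical helpers that apply the "encode all non-alphanumeric characters" rule. CSS contexts are omitted, and a vetted, context-aware encoder or templating engine is still preferable in production.

```python
import html
from urllib.parse import quote


def encode_for_html_attribute(text: str) -> str:
    """Hypothetical helper: encode every non-alphanumeric character as &#xHH;."""
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"&#x{ord(ch):x};" for ch in text
    )


def encode_for_js_string(text: str) -> str:
    """Hypothetical helper: emit non-alphanumeric UTF-16 code units as \\uXXXX."""
    data = text.encode("utf-16-be")
    out = []
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        ch = chr(unit)
        out.append(ch if unit < 0x80 and ch.isalnum() else f"\\u{unit:04x}")
    return "".join(out)


value = '"><script>alert(1)</script>'  # untrusted input

body = "<p>" + html.escape(value) + "</p>"                          # HTML body
attr = '<input value="' + encode_for_html_attribute(value) + '">'   # attribute
link = '<a href="/search?q=' + quote(value, safe="") + '">go</a>'   # URL component
js = "<script>var q = '" + encode_for_js_string(value) + "';</script>"  # JS string
```

Each context gets its own encoder because the characters that end a value differ between contexts. A quote ends an attribute, `</script>` ends a script block, and `&` or `#` changes a URL.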
Use and specify an output encoding that can be handled by the downstream component that is reading the output. Common encodings include ISO-8859-1, UTF-7, and UTF-8. When an encoding is not specified, a downstream component may choose a different encoding, either by assuming a default encoding or by automatically inferring which encoding is being used, which can be erroneous. When the encodings are inconsistent, the downstream component might treat some character or byte sequences as special, even if they are not special in the original encoding. Attackers might then be able to exploit this discrepancy and conduct injection attacks; they might even be able to bypass protection mechanisms that assume the original encoding is also being used by the downstream component.
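The Python sketch below illustrates such a discrepancy with the built-in `utf-7` codec. The byte string is a hypothetical response body. The bytes contain no ASCII `<` or `>`, so a byte-level filter or HTML escaping under the intended encoding finds nothing to change. A component that decodes the same bytes as UTF-7 sees a SCRIPT element.

```python
import html

# Hypothetical response body: it contains no ASCII '<' or '>' bytes at all.
raw = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

as_utf8 = raw.decode("utf-8")   # how the server intended the bytes to be read
as_utf7 = raw.decode("utf-7")   # how a client that guesses UTF-7 would read them

print(b"<" in raw or b">" in raw)       # False: a byte-level filter finds nothing
print(html.escape(as_utf8) == as_utf8)  # True: HTML escaping has nothing to change
print(as_utf7)                          # <script>alert(1)</script>
```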
The problem of inconsistent output encodings often arises in web pages. If an encoding is not specified in an HTTP header, web browsers often guess which encoding is being used. This can expose the browser to subtle XSS attacks.
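One way to remove the guesswork is to declare the encoding that the response actually uses, both in the `Content-Type` header and in the document. The following is a minimal sketch as a plain WSGI application. The `name` query parameter is hypothetical, and the app is exercised directly with a fake `start_response` instead of a real server.

```python
import html
from urllib.parse import parse_qs


def app(environ, start_response):
    """Minimal WSGI app that always declares the encoding it actually uses."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", [""])[0]  # hypothetical parameter
    body = (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
        f"<body><p>Hello, {html.escape(name)}</p></body></html>"
    ).encode("utf-8")  # the bytes really are UTF-8, matching the declaration
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# Call the app without a server by capturing what it passes to start_response.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = app({"QUERY_STRING": "name=%2BADw-script%2BAD4-"}, fake_start_response)
print(captured["headers"]["Content-Type"])  # text/html; charset=utf-8
```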
Assume all input is malicious. Use an “accept known good” input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.
When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, “boat” may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as “red” or “blue.”
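A minimal "accept known good" validator for the color example might look like the Python sketch below. The allowed set, the length limit, and the function names are hypothetical. The checks run in the order listed above: length, then syntax, then the business rule.

```python
import re

ALLOWED_COLORS = frozenset({"red", "blue"})  # hypothetical business rule
MAX_LENGTH = 16                              # hypothetical length limit


def validate_color(value: str) -> str:
    """Accept-known-good validation: return the value or raise ValueError."""
    if len(value) > MAX_LENGTH:                  # length
        raise ValueError("too long")
    if not re.fullmatch(r"[a-z]+", value):       # syntax
        raise ValueError("only lowercase letters are allowed")
    if value not in ALLOWED_COLORS:              # business rule
        raise ValueError(f"not an allowed color: {value!r}")
    return value


def is_valid_color(value: str) -> bool:
    try:
        validate_color(value)
    except ValueError:
        return False
    return True


print(is_valid_color("red"))       # True
print(is_valid_color("boat"))      # False: syntactically fine, fails the business rule
print(is_valid_color("<script>"))  # False: fails the syntax check
```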
Do not rely exclusively on looking for malicious or malformed inputs (a denylist). This approach is likely to miss at least one undesirable input, especially if the code's environment changes, which can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.
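The Python sketch below shows how easily a denylist misses inputs. The filter is hypothetical: it deletes `<script` case-insensitively in a single pass. A nested payload reassembles into a working tag after deletion, and a different tag carries script without matching the pattern at all.

```python
import re


def denylist_filter(text: str) -> str:
    """Hypothetical denylist: delete every '<script' (case-insensitive) in one pass."""
    return re.sub(r"(?i)<script", "", text)


nested = "<scr<scriptipt>alert(1)</script>"
other_tag = '<img src=x onerror="alert(1)">'

print(denylist_filter(nested))     # <script>alert(1)</script>
print(denylist_filter(other_tag))  # <img src=x onerror="alert(1)">
```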
When dynamically constructing web pages, use stringent allowlists that limit the character set based on the expected value of the parameter in the request. Validate and cleanse all input: not just the parameters that the user is supposed to specify, but all data in the request, including hidden fields, cookies, headers, the URL itself, and so forth. A common mistake that leads to continuing XSS vulnerabilities is to validate only the fields that are expected to be redisplayed by the site. The application server or the application often reflects request data that the development team did not anticipate. Also, a field that is not currently reflected may be used by a future developer. Therefore, validating ALL parts of the HTTP request is recommended.
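The following is a minimal Python sketch of request-wide allowlist validation. Every field the application reads, whether it comes from the query string, a hidden form field, a cookie, or a header, is keyed by its source and must fully match a per-field pattern. The field names and patterns are hypothetical. Rejecting fields that have no rule is one strict design choice. An application that receives many incidental headers might instead collect only the headers it actually reads before validating.

```python
import re

# Hypothetical per-field allowlist rules, keyed by (source, field name).
RULES = {
    ("query", "page"): re.compile(r"[0-9]{1,4}"),
    ("query", "sort"): re.compile(r"(?:name|date)"),
    ("form", "csrf_token"): re.compile(r"[A-Za-z0-9_-]{32}"),   # hidden field
    ("cookie", "lang"): re.compile(r"[a-z]{2}"),
    ("header", "X-Requested-With"): re.compile(r"XMLHttpRequest"),
}


def validate_request(fields: dict) -> list:
    """Return a list of problems; an empty list means every field passed."""
    problems = []
    for key, value in fields.items():
        rule = RULES.get(key)
        if rule is None:
            problems.append(f"unexpected field {key}")
        elif not rule.fullmatch(value):
            problems.append(f"invalid value for {key}")
    return problems


request = {
    ("query", "page"): "2",
    ("cookie", "lang"): 'en"><script>alert(1)</script>',
    ("header", "X-Debug"): "1",
}
print(validate_request(request))
```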
Note that proper output encoding, escaping, and quoting is the most effective solution for preventing XSS. Input validation may provide some defense-in-depth, because it limits what will appear in output. However, input validation will not always prevent XSS, especially if you are required to support free-form text fields that could contain arbitrary characters. For example, in a chat application, the heart emoticon (“<3”) would likely pass the validation step, since it is commonly used. However, it cannot be inserted directly into the web page because it contains the “<” character, which would need to be escaped or otherwise handled. In this case, stripping the “<” might reduce the risk of XSS, but it would produce incorrect behavior because the emoticon would not be recorded. This might seem to be a minor inconvenience, but it would be more important in a mathematical forum that wants to represent inequalities.
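The contrast between stripping and escaping can be seen with Python's standard `html` module. In this sketch the message text is hypothetical. Stripping `<` corrupts both the emoticon and the inequality. Escaping turns `<` into `&lt;`, which a browser renders as `<`, so no information is lost.

```python
import html

message = "I <3 this proof that x < y"

# Stripping '<' "sanitizes" the text but corrupts it.
stripped = message.replace("<", "")
print(stripped)                           # I 3 this proof that x  y

# Escaping keeps the meaning; the browser renders &lt; as '<'.
escaped = html.escape(message)
print(escaped)                            # I &lt;3 this proof that x &lt; y
print(html.unescape(escaped) == message)  # True
```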
Even if you make a mistake in your validation (such as forgetting one out of 100 input fields), appropriate encoding is still likely to protect you from injection-based attacks. As long as it is not done in isolation, input validation is still a useful technique, since it may significantly reduce your attack surface, allow you to detect some attacks, and provide other security benefits that proper encoding does not address.
Ensure that you perform input validation at well-defined interfaces within the application. This will help protect the application even if a component is reused or moved elsewhere.
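One way to make the validation interface explicit is a validated value type: the only way to obtain the value is through a constructor that checks it. In the Python sketch below, the `Username` type, its pattern, and the `greeting` function are hypothetical. Validated data is still not HTML-safe, so output encoding remains necessary wherever the value is rendered.

```python
import re
from dataclasses import dataclass

_USERNAME_RE = re.compile(r"[a-z0-9_]{3,20}")  # hypothetical rule


@dataclass(frozen=True)
class Username:
    """A value that can only be constructed if it passes validation."""
    value: str

    def __post_init__(self):
        if not _USERNAME_RE.fullmatch(self.value):
            raise ValueError(f"invalid username: {self.value!r}")


def parse_username(raw: str):
    """Boundary function: convert untrusted text to a Username, or None if invalid."""
    try:
        return Username(raw)
    except ValueError:
        return None


def greeting(user: Username) -> str:
    # Interior code accepts only the validated type, so it keeps working safely
    # even if it is later called from a different entry point.
    if not isinstance(user, Username):
        raise TypeError("greeting() requires a validated Username")
    return f"Welcome back, {user.value}"
```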

NVD	https://nvd.nist.gov/vuln/detail/CVE-2006-2783
CWE	https://cwe.mitre.org/data/definitions/79.html (CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting'))