>OK, the standard says:
>
>   Octets must be encoded if they have no corresponding graphic
>   character within the US-ASCII coded character set, if the use of the
>   corresponding character is unsafe, or if the corresponding character
>   is reserved for some other interpretation within the particular URL
>   scheme.
>
>   No corresponding graphic US-ASCII:
>
>   URLs are written only with the graphic printable characters of the
>   US-ASCII coded character set. The octets 80-FF hexadecimal are not
>   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
>   control characters; these must be encoded.
>
>I can't read this as saying that "å" (which is not in US-ASCII) is OK in
>a URL.

Then read this, which comes from the *standards* HTML/3.0 and HTTP/1.0:

HTML/3.0:

   Character sets

   The charset parameter (as defined in section 7.1.1 of RFC 1521) may
   be used with the text/html content type to specify the encoding used
   to represent the HTML document as a sequence of bytes. Normally,
   text/* media types specify a default of US-ASCII for the charset
   parameter. However, for text/html, if
*  the byte stream contains data that is not in the 7-bit US-ASCII
*  set, the HTML interpreting agent should assume a default
*  charset of ISO-8859-1.

HTTP/1.0 (<http://www.w3.org/pub/WWW/Protocols/rfc1945/rfc1945>):

   3.2.1 General Syntax

   URIs in HTTP/1.0 can be represented in absolute form or relative to
   some known base URI [9], depending upon the context of their use. The
   two forms are differentiated by the fact that absolute URIs always
   begin with a scheme name followed by a colon.

       URI            = ( absoluteURI | relativeURI ) [ "#" fragment ]

       absoluteURI    = scheme ":" *( uchar | reserved )

       relativeURI    = net_path | abs_path | rel_path

       net_path       = "//" net_loc [ abs_path ]
       abs_path       = "/" rel_path
       rel_path       = [ path ] [ ";" params ] [ "?" query ]

       path           = fsegment *( "/" segment )
       fsegment       = 1*pchar
       segment        = *pchar

       params         = param *( ";" param )
       param          = *( pchar | "/" )

       scheme         = 1*( ALPHA | DIGIT | "+" | "-" | "." )
       net_loc        = *( pchar | ";" | "?" )
       query          = *( uchar | reserved )
       fragment       = *( uchar | reserved )

       pchar          = uchar | ":" | "@" | "&" | "="
       uchar          = unreserved | escape
       unreserved     = ALPHA | DIGIT | safe | extra | national

       escape         = "%" hex hex
       hex            = "A" | "B" | "C" | "D" | "E" | "F"
                      | "a" | "b" | "c" | "d" | "e" | "f" | DIGIT

       reserved       = ";" | "/" | "?" | ":" | "@" | "&" | "="
       safe           = "$" | "-" | "_" | "." | "+"
       extra          = "!" | "*" | "'" | "(" | ")" | ","
*      national       = <any OCTET excluding CTLs, SP,
*                       ALPHA, DIGIT, reserved, safe, and extra>

*  For definitive information on URL syntax and semantics, see RFC 1738
*  [4] and RFC 1808 [9]. The BNF above includes national characters not
*  allowed in valid URLs as specified by RFC 1738, since HTTP servers are
*  not restricted in the set of unreserved characters allowed to
*  represent the rel_path part of addresses, and HTTP proxies may receive
*  requests for URIs not defined by RFC 1738.

Read the marked lines (those starting with "*") in particular. They show
quite clearly that ISO 8859-1 characters such as å, ä, ö, ~ etc. are
allowed in URLs, because they belong to the "national" group and not to
"reserved". The text also states explicitly that this differs from how
RFC 1738 defines URLs.

Furthermore, section 1.2.1 of the HTML standard (see above) says the
following:

   "/.../ For example, the value of the HREF attribute of the <A>
   element must conform to the URI syntax."

The URI standard has no objection to 8859-1 either.

So a URI such as <http://www.lysator.liu.se/åttabitars/> is not a URL
according to RFC 1738, but it is an acceptable address in an A tag in an
HTML document. That Netscape cannot handle it is therefore clearly a bug.

/tg
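
To make the "national" rule in the quoted HTTP/1.0 BNF concrete, here is a minimal Python sketch that transcribes the character classes and checks which octets of a path fall into that class. The function names is_national and national_octets are made up for this illustration. The definitions of CTL (octets 0-31 and 127) and SP (octet 32) are not part of the excerpt quoted here; they are assumed from section 2.2 of RFC 1945. The sketch also assumes the path is encoded as ISO 8859-1.

```python
# Character classes transcribed from the HTTP/1.0 BNF quoted in the post.
ALPHA = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
DIGIT = set(b"0123456789")
RESERVED = set(b";/?:@&=")
SAFE = set(b"$-_.+")
EXTRA = set(b"!*'(),")

# Assumption (not in the quoted excerpt): CTL is octets 0-31 plus DEL (127)
# and SP is octet 32, as defined in RFC 1945 section 2.2.
CTL = set(range(0x20)) | {0x7F}
SP = {0x20}

EXCLUDED = CTL | SP | ALPHA | DIGIT | RESERVED | SAFE | EXTRA


def is_national(octet):
    """Return True if the octet belongs to the BNF's 'national' class."""
    return 0 <= octet <= 0xFF and octet not in EXCLUDED


def national_octets(path, encoding="iso-8859-1"):
    """Return the octets of `path` that are 'national' in the given encoding."""
    return [b for b in path.encode(encoding) if is_national(b)]


if __name__ == "__main__":
    # "å" is octet E5 in ISO 8859-1, so it is the only national octet here.
    print([hex(b) for b in national_octets("/åttabitars/")])
```

This is a literal reading of the grammar and nothing more: it says which octets the HTTP/1.0 BNF permits unescaped, not what any particular browser or server actually accepts.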