1117 lines
43 KiB
Python
1117 lines
43 KiB
Python
# (c) 2005 Ian Bicking and contributors; written for Paste (http://pythonpaste.org)
|
|
# Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
|
|
# (c) 2005 Ian Bicking, Clark C. Evans and contributors
|
|
# This module is part of the Python Paste Project and is released under
|
|
# the MIT License: http://www.opensource.org/licenses/mit-license.php
|
|
# Some of this code was funded by: http://prometheusresearch.com
|
|
"""
|
|
HTTP Message Header Fields (see RFC 4229)
|
|
|
|
This contains general support for HTTP/1.1 message headers [1]_ in a
|
|
manner that supports WSGI ``environ`` [2]_ and ``response_headers``
|
|
[3]_. Specifically, this module defines a ``HTTPHeader`` class whose
|
|
instances correspond to field-name items. The actual field-content for
|
|
the message-header is stored in the appropriate WSGI collection (either
|
|
the ``environ`` for requests, or ``response_headers`` for responses).
|
|
|
|
Each ``HTTPHeader`` instance is a callable (defining ``__call__``)
|
|
that takes one of the following:
|
|
|
|
- an ``environ`` dictionary, returning the corresponding header
|
|
value by according to the WSGI's ``HTTP_`` prefix mechanism, e.g.,
|
|
``USER_AGENT(environ)`` returns ``environ.get('HTTP_USER_AGENT')``
|
|
|
|
- a ``response_headers`` list, giving a comma-delimited string for
|
|
each corresponding ``header_value`` tuple entries (see below).
|
|
|
|
- a sequence of string ``*args`` that are comma-delimited into
|
|
a single string value: ``CONTENT_TYPE("text/html","text/plain")``
|
|
returns ``"text/html, text/plain"``
|
|
|
|
- a set of ``**kwargs`` keyword arguments that are used to create
|
|
a header value, in a manner dependent upon the particular header in
|
|
question (to make value construction easier and error-free):
|
|
``CONTENT_DISPOSITION(max_age=CONTENT_DISPOSITION.ONEWEEK)``
|
|
returns ``"public, max-age=60480"``
|
|
|
|
Each ``HTTPHeader`` instance also provides several methods to act on
|
|
a WSGI collection, for removing and setting header values.
|
|
|
|
``delete(collection)``
|
|
|
|
This method removes all entries of the corresponding header from
|
|
the given collection (``environ`` or ``response_headers``), e.g.,
|
|
``USER_AGENT.delete(environ)`` deletes the 'HTTP_USER_AGENT' entry
|
|
from the ``environ``.
|
|
|
|
``update(collection, *args, **kwargs)``
|
|
|
|
This method does an in-place replacement of the given header entry,
|
|
for example: ``CONTENT_LENGTH(response_headers,len(body))``
|
|
|
|
The first argument is a valid ``environ`` dictionary or
|
|
``response_headers`` list; remaining arguments are passed on to
|
|
``__call__(*args, **kwargs)`` for value construction.
|
|
|
|
``apply(collection, **kwargs)``
|
|
|
|
This method is similar to update, only that it may affect other
|
|
headers. For example, according to recommendations in RFC 2616,
|
|
certain Cache-Control configurations should also set the
|
|
``Expires`` header for HTTP/1.0 clients. By default, ``apply()``
|
|
is simply ``update()`` but limited to keyword arguments.
|
|
|
|
This particular approach to managing headers within a WSGI collection
|
|
has several advantages:
|
|
|
|
1. Typos in the header name are easily detected since they become a
|
|
``NameError`` when executed. The approach of using header strings
|
|
directly can be problematic; for example, the following should
|
|
return ``None`` : ``environ.get("HTTP_ACCEPT_LANGUAGES")``
|
|
|
|
2. For specific headers with validation, using ``__call__`` will
|
|
result in an automatic header value check. For example, the
|
|
_ContentDisposition header will reject a value having ``maxage``
|
|
or ``max_age`` (the appropriate parameter is ``max-age`` ).
|
|
|
|
3. When appending/replacing headers, the field-name has the suggested
|
|
RFC capitalization (e.g. ``Content-Type`` or ``ETag``) for
|
|
user-agents that incorrectly use case-sensitive matches.
|
|
|
|
4. Some headers (such as ``Content-Type``) are 0, that is,
|
|
only one entry of this type may occur in a given set of
|
|
``response_headers``. This module knows about those cases and
|
|
enforces this cardinality constraint.
|
|
|
|
5. The exact details of WSGI header management are abstracted so
|
|
the programmer need not worry about operational differences
|
|
between ``environ`` dictionary or ``response_headers`` list.
|
|
|
|
6. Sorting of ``HTTPHeaders`` is done following the RFC suggestion
|
|
that general-headers come first, followed by request and response
|
|
headers, and finishing with entity-headers.
|
|
|
|
7. Special care is given to exceptional cases such as Set-Cookie
|
|
which violates the RFC's recommendation about combining header
|
|
content into a single entry using comma separation.
|
|
|
|
A particular difficulty with HTTP message headers is a categorization
|
|
of sorts as described in section 4.2:
|
|
|
|
Multiple message-header fields with the same field-name MAY be
|
|
present in a message if and only if the entire field-value for
|
|
that header field is defined as a comma-separated list [i.e.,
|
|
#(values)]. It MUST be possible to combine the multiple header
|
|
fields into one "field-name: field-value" pair, without changing
|
|
the semantics of the message, by appending each subsequent
|
|
field-value to the first, each separated by a comma.
|
|
|
|
This creates three fundamentally different kinds of headers:
|
|
|
|
- Those that do not have a #(values) production, and hence are
|
|
singular and may only occur once in a set of response fields;
|
|
this case is handled by the ``_SingleValueHeader`` subclass.
|
|
|
|
- Those which have the #(values) production and follow the
|
|
combining rule outlined above; our ``_MultiValueHeader`` case.
|
|
|
|
- Those which are multi-valued, but cannot be combined (such as the
|
|
``Set-Cookie`` header due to its ``Expires`` parameter); or where
|
|
combining them into a single header entry would cause common
|
|
user-agents to fail (``WWW-Authenticate``, ``Warning``) since
|
|
they fail to handle dates even when properly quoted. This case
|
|
is handled by ``_MultiEntryHeader``.
|
|
|
|
Since this project does not have time to provide rigorous support
|
|
and validation for all headers, it does a basic construction of
|
|
headers listed in RFC 2616 (plus a few others) so that they can
|
|
be obtained by simply doing ``from paste.httpheaders import *``;
|
|
the name of the header instance is the "common name" less any
|
|
dashes to give CamelCase style names.
|
|
|
|
.. [1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
|
|
.. [2] http://www.python.org/peps/pep-0333.html#environ-variables
|
|
.. [3] http://www.python.org/peps/pep-0333.html#the-start-response-callable
|
|
|
|
"""
|
|
import mimetypes
|
|
import six
|
|
from time import time as now
|
|
try:
|
|
# Python 3
|
|
from email.utils import formatdate, parsedate_tz, mktime_tz
|
|
from urllib.request import AbstractDigestAuthHandler, parse_keqv_list, parse_http_list
|
|
except ImportError:
|
|
# Python 2
|
|
from rfc822 import formatdate, parsedate_tz, mktime_tz
|
|
from urllib2 import AbstractDigestAuthHandler, parse_keqv_list, parse_http_list
|
|
|
|
from .httpexceptions import HTTPBadRequest
|
|
|
|
__all__ = ['get_header', 'list_headers', 'normalize_headers',
|
|
'HTTPHeader', 'EnvironVariable' ]
|
|
|
|
class EnvironVariable(str):
|
|
"""
|
|
a CGI ``environ`` variable as described by WSGI
|
|
|
|
This is a helper object so that standard WSGI ``environ`` variables
|
|
can be extracted w/o syntax error possibility.
|
|
"""
|
|
def __call__(self, environ):
|
|
return environ.get(self,'')
|
|
def __repr__(self):
|
|
return '<EnvironVariable %s>' % self
|
|
def update(self, environ, value):
|
|
environ[self] = value
|
|
REMOTE_USER = EnvironVariable("REMOTE_USER")
|
|
REMOTE_SESSION = EnvironVariable("REMOTE_SESSION")
|
|
AUTH_TYPE = EnvironVariable("AUTH_TYPE")
|
|
REQUEST_METHOD = EnvironVariable("REQUEST_METHOD")
|
|
SCRIPT_NAME = EnvironVariable("SCRIPT_NAME")
|
|
PATH_INFO = EnvironVariable("PATH_INFO")
|
|
|
|
for _name, _obj in six.iteritems(dict(globals())):
|
|
if isinstance(_obj, EnvironVariable):
|
|
__all__.append(_name)
|
|
|
|
_headers = {}
|
|
|
|
class HTTPHeader(object):
|
|
"""
|
|
an HTTP header
|
|
|
|
HTTPHeader instances represent a particular ``field-name`` of an
|
|
HTTP message header. They do not hold a field-value, but instead
|
|
provide operations that work on is corresponding values. Storage
|
|
of the actual field values is done with WSGI ``environ`` or
|
|
``response_headers`` as appropriate. Typically, a sub-classes that
|
|
represent a specific HTTP header, such as _ContentDisposition, are
|
|
0. Once constructed the HTTPHeader instances themselves
|
|
are immutable and stateless.
|
|
|
|
For purposes of documentation a "container" refers to either a
|
|
WSGI ``environ`` dictionary, or a ``response_headers`` list.
|
|
|
|
Member variables (and correspondingly constructor arguments).
|
|
|
|
``name``
|
|
|
|
the ``field-name`` of the header, in "common form"
|
|
as presented in RFC 2616; e.g. 'Content-Type'
|
|
|
|
``category``
|
|
|
|
one of 'general', 'request', 'response', or 'entity'
|
|
|
|
``version``
|
|
|
|
version of HTTP (informational) with which the header should
|
|
be recognized
|
|
|
|
``sort_order``
|
|
|
|
sorting order to be applied before sorting on
|
|
field-name when ordering headers in a response
|
|
|
|
Special Methods:
|
|
|
|
``__call__``
|
|
|
|
The primary method of the HTTPHeader instance is to make
|
|
it a callable, it takes either a collection, a string value,
|
|
or keyword arguments and attempts to find/construct a valid
|
|
field-value
|
|
|
|
``__lt__``
|
|
|
|
This method is used so that HTTPHeader objects can be
|
|
sorted in a manner suggested by RFC 2616.
|
|
|
|
``__str__``
|
|
|
|
The string-value for instances of this class is
|
|
the ``field-name``.
|
|
|
|
Primary Methods:
|
|
|
|
``delete()``
|
|
|
|
remove the all occurrences (if any) of the given
|
|
header in the collection provided
|
|
|
|
``update()``
|
|
|
|
replaces (if they exist) all field-value items
|
|
in the given collection with the value provided
|
|
|
|
``tuples()``
|
|
|
|
returns a set of (field-name, field-value) tuples
|
|
5 for extending ``response_headers``
|
|
|
|
Custom Methods (these may not be implemented):
|
|
|
|
``apply()``
|
|
|
|
similar to ``update``, but with two differences; first,
|
|
only keyword arguments can be used, and second, specific
|
|
sub-classes may introduce side-effects
|
|
|
|
``parse()``
|
|
|
|
converts a string value of the header into a more usable
|
|
form, such as time in seconds for a date header, etc.
|
|
|
|
The collected versions of initialized header instances are immediately
|
|
registered and accessible through the ``get_header`` function. Do not
|
|
inherit from this directly, use one of ``_SingleValueHeader``,
|
|
``_MultiValueHeader``, or ``_MultiEntryHeader`` as appropriate.
|
|
"""
|
|
|
|
#
|
|
# Things which can be customized
|
|
#
|
|
version = '1.1'
|
|
category = 'general'
|
|
reference = ''
|
|
extensions = {}
|
|
|
|
def compose(self, **kwargs):
|
|
"""
|
|
build header value from keyword arguments
|
|
|
|
This method is used to build the corresponding header value when
|
|
keyword arguments (or no arguments) were provided. The result
|
|
should be a sequence of values. For example, the ``Expires``
|
|
header takes a keyword argument ``time`` (e.g. time.time()) from
|
|
which it returns a the corresponding date.
|
|
"""
|
|
raise NotImplementedError()
|
|
|
|
def parse(self, *args, **kwargs):
|
|
"""
|
|
convert raw header value into more usable form
|
|
|
|
This method invokes ``values()`` with the arguments provided,
|
|
parses the header results, and then returns a header-specific
|
|
data structure corresponding to the header. For example, the
|
|
``Expires`` header returns seconds (as returned by time.time())
|
|
"""
|
|
raise NotImplementedError()
|
|
|
|
def apply(self, collection, **kwargs):
|
|
"""
|
|
update the collection /w header value (may have side effects)
|
|
|
|
This method is similar to ``update`` only that usage may result
|
|
in other headers being changed as recommended by the corresponding
|
|
specification. The return value is defined by the particular
|
|
sub-class. For example, the ``_CacheControl.apply()`` sets the
|
|
``Expires`` header in addition to its normal behavior.
|
|
"""
|
|
self.update(collection, **kwargs)
|
|
|
|
#
|
|
# Things which are standardized (mostly)
|
|
#
|
|
def __new__(cls, name, category=None, reference=None, version=None):
|
|
"""
|
|
construct a new ``HTTPHeader`` instance
|
|
|
|
We use the ``__new__`` operator to ensure that only one
|
|
``HTTPHeader`` instance exists for each field-name, and to
|
|
register the header so that it can be found/enumerated.
|
|
"""
|
|
self = get_header(name, raiseError=False)
|
|
if self:
|
|
# Allow the registration to happen again, but assert
|
|
# that everything is identical.
|
|
assert self.name == name, \
|
|
"duplicate registration with different capitalization"
|
|
assert self.category == category, \
|
|
"duplicate registration with different category"
|
|
assert cls == self.__class__, \
|
|
"duplicate registration with different class"
|
|
return self
|
|
|
|
self = object.__new__(cls)
|
|
self.name = name
|
|
assert isinstance(self.name, str)
|
|
self.category = category or self.category
|
|
self.version = version or self.version
|
|
self.reference = reference or self.reference
|
|
_headers[self.name.lower()] = self
|
|
self.sort_order = {'general': 1, 'request': 2,
|
|
'response': 3, 'entity': 4 }[self.category]
|
|
self._environ_name = getattr(self, '_environ_name',
|
|
'HTTP_'+ self.name.upper().replace("-","_"))
|
|
self._headers_name = getattr(self, '_headers_name',
|
|
self.name.lower())
|
|
assert self.version in ('1.1', '1.0', '0.9')
|
|
return self
|
|
|
|
def __str__(self):
|
|
return self.name
|
|
|
|
def __lt__(self, other):
|
|
"""
|
|
sort header instances as specified by RFC 2616
|
|
|
|
Re-define sorting so that general headers are first, followed
|
|
by request/response headers, and then entity headers. The
|
|
list.sort() methods use the less-than operator for this purpose.
|
|
"""
|
|
if isinstance(other, HTTPHeader):
|
|
if self.sort_order != other.sort_order:
|
|
return self.sort_order < other.sort_order
|
|
return self.name < other.name
|
|
return False
|
|
|
|
def __repr__(self):
|
|
ref = self.reference and (' (%s)' % self.reference) or ''
|
|
return '<%s %s%s>' % (self.__class__.__name__, self.name, ref)
|
|
|
|
def values(self, *args, **kwargs):
|
|
"""
|
|
find/construct field-value(s) for the given header
|
|
|
|
Resolution is done according to the following arguments:
|
|
|
|
- If only keyword arguments are given, then this is equivalent
|
|
to ``compose(**kwargs)``.
|
|
|
|
- If the first (and only) argument is a dict, it is assumed
|
|
to be a WSGI ``environ`` and the result of the corresponding
|
|
``HTTP_`` entry is returned.
|
|
|
|
- If the first (and only) argument is a list, it is assumed
|
|
to be a WSGI ``response_headers`` and the field-value(s)
|
|
for this header are collected and returned.
|
|
|
|
- In all other cases, the arguments are collected, checked that
|
|
they are string values, possibly verified by the header's
|
|
logic, and returned.
|
|
|
|
At this time it is an error to provide keyword arguments if args
|
|
is present (this might change). It is an error to provide both
|
|
a WSGI object and also string arguments. If no arguments are
|
|
provided, then ``compose()`` is called to provide a default
|
|
value for the header; if there is not default it is an error.
|
|
"""
|
|
if not args:
|
|
return self.compose(**kwargs)
|
|
if list == type(args[0]):
|
|
assert 1 == len(args)
|
|
result = []
|
|
name = self.name.lower()
|
|
for value in [value for header, value in args[0]
|
|
if header.lower() == name]:
|
|
result.append(value)
|
|
return result
|
|
if dict == type(args[0]):
|
|
assert 1 == len(args) and 'wsgi.version' in args[0]
|
|
value = args[0].get(self._environ_name)
|
|
if not value:
|
|
return ()
|
|
return (value,)
|
|
for item in args:
|
|
assert not type(item) in (dict, list)
|
|
return args
|
|
|
|
def __call__(self, *args, **kwargs):
|
|
"""
|
|
converts ``values()`` into a string value
|
|
|
|
This method converts the results of ``values()`` into a string
|
|
value for common usage. By default, it is asserted that only
|
|
one value exists; if you need to access all values then either
|
|
call ``values()`` directly, or inherit ``_MultiValueHeader``
|
|
which overrides this method to return a comma separated list of
|
|
values as described by section 4.2 of RFC 2616.
|
|
"""
|
|
values = self.values(*args, **kwargs)
|
|
assert isinstance(values, (tuple, list))
|
|
if not values:
|
|
return ''
|
|
assert len(values) == 1, "more than one value: %s" % repr(values)
|
|
return str(values[0]).strip()
|
|
|
|
def delete(self, collection):
|
|
"""
|
|
removes all occurances of the header from the collection provided
|
|
"""
|
|
if type(collection) == dict:
|
|
if self._environ_name in collection:
|
|
del collection[self._environ_name]
|
|
return self
|
|
assert list == type(collection)
|
|
i = 0
|
|
while i < len(collection):
|
|
if collection[i][0].lower() == self._headers_name:
|
|
del collection[i]
|
|
continue
|
|
i += 1
|
|
|
|
def update(self, collection, *args, **kwargs):
|
|
"""
|
|
updates the collection with the provided header value
|
|
|
|
This method replaces (in-place when possible) all occurrences of
|
|
the given header with the provided value. If no value is
|
|
provided, this is the same as ``remove`` (note that this case
|
|
can only occur if the target is a collection w/o a corresponding
|
|
header value). The return value is the new header value (which
|
|
could be a list for ``_MultiEntryHeader`` instances).
|
|
"""
|
|
value = self.__call__(*args, **kwargs)
|
|
if not value:
|
|
self.delete(collection)
|
|
return
|
|
if type(collection) == dict:
|
|
collection[self._environ_name] = value
|
|
return
|
|
assert list == type(collection)
|
|
i = 0
|
|
found = False
|
|
while i < len(collection):
|
|
if collection[i][0].lower() == self._headers_name:
|
|
if found:
|
|
del collection[i]
|
|
continue
|
|
collection[i] = (self.name, value)
|
|
found = True
|
|
i += 1
|
|
if not found:
|
|
collection.append((self.name, value))
|
|
|
|
def tuples(self, *args, **kwargs):
|
|
value = self.__call__(*args, **kwargs)
|
|
if not value:
|
|
return ()
|
|
return [(self.name, value)]
|
|
|
|
class _SingleValueHeader(HTTPHeader):
|
|
"""
|
|
a ``HTTPHeader`` with exactly a single value
|
|
|
|
This is the default behavior of ``HTTPHeader`` where returning a
|
|
the string-value of headers via ``__call__`` assumes that only
|
|
a single value exists.
|
|
"""
|
|
pass
|
|
|
|
class _MultiValueHeader(HTTPHeader):
|
|
"""
|
|
a ``HTTPHeader`` with one or more values
|
|
|
|
The field-value for these header instances is is allowed to be more
|
|
than one value; whereby the ``__call__`` method returns a comma
|
|
separated list as described by section 4.2 of RFC 2616.
|
|
"""
|
|
|
|
def __call__(self, *args, **kwargs):
|
|
results = self.values(*args, **kwargs)
|
|
if not results:
|
|
return ''
|
|
return ", ".join([str(v).strip() for v in results])
|
|
|
|
def parse(self, *args, **kwargs):
|
|
value = self.__call__(*args, **kwargs)
|
|
values = value.split(',')
|
|
return [
|
|
v.strip() for v in values
|
|
if v.strip()]
|
|
|
|
class _MultiEntryHeader(HTTPHeader):
|
|
"""
|
|
a multi-value ``HTTPHeader`` where items cannot be combined with a comma
|
|
|
|
This header is multi-valued, but the values should not be combined
|
|
with a comma since the header is not in compliance with RFC 2616
|
|
(Set-Cookie due to Expires parameter) or which common user-agents do
|
|
not behave well when the header values are combined.
|
|
"""
|
|
|
|
def update(self, collection, *args, **kwargs):
|
|
assert list == type(collection), "``environ`` may not be updated"
|
|
self.delete(collection)
|
|
collection.extend(self.tuples(*args, **kwargs))
|
|
|
|
def tuples(self, *args, **kwargs):
|
|
values = self.values(*args, **kwargs)
|
|
if not values:
|
|
return ()
|
|
return [(self.name, value.strip()) for value in values]
|
|
|
|
def get_header(name, raiseError=True):
|
|
"""
|
|
find the given ``HTTPHeader`` instance
|
|
|
|
This function finds the corresponding ``HTTPHeader`` for the
|
|
``name`` provided. So that python-style names can be used,
|
|
underscores are converted to dashes before the lookup.
|
|
"""
|
|
retval = _headers.get(str(name).strip().lower().replace("_","-"))
|
|
if not retval and raiseError:
|
|
raise AssertionError("'%s' is an unknown header" % name)
|
|
return retval
|
|
|
|
def list_headers(general=None, request=None, response=None, entity=None):
|
|
" list all headers for a given category "
|
|
if not (general or request or response or entity):
|
|
general = request = response = entity = True
|
|
search = []
|
|
for (bool, strval) in ((general, 'general'), (request, 'request'),
|
|
(response, 'response'), (entity, 'entity')):
|
|
if bool:
|
|
search.append(strval)
|
|
return [head for head in _headers.values() if head.category in search]
|
|
|
|
def normalize_headers(response_headers, strict=True):
|
|
"""
|
|
sort headers as suggested by RFC 2616
|
|
|
|
This alters the underlying response_headers to use the common
|
|
name for each header; as well as sorting them with general
|
|
headers first, followed by request/response headers, then
|
|
entity headers, and unknown headers last.
|
|
"""
|
|
category = {}
|
|
for idx in range(len(response_headers)):
|
|
(key, val) = response_headers[idx]
|
|
head = get_header(key, strict)
|
|
if not head:
|
|
newhead = '-'.join([x.capitalize() for x in
|
|
key.replace("_","-").split("-")])
|
|
response_headers[idx] = (newhead, val)
|
|
category[newhead] = 4
|
|
continue
|
|
response_headers[idx] = (str(head), val)
|
|
category[str(head)] = head.sort_order
|
|
def key_func(item):
|
|
value = item[0]
|
|
return (category[value], value)
|
|
response_headers.sort(key=key_func)
|
|
|
|
class _DateHeader(_SingleValueHeader):
|
|
"""
|
|
handle date-based headers
|
|
|
|
This extends the ``_SingleValueHeader`` object with specific
|
|
treatment of time values:
|
|
|
|
- It overrides ``compose`` to provide a sole keyword argument
|
|
``time`` which is an offset in seconds from the current time.
|
|
|
|
- A ``time`` method is provided which parses the given value
|
|
and returns the current time value.
|
|
"""
|
|
|
|
def compose(self, time=None, delta=None):
|
|
time = time or now()
|
|
if delta:
|
|
assert type(delta) == int
|
|
time += delta
|
|
return (formatdate(time),)
|
|
|
|
def parse(self, *args, **kwargs):
|
|
""" return the time value (in seconds since 1970) """
|
|
value = self.__call__(*args, **kwargs)
|
|
if value:
|
|
try:
|
|
return mktime_tz(parsedate_tz(value))
|
|
except (TypeError, OverflowError):
|
|
raise HTTPBadRequest((
|
|
"Received an ill-formed timestamp for %s: %s\r\n") %
|
|
(self.name, value))
|
|
|
|
#
|
|
# Following are specific HTTP headers. Since these classes are mostly
|
|
# singletons, there is no point in keeping the class around once it has
|
|
# been instantiated, so we use the same name.
|
|
#
|
|
|
|
class _CacheControl(_MultiValueHeader):
|
|
"""
|
|
Cache-Control, RFC 2616 14.9 (use ``CACHE_CONTROL``)
|
|
|
|
This header can be constructed (using keyword arguments), by
|
|
first specifying one of the following mechanisms:
|
|
|
|
``public``
|
|
|
|
if True, this argument specifies that the
|
|
response, as a whole, may be cashed.
|
|
|
|
``private``
|
|
|
|
if True, this argument specifies that the response, as a
|
|
whole, may be cashed; this implementation does not support
|
|
the enumeration of private fields
|
|
|
|
``no_cache``
|
|
|
|
if True, this argument specifies that the response, as a
|
|
whole, may not be cashed; this implementation does not
|
|
support the enumeration of private fields
|
|
|
|
In general, only one of the above three may be True, the other 2
|
|
must then be False or None. If all three are None, then the cache
|
|
is assumed to be ``public``. Following one of these mechanism
|
|
specifiers are various modifiers:
|
|
|
|
``no_store``
|
|
|
|
indicates if content may be stored on disk;
|
|
otherwise cache is limited to memory (note:
|
|
users can still save the data, this applies
|
|
to intermediate caches)
|
|
|
|
``max_age``
|
|
|
|
the maximum duration (in seconds) for which
|
|
the content should be cached; if ``no-cache``
|
|
is specified, this defaults to 0 seconds
|
|
|
|
``s_maxage``
|
|
|
|
the maximum duration (in seconds) for which the
|
|
content should be allowed in a shared cache.
|
|
|
|
``no_transform``
|
|
|
|
specifies that an intermediate cache should
|
|
not convert the content from one type to
|
|
another (e.g. transform a BMP to a PNG).
|
|
|
|
``extensions``
|
|
|
|
gives additional cache-control extensions,
|
|
such as items like, community="UCI" (14.9.6)
|
|
|
|
The usage of ``apply()`` on this header has side-effects. As
|
|
recommended by RFC 2616, if ``max_age`` is provided, then then the
|
|
``Expires`` header is also calculated for HTTP/1.0 clients and
|
|
proxies (this is done at the time ``apply()`` is called). For
|
|
``no-cache`` and for ``private`` cases, we either do not want the
|
|
response cached or do not want any response accidently returned to
|
|
other users; so to prevent this case, we set the ``Expires`` header
|
|
to the time of the request, signifying to HTTP/1.0 transports that
|
|
the content isn't to be cached. If you are using SSL, your
|
|
communication is already "private", so to work with HTTP/1.0
|
|
browsers over SSL, consider specifying your cache as ``public`` as
|
|
the distinction between public and private is moot.
|
|
"""
|
|
|
|
# common values for max-age; "good enough" approximates
|
|
ONE_HOUR = 60*60
|
|
ONE_DAY = ONE_HOUR * 24
|
|
ONE_WEEK = ONE_DAY * 7
|
|
ONE_MONTH = ONE_DAY * 30
|
|
ONE_YEAR = ONE_WEEK * 52
|
|
|
|
def _compose(self, public=None, private=None, no_cache=None,
|
|
no_store=False, max_age=None, s_maxage=None,
|
|
no_transform=False, **extensions):
|
|
assert isinstance(max_age, (type(None), int))
|
|
assert isinstance(s_maxage, (type(None), int))
|
|
expires = 0
|
|
result = []
|
|
if private is True:
|
|
assert not public and not no_cache and not s_maxage
|
|
result.append('private')
|
|
elif no_cache is True:
|
|
assert not public and not private and not max_age
|
|
result.append('no-cache')
|
|
else:
|
|
assert public is None or public is True
|
|
assert not private and not no_cache
|
|
expires = max_age
|
|
result.append('public')
|
|
if no_store:
|
|
result.append('no-store')
|
|
if no_transform:
|
|
result.append('no-transform')
|
|
if max_age is not None:
|
|
result.append('max-age=%d' % max_age)
|
|
if s_maxage is not None:
|
|
result.append('s-maxage=%d' % s_maxage)
|
|
for (k, v) in six.iteritems(extensions):
|
|
if k not in self.extensions:
|
|
raise AssertionError("unexpected extension used: '%s'" % k)
|
|
result.append('%s="%s"' % (k.replace("_", "-"), v))
|
|
return (result, expires)
|
|
|
|
def compose(self, **kwargs):
|
|
(result, expires) = self._compose(**kwargs)
|
|
return result
|
|
|
|
def apply(self, collection, **kwargs):
|
|
""" returns the offset expiration in seconds """
|
|
(result, expires) = self._compose(**kwargs)
|
|
if expires is not None:
|
|
EXPIRES.update(collection, delta=expires)
|
|
self.update(collection, *result)
|
|
return expires
|
|
|
|
_CacheControl('Cache-Control', 'general', 'RFC 2616, 14.9')
|
|
|
|
class _ContentType(_SingleValueHeader):
|
|
"""
|
|
Content-Type, RFC 2616 section 14.17
|
|
|
|
Unlike other headers, use the CGI variable instead.
|
|
"""
|
|
version = '1.0'
|
|
_environ_name = 'CONTENT_TYPE'
|
|
|
|
# common mimetype constants
|
|
UNKNOWN = 'application/octet-stream'
|
|
TEXT_PLAIN = 'text/plain'
|
|
TEXT_HTML = 'text/html'
|
|
TEXT_XML = 'text/xml'
|
|
|
|
def compose(self, major=None, minor=None, charset=None):
|
|
if not major:
|
|
if minor in ('plain', 'html', 'xml'):
|
|
major = 'text'
|
|
else:
|
|
assert not minor and not charset
|
|
return (self.UNKNOWN,)
|
|
if not minor:
|
|
minor = "*"
|
|
result = "%s/%s" % (major, minor)
|
|
if charset:
|
|
result += "; charset=%s" % charset
|
|
return (result,)
|
|
|
|
_ContentType('Content-Type', 'entity', 'RFC 2616, 14.17')
|
|
|
|
class _ContentLength(_SingleValueHeader):
|
|
"""
|
|
Content-Length, RFC 2616 section 14.13
|
|
|
|
Unlike other headers, use the CGI variable instead.
|
|
"""
|
|
version = "1.0"
|
|
_environ_name = 'CONTENT_LENGTH'
|
|
|
|
_ContentLength('Content-Length', 'entity', 'RFC 2616, 14.13')
|
|
|
|
class _ContentDisposition(_SingleValueHeader):
|
|
"""
|
|
Content-Disposition, RFC 2183 (use ``CONTENT_DISPOSITION``)
|
|
|
|
This header can be constructed (using keyword arguments),
|
|
by first specifying one of the following mechanisms:
|
|
|
|
``attachment``
|
|
|
|
if True, this specifies that the content should not be
|
|
shown in the browser and should be handled externally,
|
|
even if the browser could render the content
|
|
|
|
``inline``
|
|
|
|
exclusive with attachment; indicates that the content
|
|
should be rendered in the browser if possible, but
|
|
otherwise it should be handled externally
|
|
|
|
Only one of the above 2 may be True. If both are None, then
|
|
the disposition is assumed to be an ``attachment``. These are
|
|
distinct fields since support for field enumeration may be
|
|
added in the future.
|
|
|
|
``filename``
|
|
|
|
the filename parameter, if any, to be reported; if
|
|
this is None, then the current object's filename
|
|
attribute is used
|
|
|
|
The usage of ``apply()`` on this header has side-effects. If
|
|
filename is provided, and Content-Type is not set or is
|
|
'application/octet-stream', then the mimetypes.guess is used to
|
|
upgrade the Content-Type setting.
|
|
"""
|
|
|
|
def _compose(self, attachment=None, inline=None, filename=None):
|
|
result = []
|
|
if inline is True:
|
|
assert not attachment
|
|
result.append('inline')
|
|
else:
|
|
assert not inline
|
|
result.append('attachment')
|
|
if filename:
|
|
assert '"' not in filename
|
|
filename = filename.split("/")[-1]
|
|
filename = filename.split("\\")[-1]
|
|
result.append('filename="%s"' % filename)
|
|
return (("; ".join(result),), filename)
|
|
|
|
def compose(self, **kwargs):
|
|
(result, mimetype) = self._compose(**kwargs)
|
|
return result
|
|
|
|
def apply(self, collection, **kwargs):
|
|
""" return the new Content-Type side-effect value """
|
|
(result, filename) = self._compose(**kwargs)
|
|
mimetype = CONTENT_TYPE(collection)
|
|
if filename and (not mimetype or CONTENT_TYPE.UNKNOWN == mimetype):
|
|
mimetype, _ = mimetypes.guess_type(filename)
|
|
if mimetype and CONTENT_TYPE.UNKNOWN != mimetype:
|
|
CONTENT_TYPE.update(collection, mimetype)
|
|
self.update(collection, *result)
|
|
return mimetype
|
|
|
|
_ContentDisposition('Content-Disposition', 'entity', 'RFC 2183')
|
|
|
|
class _IfModifiedSince(_DateHeader):
|
|
"""
|
|
If-Modified-Since, RFC 2616 section 14.25
|
|
"""
|
|
version = '1.0'
|
|
|
|
def __call__(self, *args, **kwargs):
|
|
"""
|
|
Split the value on ';' incase the header includes extra attributes. E.g.
|
|
IE 6 is known to send:
|
|
If-Modified-Since: Sun, 25 Jun 2006 20:36:35 GMT; length=1506
|
|
"""
|
|
return _DateHeader.__call__(self, *args, **kwargs).split(';', 1)[0]
|
|
|
|
def parse(self, *args, **kwargs):
|
|
value = _DateHeader.parse(self, *args, **kwargs)
|
|
if value and value > now():
|
|
raise HTTPBadRequest((
|
|
"Please check your system clock.\r\n"
|
|
"According to this server, the time provided in the\r\n"
|
|
"%s header is in the future.\r\n") % self.name)
|
|
return value
|
|
_IfModifiedSince('If-Modified-Since', 'request', 'RFC 2616, 14.25')
|
|
|
|
class _Range(_MultiValueHeader):
|
|
"""
|
|
Range, RFC 2616 14.35 (use ``RANGE``)
|
|
|
|
According to section 14.16, the response to this message should be a
|
|
206 Partial Content and that if multiple non-overlapping byte ranges
|
|
are requested (it is an error to request multiple overlapping
|
|
ranges) the result should be sent as multipart/byteranges mimetype.
|
|
|
|
The server should respond with '416 Requested Range Not Satisfiable'
|
|
if the requested ranges are out-of-bounds. The specification also
|
|
indicates that a syntax error in the Range request should result in
|
|
the header being ignored rather than a '400 Bad Request'.
|
|
"""
|
|
|
|
def parse(self, *args, **kwargs):
|
|
"""
|
|
Returns a tuple (units, list), where list is a sequence of
|
|
(begin, end) tuples; and end is None if it was not provided.
|
|
"""
|
|
value = self.__call__(*args, **kwargs)
|
|
if not value:
|
|
return None
|
|
ranges = []
|
|
last_end = -1
|
|
try:
|
|
(units, range) = value.split("=", 1)
|
|
units = units.strip().lower()
|
|
for item in range.split(","):
|
|
(begin, end) = item.split("-")
|
|
if not begin.strip():
|
|
begin = 0
|
|
else:
|
|
begin = int(begin)
|
|
if begin <= last_end:
|
|
raise ValueError()
|
|
if not end.strip():
|
|
end = None
|
|
else:
|
|
end = int(end)
|
|
last_end = end
|
|
ranges.append((begin, end))
|
|
except ValueError:
|
|
# In this case where the Range header is malformed,
|
|
# section 14.16 says to treat the request as if the
|
|
# Range header was not present. How do I log this?
|
|
return None
|
|
return (units, ranges)
|
|
_Range('Range', 'request', 'RFC 2616, 14.35')
|
|
|
|
class _AcceptLanguage(_MultiValueHeader):
|
|
"""
|
|
Accept-Language, RFC 2616 section 14.4
|
|
"""
|
|
|
|
def parse(self, *args, **kwargs):
|
|
"""
|
|
Return a list of language tags sorted by their "q" values. For example,
|
|
"en-us,en;q=0.5" should return ``["en-us", "en"]``. If there is no
|
|
``Accept-Language`` header present, default to ``[]``.
|
|
"""
|
|
header = self.__call__(*args, **kwargs)
|
|
if header is None:
|
|
return []
|
|
langs = [v for v in header.split(",") if v]
|
|
qs = []
|
|
for lang in langs:
|
|
pieces = lang.split(";")
|
|
lang, params = pieces[0].strip().lower(), pieces[1:]
|
|
q = 1
|
|
for param in params:
|
|
if '=' not in param:
|
|
# Malformed request; probably a bot, we'll ignore
|
|
continue
|
|
lvalue, rvalue = param.split("=")
|
|
lvalue = lvalue.strip().lower()
|
|
rvalue = rvalue.strip()
|
|
if lvalue == "q":
|
|
q = float(rvalue)
|
|
qs.append((lang, q))
|
|
qs.sort(key=lambda query: query[1], reverse=True)
|
|
return [lang for (lang, q) in qs]
|
|
_AcceptLanguage('Accept-Language', 'request', 'RFC 2616, 14.4')
|
|
|
|
class _AcceptRanges(_MultiValueHeader):
|
|
"""
|
|
Accept-Ranges, RFC 2616 section 14.5
|
|
"""
|
|
def compose(self, none=None, bytes=None):
|
|
if bytes:
|
|
return ('bytes',)
|
|
return ('none',)
|
|
_AcceptRanges('Accept-Ranges', 'response', 'RFC 2616, 14.5')
|
|
|
|
class _ContentRange(_SingleValueHeader):
|
|
"""
|
|
Content-Range, RFC 2616 section 14.6
|
|
"""
|
|
def compose(self, first_byte=None, last_byte=None, total_length=None):
|
|
retval = "bytes %d-%d/%d" % (first_byte, last_byte, total_length)
|
|
assert last_byte == -1 or first_byte <= last_byte
|
|
assert last_byte < total_length
|
|
return (retval,)
|
|
_ContentRange('Content-Range', 'entity', 'RFC 2616, 14.6')
|
|
|
|
class _Authorization(_SingleValueHeader):
|
|
"""
|
|
Authorization, RFC 2617 (RFC 2616, 14.8)
|
|
"""
|
|
def compose(self, digest=None, basic=None, username=None, password=None,
|
|
challenge=None, path=None, method=None):
|
|
assert username and password
|
|
if basic or not challenge:
|
|
assert not digest
|
|
userpass = "%s:%s" % (username.strip(), password.strip())
|
|
return "Basic %s" % userpass.encode('base64').strip()
|
|
assert challenge and not basic
|
|
path = path or "/"
|
|
(_, realm) = challenge.split('realm="')
|
|
(realm, _) = realm.split('"', 1)
|
|
auth = AbstractDigestAuthHandler()
|
|
auth.add_password(realm, path, username, password)
|
|
(token, challenge) = challenge.split(' ', 1)
|
|
chal = parse_keqv_list(parse_http_list(challenge))
|
|
class FakeRequest(object):
|
|
if six.PY3:
|
|
@property
|
|
def full_url(self):
|
|
return path
|
|
|
|
selector = full_url
|
|
|
|
@property
|
|
def data(self):
|
|
return None
|
|
else:
|
|
def get_full_url(self):
|
|
return path
|
|
|
|
get_selector = get_full_url
|
|
|
|
def has_data(self):
|
|
return False
|
|
|
|
def get_method(self):
|
|
return method or "GET"
|
|
|
|
retval = "Digest %s" % auth.get_authorization(FakeRequest(), chal)
|
|
return (retval,)
|
|
_Authorization('Authorization', 'request', 'RFC 2617')
|
|
|
|
#
|
|
# For now, construct a minimalistic version of the field-names; at a
|
|
# later date more complicated headers may sprout content constructors.
|
|
# The items commented out have concrete variants.
|
|
#
|
|
for (name, category, version, style, comment) in \
|
|
(("Accept" ,'request' ,'1.1','multi-value','RFC 2616, 14.1' )
|
|
,("Accept-Charset" ,'request' ,'1.1','multi-value','RFC 2616, 14.2' )
|
|
,("Accept-Encoding" ,'request' ,'1.1','multi-value','RFC 2616, 14.3' )
|
|
#,("Accept-Language" ,'request' ,'1.1','multi-value','RFC 2616, 14.4' )
|
|
#,("Accept-Ranges" ,'response','1.1','multi-value','RFC 2616, 14.5' )
|
|
,("Age" ,'response','1.1','singular' ,'RFC 2616, 14.6' )
|
|
,("Allow" ,'entity' ,'1.0','multi-value','RFC 2616, 14.7' )
|
|
#,("Authorization" ,'request' ,'1.0','singular' ,'RFC 2616, 14.8' )
|
|
#,("Cache-Control" ,'general' ,'1.1','multi-value','RFC 2616, 14.9' )
|
|
,("Cookie" ,'request' ,'1.0','multi-value','RFC 2109/Netscape')
|
|
,("Connection" ,'general' ,'1.1','multi-value','RFC 2616, 14.10')
|
|
,("Content-Encoding" ,'entity' ,'1.0','multi-value','RFC 2616, 14.11')
|
|
#,("Content-Disposition",'entity' ,'1.1','multi-value','RFC 2616, 15.5' )
|
|
,("Content-Language" ,'entity' ,'1.1','multi-value','RFC 2616, 14.12')
|
|
#,("Content-Length" ,'entity' ,'1.0','singular' ,'RFC 2616, 14.13')
|
|
,("Content-Location" ,'entity' ,'1.1','singular' ,'RFC 2616, 14.14')
|
|
,("Content-MD5" ,'entity' ,'1.1','singular' ,'RFC 2616, 14.15')
|
|
#,("Content-Range" ,'entity' ,'1.1','singular' ,'RFC 2616, 14.16')
|
|
#,("Content-Type" ,'entity' ,'1.0','singular' ,'RFC 2616, 14.17')
|
|
,("Date" ,'general' ,'1.0','date-header','RFC 2616, 14.18')
|
|
,("ETag" ,'response','1.1','singular' ,'RFC 2616, 14.19')
|
|
,("Expect" ,'request' ,'1.1','multi-value','RFC 2616, 14.20')
|
|
,("Expires" ,'entity' ,'1.0','date-header','RFC 2616, 14.21')
|
|
,("From" ,'request' ,'1.0','singular' ,'RFC 2616, 14.22')
|
|
,("Host" ,'request' ,'1.1','singular' ,'RFC 2616, 14.23')
|
|
,("If-Match" ,'request' ,'1.1','multi-value','RFC 2616, 14.24')
|
|
#,("If-Modified-Since" ,'request' ,'1.0','date-header','RFC 2616, 14.25')
|
|
,("If-None-Match" ,'request' ,'1.1','multi-value','RFC 2616, 14.26')
|
|
,("If-Range" ,'request' ,'1.1','singular' ,'RFC 2616, 14.27')
|
|
,("If-Unmodified-Since",'request' ,'1.1','date-header' ,'RFC 2616, 14.28')
|
|
,("Last-Modified" ,'entity' ,'1.0','date-header','RFC 2616, 14.29')
|
|
,("Location" ,'response','1.0','singular' ,'RFC 2616, 14.30')
|
|
,("Max-Forwards" ,'request' ,'1.1','singular' ,'RFC 2616, 14.31')
|
|
,("Pragma" ,'general' ,'1.0','multi-value','RFC 2616, 14.32')
|
|
,("Proxy-Authenticate" ,'response','1.1','multi-value','RFC 2616, 14.33')
|
|
,("Proxy-Authorization",'request' ,'1.1','singular' ,'RFC 2616, 14.34')
|
|
#,("Range" ,'request' ,'1.1','multi-value','RFC 2616, 14.35')
|
|
,("Referer" ,'request' ,'1.0','singular' ,'RFC 2616, 14.36')
|
|
,("Retry-After" ,'response','1.1','singular' ,'RFC 2616, 14.37')
|
|
,("Server" ,'response','1.0','singular' ,'RFC 2616, 14.38')
|
|
,("Set-Cookie" ,'response','1.0','multi-entry','RFC 2109/Netscape')
|
|
,("TE" ,'request' ,'1.1','multi-value','RFC 2616, 14.39')
|
|
,("Trailer" ,'general' ,'1.1','multi-value','RFC 2616, 14.40')
|
|
,("Transfer-Encoding" ,'general' ,'1.1','multi-value','RFC 2616, 14.41')
|
|
,("Upgrade" ,'general' ,'1.1','multi-value','RFC 2616, 14.42')
|
|
,("User-Agent" ,'request' ,'1.0','singular' ,'RFC 2616, 14.43')
|
|
,("Vary" ,'response','1.1','multi-value','RFC 2616, 14.44')
|
|
,("Via" ,'general' ,'1.1','multi-value','RFC 2616, 14.45')
|
|
,("Warning" ,'general' ,'1.1','multi-entry','RFC 2616, 14.46')
|
|
,("WWW-Authenticate" ,'response','1.0','multi-entry','RFC 2616, 14.47')):
|
|
klass = {'multi-value': _MultiValueHeader,
|
|
'multi-entry': _MultiEntryHeader,
|
|
'date-header': _DateHeader,
|
|
'singular' : _SingleValueHeader}[style]
|
|
klass(name, category, comment, version).__doc__ = comment
|
|
del klass
|
|
|
|
for head in _headers.values():
|
|
headname = head.name.replace("-","_").upper()
|
|
locals()[headname] = head
|
|
__all__.append(headname)
|
|
|
|
__pudge_all__ = __all__[:]
|
|
for _name, _obj in six.iteritems(dict(globals())):
|
|
if isinstance(_obj, type) and issubclass(_obj, HTTPHeader):
|
|
__pudge_all__.append(_name)
|