User guide

Learn how to get started with PyPAC. This guide assumes familiarity with the Requests library and its API.

Basic usage

The quickest way to get started is to use a PACSession:

>>> from pypac import PACSession
>>> session = PACSession()
>>> session.get('http://example.org')
<Response [200]>

Behind the scenes, the first request made with the session will trigger the PAC auto-discovery process. This process first looks for a PAC URL setting in Windows, and if not found, moves on to the DNS WPAD protocol.

Once a PAC file is found, it will be automatically consulted for every request. If a PAC wasn’t found, then PACSession acts just like a requests.Session.

Specify the URL of your PAC

The get_pac() function encapsulates the PAC file discovery process and returns a PACFile object upon success. Instead of auto-discovery, this function can be used to get and parse a PAC file from a given URL, which can then be passed to PACSession:

from pypac import PACSession, get_pac
pac = get_pac(url='http://foo.corp.local/proxy.pac')
session = PACSession(pac)

This is useful if you already know the URL for the PAC file to use, and want to skip auto-discovery.

Note that by default, PyPAC requires that PAC files be served with a content-type of either application/x-ns-proxy-autoconfig or application/x-javascript-config. Files served with other types are excluded from consideration as a PAC file. This behaviour can be customized using the allowed_content_types keyword:

pac = get_pac(url='http://foo.corp.local/proxy.txt',
              allowed_content_types=['text/plain'])
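
To illustrate what the content-type check amounts to, here is a minimal sketch in plain Python. It is not PyPAC’s implementation. The helper name is_allowed_pac_type is hypothetical, and two behaviours are assumptions made for illustration: header parameters such as charset are ignored, and a custom list replaces the defaults rather than extending them. The sketch only covers the case where a Content-Type header is present.

```python
# Default types named in this guide.
DEFAULT_PAC_TYPES = {
    "application/x-ns-proxy-autoconfig",
    "application/x-javascript-config",
}


def is_allowed_pac_type(content_type_header, allowed_content_types=()):
    """Return True if a Content-Type header value names an accepted PAC type.

    Hypothetical helper for illustration only; not part of PyPAC.
    """
    # Keep only the media type, dropping parameters like "; charset=utf-8".
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    # Assumption: a custom list replaces the defaults instead of extending them.
    accepted = {t.lower() for t in allowed_content_types} or DEFAULT_PAC_TYPES
    return media_type in accepted
```

With this sketch, a file served as text/plain is rejected unless 'text/plain' is passed in the allowed list, which mirrors the get_pac() call above.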

Load PAC from a string or file

This is an unusual scenario, but it is also supported. Instantiate your own PACFile, passing it a string containing the PAC JavaScript. For instance, to load a local PAC file and use it with a PACSession:

from pypac import PACSession
from pypac.parser import PACFile

with open('proxy.pac') as f:
    pac = PACFile(f.read())

session = PACSession(pac)

Proxy authentication

Proxy servers specified by a PAC file typically allow anonymous access. However, PyPAC supports including Basic proxy authentication credentials:

from pypac import PACSession
from requests.auth import HTTPProxyAuth
session = PACSession(proxy_auth=HTTPProxyAuth('user', 'pwd'))
# or alternatively...
session.proxy_auth = HTTPProxyAuth('user', 'pwd')

NTLM authentication for proxies may also be supported. Refer to the requests-ntlm project.

Custom proxy failover criteria

You can decide when a proxy from the PAC file should be considered unusable. When a proxy is considered unusable, it’s blacklisted, and the next proxy specified by the PAC file is used. PACSession can be configured with callables that define the criteria for failover.

One way to decide when to fail over is by inspecting the response to a request. By default, PyPAC does not do this, but you may find it useful in case a failing proxy interjects with an unusual response. Another use case is to skip proxies upon an HTTP 407 response:

from pypac import PACSession
import requests

def failover_criteria(response):
    return response.status_code == requests.codes.proxy_authentication_required

session = PACSession(response_proxy_fail_filter=failover_criteria)

Another way to decide proxy failover is based on any exception raised while making the request. This can be configured by passing a callable for the exception_proxy_fail_filter keyword in the PACSession constructor. This callable takes an exception object as an argument, and returns True if failover should occur. The default behaviour is to trigger proxy failover upon encountering requests.exceptions.ConnectTimeout or requests.exceptions.ProxyError.
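
As a hedged sketch of such a callable, the function below keeps the two default exception types and additionally treats a read timeout as a reason to fail over. The name proxy_failed and the choice of requests.exceptions.ReadTimeout are illustrative assumptions, not PyPAC defaults.

```python
import requests


def proxy_failed(exc):
    """Return True if the exception should trigger proxy failover."""
    return isinstance(exc, (
        requests.exceptions.ConnectTimeout,  # default criterion
        requests.exceptions.ProxyError,      # default criterion
        requests.exceptions.ReadTimeout,     # illustrative addition
    ))

# Usage, assuming PyPAC is installed:
#   from pypac import PACSession
#   session = PACSession(exception_proxy_fail_filter=proxy_failed)
```

Because passing your own callable replaces the default criteria, the default exception types are listed explicitly here so that they keep triggering failover.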

If all proxies specified by the PAC file have been blacklisted, and the PAC didn’t return a final instruction to go DIRECT, then ProxyConfigExhaustedError is raised.
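
The failover rule can be pictured with a small simulation in plain Python. This is a sketch of the idea only, not PyPAC’s code: ProxiesExhausted, parse_pac_result and next_usable are hypothetical names, and the PAC return value is assumed to be a semicolon-separated string such as 'PROXY a.example:8080; DIRECT'.

```python
class ProxiesExhausted(Exception):
    """Hypothetical stand-in for PyPAC's ProxyConfigExhaustedError."""


def parse_pac_result(value):
    """Split a PAC return value such as 'PROXY a:8080; DIRECT' into entries."""
    return [entry.strip() for entry in value.split(";") if entry.strip()]


def next_usable(pac_result, blacklist):
    """Return the first entry that is not blacklisted, or raise if none remain."""
    for entry in parse_pac_result(pac_result):
        # DIRECT is not a proxy, so it is never skipped in this sketch.
        if entry == "DIRECT" or entry not in blacklist:
            return entry
    raise ProxiesExhausted(pac_result)
```

If every PROXY entry is in the blacklist, a trailing DIRECT is returned as the last resort. Without one, the sketch raises, which corresponds to the ProxyConfigExhaustedError case described above.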

Errors and exceptions

PyPAC defines some exceptions that can occur in the course of PAC auto-discovery, parsing, and execution.

MalformedPacError
PyPAC failed to parse a file that claims to be a PAC.
ProxyConfigExhaustedError
All proxy servers for the given URL have been marked as failed, and the PAC file did not specify a final instruction to go DIRECT.

Security considerations

Supporting and using PAC files comes with some security implications that are worth considering.

PAC discovery and parsing

PAC files are JavaScript. PyPAC uses dukpy to parse and execute JavaScript. dukpy was not designed for handling untrusted JavaScript, and so it is unclear whether the handling of PAC files is sufficiently sandboxed to prevent untrusted Python code execution.

When looking for a PAC file using DNS WPAD, the local machine’s fully-qualified hostname is checked against the Mozilla Public Suffix List to prevent requesting any PAC files outside the scope of the organization. If the hostname’s TLD isn’t in the Public Suffix List, then everything up to the final node is used in the search path. For example, a hostname of foo.bar.local will result in a search for a PAC file from wpad.bar.local and wpad.local.
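
The search-path rule can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not PyPAC’s implementation: the function name wpad_search_hosts is hypothetical, and public_suffixes is a small hand-written set standing in for the real Public Suffix List.

```python
def wpad_search_hosts(fqdn, public_suffixes):
    """List the wpad.* hostnames to try for a machine's fully-qualified name."""
    labels = fqdn.lower().strip(".").split(".")
    hosts = []
    # Drop the machine's own label, then walk up through the parent domains.
    for i in range(1, len(labels)):
        parent = ".".join(labels[i:])
        if parent in public_suffixes:
            break  # never query wpad.<public suffix>, e.g. wpad.com
        hosts.append("wpad." + parent)
    return hosts
```

For foo.bar.local, where local is not a public suffix, this yields wpad.bar.local followed by wpad.local, matching the example above. For a hostname under com, the walk stops before wpad.com.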

PyPAC uses the tld library to match TLDs.

HTTPS-decrypting proxies

Proxies operated by a firewall or web security gateway may be configured with a man-in-the-middle (MITM) certificate to allow decrypting HTTPS traffic for inspection. Your organization may then provision its client machines to trust this certificate. Browsers such as Internet Explorer and Chrome, which honour the operating system’s certificate store, will accept the proxy’s certificate. However, Requests defaults to its own bundled CA certificates, and thus SSL certificate verification will fail when using such a proxy.

A quick solution is to make your requests with the verify=False option. Understand that this is an overly broad solution: while it allows your request to proceed and be decrypted for inspection by your network proxy (an entity that you ostensibly trust), it also disables SSL certificate verification entirely. This means requests may be vulnerable to MITM attacks.
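
A narrower alternative, sketched here under the assumption that your organization can supply its proxy CA certificate as a PEM file, is to point Requests at that file instead of disabling verification. The path below is hypothetical. The example uses a plain requests.Session; since PACSession is a subclass of requests.Session, the same verify attribute should apply to it.

```python
import requests

# Hypothetical path to a PEM bundle containing your organization's proxy CA.
CORP_CA_BUNDLE = "/etc/ssl/certs/corp-proxy-ca.pem"

session = requests.Session()
# Verify HTTPS certificates against this bundle instead of turning verification off.
session.verify = CORP_CA_BUNDLE
```

Note that a bundle passed this way replaces the default one, so it needs to contain every CA the session is expected to trust.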

What’s missing

The DHCP portion of the Web Proxy Auto-Discovery (WPAD) protocol is not implemented.

PyPAC currently works with Requests by including a subclass of requests.Session. No ready-to-use solutions are included for other HTTP libraries, though PyPAC has all the building blocks needed to make one easily.

Pull requests to add these features are welcome.