DHT upstream module + nginx patches

Peter Schüller scode at spotify.com
Fri Oct 9 12:46:45 MSD 2009


Hello,

We have been doing some nginx development for internal use. Tommie
already sent the statistics module to the list separately yesterday,
because it was very self-contained. However, we have additional
changes that are not suitable for inclusion upstream, but we still
want to publish the code in the hope that it may be useful to someone
and to elicit feedback from interested people.

I am attaching two things: a module (spdht) that implements DHT-based
routing of requests to multiple upstream servers, and a patchset for
nginx itself (against 0.7.61), parts of which are needed in order to
use the module.

In both cases the code is unpolished as a release, and we realize it
is not directly applicable to nginx users in general. Even so, we
would rather release it than not, so that at least interested people
can look at the code. Some parts may be suitable for selective
inclusion.

The spdht module routes requests based on the hash of the requested
URL. It needs some configuration in nginx itself (an example
nginx.conf is included in the tarball). In addition, the DHT ring is
configured through DNS. For a simple case with only two hosts (for
brevity), the DNS configuration looks similar to this:

; DHT cluster options - replication level for each collection, and hash algorithm
config._service-name._http           TXT     "slaves=stuff:2 otherstuff:1" "hash=sha1"

; SRV records for the service
_service-name._http                  SRV     1000 1000 80 host1
_service-name._http                  SRV     1000 1000 80 host2

; TXT records containing DHT tokens
tokens.80.host1                      TXT     "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"
tokens.80.host2                      TXT     "7FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"

This puts host1 and host2 into the ring, each responsible for half of
the keyspace.
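The token ring lookup can be sketched in a few lines of Python. This is a minimal illustration, not the module's code: it assumes a Cassandra-style convention in which a key is owned by the host with the smallest token greater than or equal to it, wrapping around past the highest token. `build_ring` and `owner` are hypothetical names. Under that assumption host2 owns every key up to and including its token, and host1 owns the rest.

```python
import bisect

def build_ring(tokens):
    """tokens: host -> 40-digit hex token (as in the tokens.* TXT records)."""
    # Sorted list of (token as integer, host); assumed ring representation.
    return sorted((int(tok, 16), host) for host, tok in tokens.items())

def owner(ring, key):
    """Host with the smallest token >= key (assumed convention)."""
    positions = [tok for tok, _ in ring]
    i = bisect.bisect_left(positions, key)
    if i == len(ring):
        i = 0  # key lies above the highest token: wrap around the ring
    return ring[i][1]

ring = build_ring({
    "host1": "F" * 40,        # FFFF...F
    "host2": "7" + "F" * 39,  # 7FFF...F
})
print(owner(ring, 0))         # host2
print(owner(ring, 1 << 159))  # host1
```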

For a given path P, the set of hosts responsible for that path is
calculated by hashing the path N times for N levels of redundancy.
(Note that the config TXT record specifies a slave count; this is
actually a misnomer, since no slave role exists. One slave means two
copies of a file.)
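The slave-count arithmetic can be illustrated with a small parser for the config TXT record. This sketch assumes the record's first string holds space-separated `collection:count` pairs after `slaves=`, and the second holds `hash=<algorithm>`; `parse_dht_config` is a hypothetical name, not part of spdht.

```python
def parse_dht_config(txt_strings):
    """Parse the character-strings of the config._service-name._http TXT record.

    Returns (total copies per collection, hash algorithm name).
    """
    copies = {}
    hash_name = None
    for s in txt_strings:
        key, _, value = s.partition("=")
        if key == "slaves":
            for entry in value.split():
                collection, _, n = entry.partition(":")
                # "slaves" is a misnomer: n slaves means n + 1 copies in total.
                copies[collection] = int(n) + 1
        elif key == "hash":
            hash_name = value
    return copies, hash_name

copies, hash_name = parse_dht_config(["slaves=stuff:2 otherstuff:1", "hash=sha1"])
print(copies)     # {'stuff': 3, 'otherstuff': 2}
print(hash_name)  # sha1
```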

If the same host comes up more than once, hashing continues (up to
some limit) until N unique hosts have been found.
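Replica selection with duplicate skipping might look like the following sketch. The description does not say exactly how the repeated hashes are derived, so this assumes each round hashes the previous round's SHA-1 digest (the first round hashes the path itself). `replicas` and its `max_rounds` cap are hypothetical stand-ins for the module's own logic and limit, and `owner` assumes a key belongs to the host with the smallest token at or above it, wrapping around.

```python
import bisect
import hashlib

def owner(ring, key):
    """Host with the smallest token >= key; keys above the highest token wrap."""
    positions = [tok for tok, _ in ring]
    i = bisect.bisect_left(positions, key)
    return ring[i % len(ring)][1]

def replicas(ring, path, copies, max_rounds=16):
    """Return up to `copies` distinct hosts for `path` (assumed rehash scheme)."""
    hosts = []
    digest = hashlib.sha1(path.encode()).digest()  # round 0 hashes the path
    for _ in range(max_rounds):
        host = owner(ring, int.from_bytes(digest, "big"))
        if host not in hosts:  # duplicate hosts are skipped
            hosts.append(host)
            if len(hosts) == copies:
                break
        digest = hashlib.sha1(digest).digest()  # next round rehashes the digest
    return hosts

# Two-host ring from the DNS example: (token as integer, host), sorted by token.
ring = sorted([(int("F" * 40, 16), "host1"), (int("7" + "F" * 39, 16), "host2")])
print(replicas(ring, "/some/file", copies=2))  # up to two distinct hosts
```

If the ring has fewer distinct hosts than the requested number of copies, the loop simply runs out of rounds and returns the hosts it found, which is why some hard limit is needed.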

Now, in terms of the patches to nginx itself, a short summary of the
approximate feature set is:

  * Add support for SHA1 in the caching module.
  * Support multi-threaded (one thread per disk) traversal of the cache
    during cache manager startup.
  * Some tempfile allocation fixes, avoiding an infinite loop in certain
    failure modes (e.g. a broken disk).
  * Additional statistics (as submitted separately, but included here too).
  * Support failing quickly when workers are exhausted (e.g. due to broken
    disks or overload) rather than causing slow modes of failure
    (max_active_workers).
  * A posix_fdatasync()/posix_fadvise() hack to avoid buffer cache
    thrashing when pulling data into the cache (synchronous call; it will
    block unrelated requests in the same worker).
  * The cache module uses a prefix instead of a postfix directory structure.
  * When allocating temp files, pass a prefix on to the tempfile so that
    the tempfile ends up in the same directory as the final file. This
    allows the prefix tree to be a symlink farm pointing to distinct
    drives without breaking atomic rename() semantics.
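Three of the patch items above (prefix directory layout, temp files created next to the final file, and the sync-then-fadvise trick) fit together when storing a single cache entry. The following Python sketch only illustrates the idea and is not the C code in the patch: `cache_path` and `store` are hypothetical names, the SHA-1 key and 1:2 directory levels are assumptions, and `os.fdatasync`/`os.posix_fadvise` are only available on Linux and some other Unix systems.

```python
import hashlib
import os
import tempfile

def cache_path(root, key, levels=(1, 2)):
    """Prefix layout: directory names are taken from the *start* of the hex digest.

    Stock nginx takes its level directories from the end of the MD5 hex digest;
    the SHA-1 key and the 1:2 levels here are assumptions for illustration.
    """
    digest = hashlib.sha1(key.encode()).hexdigest()
    parts, pos = [], 0
    for n in levels:
        parts.append(digest[pos:pos + n])
        pos += n
    return os.path.join(root, *parts, digest)

def store(root, key, data):
    """Write data for key atomically, then drop it from the page cache."""
    final = cache_path(root, key)
    directory = os.path.dirname(final)
    os.makedirs(directory, exist_ok=True)
    # Create the temp file in the final directory: even if that directory is a
    # symlink to another drive, tmp and final share a filesystem, so the
    # rename below stays atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()                 # hand Python's buffer to the kernel
            os.fdatasync(f.fileno())  # write the dirty pages to disk (blocks)...
            # ...so DONTNEED can evict them instead of letting them fill the cache.
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        os.replace(tmp, final)        # atomic rename within one directory
    except BaseException:
        os.unlink(tmp)
        raise
    return final
```

As with the patch, the fdatasync() call blocks the caller until the data is on disk, so a single-threaded server doing this inline stalls other work for that duration.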

-- 
/ Peter Schuller aka scode
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spdht.tar.gz
Type: application/x-gzip
Size: 12044 bytes
Desc: not available
URL: <http://nginx.org/pipermail/nginx/attachments/20091009/5a442b38/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0.7.61-spmisc.diff
Type: text/x-patch
Size: 102915 bytes
Desc: not available
URL: <http://nginx.org/pipermail/nginx/attachments/20091009/5a442b38/attachment-0001.bin>

