Package utils

Library of utility code for LIGO Light Weight XML applications.


Date: 2014-04-18 15:27:53 +0000

Author: Kipp Cannon <kipp.cannon@ligo.org>

Submodules

Classes

RewindableInputFile
DON'T EVER USE THIS FOR ANYTHING! I'M NOT EVEN KIDDING!

MD5File

Functions

sort_files_by_size(filenames, verbose=False, reverse=False)
Return a list of the filenames sorted in order from smallest file to largest file (or largest to smallest if reverse is set to True).

local_path_from_url(url)
For URLs that point to locations in the local filesystem, extract and return the filesystem path of the object to which they point.

load_fileobj(fileobj, gz=None, xmldoc=None, contenthandler=None)
Parse the contents of the file object fileobj, and return the contents as a LIGO Light Weight document tree.

load_filename(filename, verbose=False, **kwargs)
Parse the contents of the file identified by filename, and return the contents as a LIGO Light Weight document tree.

load_url(url, verbose=False, **kwargs)
Parse the contents of the file at the given URL and return the contents as a LIGO Light Weight document tree.

write_fileobj(xmldoc, fileobj, gz=False, trap_signals=(signal.SIGTERM,signal.SIGTSTP), **kwargs)
Writes the LIGO Light Weight document tree rooted at xmldoc to the given file object.

write_filename(xmldoc, filename, verbose=False, gz=False, **kwargs)
Writes the LIGO Light Weight document tree rooted at xmldoc to the file named filename.

write_url(xmldoc, url, **kwargs)
Writes the LIGO Light Weight document tree rooted at xmldoc to the URL url.

Function Details

sort_files_by_size(filenames, verbose=False, reverse=False)

Return a list of the filenames sorted in order from smallest file to largest file (or largest to smallest if reverse is set to True). If a filename in the list is None (used by many glue.ligolw-based programs to indicate stdin), its size is treated as 0. The filenames may be given as any iterable, including a generator expression.
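The ordering rule can be illustrated with a short Python 3 sketch. This is not the glue source: the name sort_files_by_size_sketch is hypothetical, and using os.stat() to obtain each file's size is an assumption.

```python
import os
import sys


def sort_files_by_size_sketch(filenames, verbose=False, reverse=False):
    """Hypothetical sketch: sort filenames by on-disk size.

    None (conventionally meaning stdin) is treated as size 0.
    Any iterable works, including a generator expression.
    """
    if verbose:
        sys.stderr.write("sorting files by size ...\n")

    def size(filename):
        # None stands for stdin, which has no size to stat
        return 0 if filename is None else os.stat(filename).st_size

    # sorted() consumes the iterable exactly once, so generators are fine
    return sorted(filenames, key=size, reverse=reverse)
```

Because only the integer sizes are compared, None can be mixed freely with real filenames.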

local_path_from_url(url)

For URLs that point to locations in the local filesystem, extract and return the filesystem path of the object to which they point. As a special case, None is passed through: if the URL is None, the return value is None. A ValueError is raised if the URL is not None and does not point to a local file.

Example:

>>> print(local_path_from_url(None))
None
>>> local_path_from_url("file:///home/me/somefile.xml.gz")
'/home/me/somefile.xml.gz'
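A minimal Python 3 sketch of the conversion rule, assuming urllib.parse.urlparse() is used to split the URL and that bare paths and file URLs with an empty or "localhost" host count as local. The function name is hypothetical and this is not the glue implementation; for example, it does not undo percent-escapes.

```python
from urllib.parse import urlparse


def local_path_from_url_sketch(url):
    """Hypothetical sketch of the local-URL-to-path rule."""
    if url is None:
        # special-case pass-through
        return None
    parts = urlparse(url)
    # assumption: no scheme or file:, and no host or localhost, means local
    if parts.scheme.lower() not in ("", "file") or parts.netloc.lower() not in ("", "localhost"):
        raise ValueError("%r is not a local file" % (url,))
    return parts.path
```

Under these assumptions local_path_from_url_sketch("file://localhost/tmp/data.xml") returns "/tmp/data.xml", while an http URL raises ValueError.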

load_fileobj(fileobj, gz=None, xmldoc=None, contenthandler=None)

Parse the contents of the file object fileobj, and return the contents as a LIGO Light Weight document tree. The file object does not need to be seekable.

If the gz parameter is None (the default) then gzip compressed data will be automatically detected and decompressed, otherwise decompression can be forced on or off by setting gz to True or False respectively.

If the optional xmldoc argument is provided and not None, the parsed XML tree will be appended to that document; otherwise a new document will be created. The return value is a tuple whose first element is the XML document and whose second element is a string containing the hex digits of the MD5 digest of the bytestream that was parsed.
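The gz behavior and the returned digest can be sketched with the standard gzip and hashlib modules. The helper read_maybe_gzipped is hypothetical; unlike a streaming parser it buffers the whole input in memory, and hashing the raw bytes as read (before any decompression) is an assumption about which bytestream the digest covers. It only ever calls read(), so a non-seekable stream is fine.

```python
import gzip
import hashlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_maybe_gzipped(fileobj, gz=None):
    """Hypothetical helper: return (payload_bytes, md5_hexdigest).

    gz=None autodetects gzip data from its magic bytes;
    gz=True or gz=False forces decompression on or off.
    """
    raw = fileobj.read()  # read() only: no seek() required
    # assumption: the digest covers the raw input bytes
    digest = hashlib.md5(raw).hexdigest()
    if gz is None:
        gz = raw[:2] == GZIP_MAGIC
    if gz:
        raw = gzip.decompress(raw)
    return raw, digest
```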

Example:

>>> from glue.ligolw import ligolw
>>> import StringIO
>>> f = StringIO.StringIO('<?xml version="1.0" encoding="utf-8" ?><!DOCTYPE LIGO_LW SYSTEM "http://ldas-sw.ligo.caltech.edu/doc/ligolwAPI/html/ligolw_dtd.txt"><LIGO_LW><Table Name="demo:table"><Column Name="name" Type="lstring"/><Column Name="value" Type="real8"/><Stream Name="demo:table" Type="Local" Delimiter=",">"mass",0.5,"velocity",34</Stream></Table></LIGO_LW>')
>>> xmldoc, digest = load_fileobj(f, contenthandler = ligolw.LIGOLWContentHandler)
>>> digest
'03d1f513120051f4dbf3e3bc58ddfaa6'

The contenthandler argument specifies the SAX content handler to use when parsing the document. It is required, but for (temporary) backwards compatibility, if it is omitted a default fallback is used and a warning is emitted. See the glue.ligolw package documentation for a typical parsing scenario involving a custom content handler. See glue.ligolw.ligolw.PartialLIGOLWContentHandler and glue.ligolw.ligolw.FilteringLIGOLWContentHandler for examples of custom content handlers used to load subsets of documents into memory.
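glue's own handler classes are not reproduced here. As a standard-library-only illustration of the same idea (a SAX handler that keeps just part of a document), the hypothetical ColumnNameCollector below records the Column names of one named Table using Python 3's xml.sax. It is not PartialLIGOLWContentHandler or FilteringLIGOLWContentHandler, and their interfaces may differ.

```python
import xml.sax
import xml.sax.handler


class ColumnNameCollector(xml.sax.handler.ContentHandler):
    """Hypothetical handler: keep only the Column names of one Table."""

    def __init__(self, table_name):
        super().__init__()
        self.table_name = table_name
        self.columns = []
        self._in_wanted_table = False

    def startElement(self, name, attrs):
        if name == "Table":
            # only the Table whose Name attribute matches is of interest
            self._in_wanted_table = attrs.get("Name") == self.table_name
        elif name == "Column" and self._in_wanted_table:
            self.columns.append(attrs.get("Name"))
        # every other element (including Stream contents) is ignored

    def endElement(self, name):
        if name == "Table":
            self._in_wanted_table = False


def column_names(xml_bytes, table_name):
    handler = ColumnNameCollector(table_name)
    xml.sax.parseString(xml_bytes, handler)
    return handler.columns
```

Nothing outside the selected Table is retained, which is the memory-saving property the filtering handlers above are described as providing.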

load_filename(filename, verbose=False, **kwargs)

Parse the contents of the file identified by filename, and return the contents as a LIGO Light Weight document tree. If filename is None, stdin is parsed. Helpful verbosity messages are printed to stderr if verbose is True. All other keyword arguments are passed to load_fileobj(); see that function for more information. In particular, note that a content handler must be specified.

Example:

>>> from glue.ligolw import ligolw
>>> xmldoc = load_filename(name, contenthandler = ligolw.LIGOLWContentHandler, verbose = True)
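The stdin fallback and keyword forwarding can be sketched as follows. Both load_filename_sketch and its loader parameter are hypothetical: loader stands in for a load_fileobj()-style callable, and sys.stdin.buffer is the Python 3 way to obtain stdin as bytes.

```python
import sys


def load_filename_sketch(filename, loader, verbose=False, **kwargs):
    """Hypothetical sketch: open filename (or stdin) and delegate.

    Every extra keyword argument (e.g. contenthandler) is forwarded
    to loader unchanged.
    """
    if verbose:
        sys.stderr.write("reading %s ...\n" % ("stdin" if filename is None else filename))
    if filename is None:
        # binary stream underlying sys.stdin (Python 3)
        return loader(sys.stdin.buffer, **kwargs)
    with open(filename, "rb") as fileobj:
        return loader(fileobj, **kwargs)
```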

load_url(url, verbose=False, **kwargs)

Parse the contents of the file at the given URL and return the contents as a LIGO Light Weight document tree. Any source from which Python's urllib2 library can read data is acceptable. If url is None, stdin is parsed. Helpful verbosity messages are printed to stderr if verbose is True. All other keyword arguments are passed to load_fileobj(); see that function for more information. In particular, note that a content handler must be specified.

Example:

>>> from glue.ligolw import ligolw
>>> xmldoc = load_url("file://localhost/tmp/data.xml", contenthandler = ligolw.LIGOLWContentHandler)
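In Python 3 the urllib2 functionality lives in urllib.request, so fetching the raw bytes behind a URL might be sketched like this. The helper read_url_bytes is hypothetical and reads the whole response into memory.

```python
import sys
from urllib.request import urlopen


def read_url_bytes(url):
    """Hypothetical helper: return the raw bytes behind url.

    None selects stdin; anything urlopen() understands (for example
    file:, http:, https: and ftp: URLs) is fetched with it.
    """
    if url is None:
        return sys.stdin.buffer.read()
    with urlopen(url) as response:
        return response.read()
```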

write_fileobj(xmldoc, fileobj, gz=False, trap_signals=(signal.SIGTERM,signal.SIGTSTP), **kwargs)

Writes the LIGO Light Weight document tree rooted at xmldoc to the given file object. Internally, the .write() method of the xmldoc object is invoked and any additional keyword arguments are passed to that method. The file object need not be seekable. The output data is gzip compressed on the fly if gz is True. The return value is a string containing the hex digits of the MD5 digest of the output bytestream.
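On-the-fly compression with a digest of the output can be sketched by placing a small hashing wrapper between gzip.GzipFile and the destination. MD5Tee and write_bytes_sketch are hypothetical names (glue's own MD5File class may work differently), and the sketch assumes the digest covers the bytes that actually reach fileobj, i.e. the compressed stream when gz is True.

```python
import gzip
import hashlib


class MD5Tee(object):
    """Hypothetical wrapper: forward writes and hash every byte written."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.md5obj = hashlib.md5()

    def write(self, data):
        self.md5obj.update(data)
        return self.fileobj.write(data)

    def flush(self):
        self.fileobj.flush()


def write_bytes_sketch(payload, fileobj, gz=False):
    """Write payload to fileobj, gzipping on the fly if gz is True.

    Returns the hex MD5 digest of the bytes that reach fileobj.
    """
    tee = MD5Tee(fileobj)
    if gz:
        # GzipFile only calls write()/flush() on tee, so no seeking is needed
        with gzip.GzipFile(mode="wb", fileobj=tee) as zfile:
            zfile.write(payload)
    else:
        tee.write(payload)
    tee.flush()
    return tee.md5obj.hexdigest()
```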

This function traps the signals in the trap_signals iterable during the write process (the default is signal.SIGTERM and signal.SIGTSTP) by temporarily installing its own signal handlers in place of the current ones. This is done to prevent Condor eviction from interrupting the write. When the write has finished, the original signal handlers are restored; then, if any signals were trapped during the write, they are resent to the current process in the order in which they were received. Because signal.signal() can only be called from the main thread, trap_signals must be set to None or an empty sequence if this function is used from any other thread.
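The trap-then-resend pattern can be sketched as a context manager. deferred_signals is a hypothetical name, the sketch is POSIX-only (signal.SIGTSTP does not exist on Windows), and, like the function described above, it must run in the main thread unless trapping is disabled.

```python
import os
import signal
from contextlib import contextmanager


@contextmanager
def deferred_signals(signums=(signal.SIGTERM, signal.SIGTSTP)):
    """Hypothetical sketch: hold back signums until the block exits.

    Pass None or () to disable trapping, e.g. outside the main thread.
    """
    received = []

    def trap(signum, frame):
        # remember the signal instead of acting on it
        received.append(signum)

    originals = {}
    for signum in signums or ():
        originals[signum] = signal.signal(signum, trap)
    try:
        yield
    finally:
        # restore previous handlers (None means "not installed from Python")
        for signum, handler in originals.items():
            signal.signal(signum, signal.SIG_DFL if handler is None else handler)
        # re-deliver trapped signals in the order they arrived
        for signum in received:
            os.kill(os.getpid(), signum)
```

Wrapping a write in "with deferred_signals():" postpones SIGTERM and SIGTSTP until the block exits, after which they are delivered to the restored handlers in arrival order.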

Example:

>>> import sys
>>> write_fileobj(xmldoc, sys.stdout)

write_filename(xmldoc, filename, verbose=False, gz=False, **kwargs)

Writes the LIGO Light Weight document tree rooted at xmldoc to the file named filename. Friendly verbosity messages are printed while doing so if verbose is True. The output data is gzip compressed on the fly if gz is True.

Internally, write_fileobj() is used to perform the write. All additional keyword arguments are passed to write_fileobj().

Example:

>>> write_filename(xmldoc, "data.xml")

write_url(xmldoc, url, **kwargs)

Writes the LIGO Light Weight document tree rooted at xmldoc to the URL url.

NOTE: only URLs that point to local files can be written to at this time. Internally, write_filename() is used to perform the write. All additional keyword arguments are passed to that function. The implementation might change in the future, especially if support for other types of URLs is ever added.

Example:

>>> write_url(xmldoc, "file:///data.xml")
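The local-only restriction can be sketched as follows, assuming a file URL is converted to a path with urllib.request.url2pathname(), which also undoes percent-escapes such as %20. write_bytes_to_url is a hypothetical name, and it writes raw bytes rather than a document tree.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def write_bytes_to_url(data, url):
    """Hypothetical sketch: write data to a URL, local files only."""
    parts = urlparse(url)
    # assumption: no scheme or file:, and no host or localhost, means local
    if parts.scheme not in ("", "file") or parts.netloc.lower() not in ("", "localhost"):
        raise ValueError("only local URLs can be written: %r" % (url,))
    path = url2pathname(parts.path)  # decodes escapes such as %20
    with open(path, "wb") as fileobj:
        fileobj.write(data)
    return path
```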