This document describes a few Perl modules. Perl programs are usually documented via POD, but Perl programs are portable across platforms and man pages are available only on UNIX. That is why I decided to use HTML for documenting my modules.
The software can be used and redistributed according to the GPL or the GPL for the Czech Republic. See http://icebearsoft.euweb.cz/czgpl/ for detailed information.
The modules and scripts require Perl 5 and some of them are object-oriented. I have seen a Windows implementation of Perl 5 which does not recognize the new method. Be sure that you have the correct version of Perl.
Before you install the files, you have to convert the line endings. Since the development is done on OS/2, all files have DOS line endings. Afterwards, switch to the main directory of this distribution and run install.pl. It recognizes the following options:
The default library directory is taken first from $ENV{'PERL5LIB'}, then from $ENV{'PERLLIB'} if it does not contain a path separator.
The default binary directory is the first segment of $ENV{'PATH'} which starts with the contents of $ENV{'ZWPERL'} (if it is defined).
The default CGI directory is taken from $ENV{'CGIDIR'}.
The default HTML directory is taken from $ENV{'HTMLDOC'}.
Question: Is it safe to call flip -u **/* or dos2unix **/*, possibly repeatedly? If so, the next version will check for the existence of these programs and convert the line endings automatically.
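In the meantime, a Perl one-liner can do the conversion. This is a sketch only; the backup suffix and the file name are illustrative, and it should be run on text files only:

perl -i.bak -pe 's/\015$//' install.pl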
Comment: Since it is not necessary to put the files into the system directories, and the paths can be defined in environment variables in the user's profile, root permissions are not needed and any user can install the package himself or herself. Anyway, if the package seems to be usable for all users in the system, it is better to ask the root for a system-wide installation.
Tip: If you do not wish to install one of the directories in this distribution, just specify /dev/null as the installation path.
The modules and scripts were developed and tested on OS/2 Warp 3.0 and 4.0. Unlike UNIX, OS/2 distinguishes between ASCII and binary files. The scripts therefore contain binmode calls. This should be harmless on UNIX systems. If it causes problems, let me know and I will make it a switch.
Moreover, all supplied files have DOS line endings. You will most probably have to convert them before the modules can be used on UNIX systems.
This document describes the following modules and scripts:
This is a simple module which is useful mainly for debugging. It is able to list scalar variables, arrays and hashes to standard output. It is the very first module I wrote (because my first script did not work and I did not know why), therefore it is really very simple. The module is superseded by PrintList.pm.
Requirements: Exporter
See also: ZWurl.pm
The module uses EXPORT_OK. It is therefore necessary to list all functions which you want to use. See the example at the end of the description.
This function lists the contents of an array. The first argument is the title which will appear on the printout. If the first argument is identical to the name of the array, the second argument may be omitted. Thus the following two statements have the same meaning:
list_array 'my_array', @my_array;
list_array 'my_array';
This function lists the contents of a hash. It displays keys and associated values but it does not recurse if the value is not a scalar. The second argument may be omitted if the title is identical to the name of the hash variable. The logic is the same as in list_array above.
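For example, the following two calls (taken from the test script below) are equivalent when the title matches the name of the hash variable:

list_hash 'request', \%request;
list_hash 'request';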
This function lists a scalar variable. You should only specify the name of the variable as a simple text. The following code
$test = 'This is a test';
list_scalar 'test';
will display:
test = This is a test
This function is very similar to the function above. It allows you to specify an array of names of scalar variables, e.g.:
$a = 1;
$b = 'word';
$c = 'This is some text.';
list_scalars 'a', 'b', 'c';
This is a script which was used for testing the module. The words are in Czech without accents...
#!perl5
use IceBearSoft::Zwebfun;
use IceBearSoft::ZWdebug(list_hash, list_array, list_scalar, list_scalars);
$wt = 3;
$request{'method'} = 'OPTIONS';
$request{'url'} = 'http://localhost/';
list_hash 'request', \%request; sleep($wt);
%other = %request;
list_hash 'other', \%other; sleep($wt);
list_hash 'pokus', {'a'=>'jedna', 'b'=>'dva'}; sleep($wt);
%other = &makeHash;
list_hash 'Funkce?', \%other; sleep($wt);
list_hash 'Funkce???', {&makeHash}; sleep($wt);
list_hash 'Return hash', &retHash; sleep($wt);
list_hash 'Return hash', retHash(); sleep($wt);
$a = 1; $b = 'slovo'; $c = 'Toto je delsi text.';
list_scalar 'a'; sleep($wt);
list_scalar 'b'; sleep($wt);
list_scalar 'c'; sleep($wt);
list_scalars 'a', 'b', 'c'; sleep($wt);
list_array 'array', ['a', 'b', 'c']; sleep($wt);
list_hash 'request'; sleep($wt);
list_hash 'request 2', \%request; sleep($wt);
list_hash 'Hash', {'a'=>$a, 'b'=>$b, 'c'=>$c}; sleep($wt);
list_array 'Array', [$a, $b, $c]; sleep($wt);
exit;

sub makeHash {
  my %hash = ('titul'=>'wizard', 'jmeno'=>'Gandalf');
  return %hash;
}

sub retHash {
  return {'titul'=>'wizard', 'jmeno'=>'Gandalf'};
}
This module provides an object which can list the contents of hashes and arrays as well as the values of scalars. It can even list the internal variables of objects, which are in fact special types of hashes. In most cases only one global instance is needed but you can have as many instances as you like. The object keeps the list of references of arrays and hashes already listed. For instance, the hash representation of the DOM tree may consist of a hash of the document node containing the list of references to the hashes representing the nested nodes. The hash of each node will contain a reference to the hash of the document node and/or its parent node. Listing such a tree would cause an infinite loop. This object will therefore write 'already listed' in such cases.
Requirements: Exporter
This function instantiates the object. It has no parameters.
This is the only function for the user. It requires one or two arguments. The first argument must be a scalar or a reference. Its value or contents will be printed; the nested elements are indented. The second argument is optional. It must be a string which will be displayed as the title. The function writes its result to STDOUT. You can use select to redirect it elsewhere.
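A minimal usage sketch. The package name IceBearSoft::PrintList and the method name printlist are assumptions made for illustration, so check the module source for the actual names:

# Sketch only: the constructor and method names are assumed, not confirmed.
use IceBearSoft::PrintList;
$pl = new IceBearSoft::PrintList;       # one global instance is usually enough
%request = ('method' => 'OPTIONS', 'url' => 'http://localhost/');
$pl->printlist(\%request, 'request');   # reference first, optional title second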
This module provides an object which enables communication with a WWW server via the HTTP protocol. It was originally inspired by getclient.pl developed by Mark Gaither on 12 Feb 1994 (about tea time). The script was found at www.webtechs.com. Later it was rewritten and some features of HTTP/1.1 were added. Afterwards it was enhanced again because some HTTP/1.0 servers are ill programmed and stop communicating if they see an HTTP/1.1 request (some servers fail to communicate even if the request is actually HTTP/1.0 and only the ability of sending HTTP/1.1 is signalled). Therefore, if the server sends an HTTP/1.0 response, the module will note it and all further requests to that server will only be HTTP/1.0. The module also tries to use KeepAlive when communicating with HTTP/1.1 servers, but this feature is useful only if the object is used several times from the same program.
Requirements: Carp, Socket, Exporter
This function instantiates the object. It has no parameters and is called simply:
$http = new IceBearSoft::Zwebfun;
The function initializes the object and calls the system program hostname in order to obtain the network name of self. It is then used to get the IP address. Be sure that you have hostname properly installed and that it is found without specifying its full path. Alternatively you should modify the new function.
This is the only function intended for the user. It accepts parameters supplied in a hash, or preferably as a pointer to a hash, performs the HTTP communication and returns the result as a pointer to another hash.
The parameter containing the request is a hash in which the following keys are recognized:
If Byte-Range is specified and recognized by the server (it must respond with status 206), the module supports partial downloading and will append the received body to the file.
An example value is 'User-Agent: My perl script'. This is the only way of sending a single header in case you supply the hash directly as an argument to http. If you wish to specify two or more headers, you must send a pointer to your hash as an argument.
You will mostly supply the header Connection: close in order to inhibit KeepAlive on HTTP/1.1 connections (it may sometimes cause problems) and Byte-Range: ... for partial downloading.
The response contains the copies of the fields file, proxy and url unless it finishes too early due to another error. In case of a serious error where the connection cannot be established, the field Error-Message is filled with an explanation text. In case of a successful connection all response lines are stored in the hash. The first response line does not have any name and is stored as Status-Line in the hash. For easier use it is also parsed into Status, Sub-Status, Protocol-Type (always HTTP but this may change in future versions), Protocol-Version and Reason-Phrase. Sub-Status is not usually used; only some servers respond with a status such as 404.1 (the number after the period is used as the Sub-Status). The number of bytes received in this connection is stored in Bytes-Received. It may not be the same as Content-Length. Remember that Content-Length is the size of the object reported by the server while Bytes-Received is the number of bytes which were actually received by the module.
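A minimal request/response sketch. The name of the key carrying the extra header lines is an assumption made for illustration; the response keys are those described above:

# Sketch only: the exact key name for extra headers is assumed.
use IceBearSoft::Zwebfun;
$http = new IceBearSoft::Zwebfun;
$response = $http->http({
  'url' => 'http://localhost/',
  'headers' => 'Connection: close',   # inhibit KeepAlive, see above
});
if ($response->{'Error-Message'}) {   # connection could not be established
  print 'Error: ', $response->{'Error-Message'}, "\n";
} else {
  print $response->{'Status-Line'}, "\n";
  print 'Received ', $response->{'Bytes-Received'}, " bytes\n";
}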
This is a timeout value in seconds. If the module reads data from the server and no data are received within this time since the last read, the connection is closed. The default value is 30. You can change it to any value, e.g. to 60, by the following statement:
$this->{'read-timeout'} = 60;
Remember that too low a value will cause almost all connections to be closed and too high a value will diagnose a lost connection only after a long time.
No example is available here. However, you can study the scripts httpl.pl and http-retrieve.pl which are also distributed within this package.
This module provides a simple SGML parser. It does not use a DTD, so it can be used when the DTD is not available. The parser can also parse SGML files which contain errors. It is useful mainly in cases when it is not necessary to parse the whole document and only the contents of some tags are important. The parser returns only the tags and attributes or the plain text. It does not provide any structure information (unlike e.g. SGMLS). If you want to know whether the plain text is inside a tag, you have to maintain your own stack. However, since ZWsgml does not use a DTD, the parser does not know that e.g. <BR> in HTML does not have an ending tag (XML would use <BR/>). Some ending tags may be omitted but ZWsgml will not be able to recognize it. Therefore, if you really need such information, you should preferably use another tool which parses documents according to a DTD. The module is superseded by IceBearSoft::Xsgml which is distributed separately. See http://icebearsoft.euweb.cz/sw.php.
The module defines one object and one plain function.
Requirements: Exporter
This function instantiates the object. It requires one argument which is the file handle of a document to be parsed. See the example at the end of this chapter.
This function reads a line into an internal buffer. You may use it if you wish to redesign the parser. Do not mix it with calls to nextTag: the parser stores its state in internal variables and if you mix calls to both functions, unpredictable things will happen. You will never call this function in a normal situation.
This function returns an array of two strings. The first string is the name of the tag (always in lowercase), the second string contains all attributes with their values. If no attributes are present, the second array element is an empty string. The plain text is returned in the second array element and the first element is an empty string. If both elements are empty, the end of file was reached. Notice in the example below that it is necessary to check the length of the elements. The contents of any element may consist of a single digit 0 which may be incorrectly considered false (it really happens with one HTML file from the Apache documentation). If the elements are just tested for true and false, this condition may be incorrectly considered an end of file.
This is a plain function. It requires the second element of the array returned by nextTag and splits it into a hash. Attribute names serve as keys in the hash and are always converted to lowercase. Remember that some attributes do not have a value, e.g. <ol compact> in HTML. In such a case the value is an empty string.
The function checks the first character after the equal sign. If it is a quote, the function assumes that the attribute value is delimited by quotes and will look for everything up to the next quote. The quotes will be removed from the attribute value. If the equal sign is followed by any other non-blank character, the function assumes that the value is terminated by the first white space.
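A small illustration of these rules; the input string plays the role of the attribute string returned by nextTag:

my %a = attributes('href="http://www.cz/" compact');
# $a{'href'} eq 'http://www.cz/'   (the quotes are removed)
# $a{'compact'} eq ''              (attribute without a value)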
The following script parses the SGML document and displays the contents. First the $parser instance is created. At the beginning of a loop we call $parser->nextTag. If a tag is found, it is printed together with the string of attributes. The attributes are then split and the hash is displayed. The else part of the condition displays the plain text. Notice that we check the length of the strings returned from $parser->nextTag. If we used while ($k || $v), the loop would incorrectly stop e.g. in case that the plain text contains only a single digit 0.
#!perl5
# Test of ZWsgml.pm, 31 Dec 1999
use IceBearSoft::ZWsgml;
($fn, $rest) = @ARGV;
if ($rest) { die "Superfluous arguments: $rest\n"; }
open (SGML, $fn) or die "Can't open $fn\n";
$parser = new IceBearSoft::ZWsgml (\*SGML); # open the file
do {
  ($k, $v) = $parser->nextTag; # get next tag
  if (length($k) > 0) { # tag found
    print "\n<", $k, "> ", $v if $k || $v;
    if ($v && substr($k,0,1) ne '!') { # comments ignored
      my %a = attributes($v); # get attributes and store them in a hash
      foreach $key (sort(keys %a)) {
        print "\n==> ", $key, ' = ', $a{$key};
      }
    }
  } else { # print plain text (outside tags)
    print "\n->", $v if length($v) > 0;
  }
} while (length($k) + length($v) > 0);
close SGML;
This module provides a set of functions for operations with URLs.
Requirements: Exporter
The module uses EXPORT_OK. It is therefore necessary to list all functions which you want to use.
This function accepts two strings, a referrer and a relative path. It returns an absolute path. All occurrences of '..' are removed and replaced with correct directory names. It is not an error to supply an absolute path as the second argument; the function will just normalize it by removing '..'. The function will return an undefined value if the referrer is invalid. The function works only with HTTP and FTP URLs.
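For example (the referrer is taken from the test script at the end of this chapter):

use IceBearSoft::ZWurl(abs_url);
$r = 'http://www.icpf.cas.cz/users/wagner/default.htm';
$a = abs_url $r, '../kocka/index.html';
# $a is now 'http://www.icpf.cas.cz/users/kocka/index.html'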
This function splits the URL to its elements. The elements are returned in a hash. HTTP and FTP URLs will be split to 'scheme', 'host', 'port' (optionally), 'object' and 'label' or 'search' if present. Values of 'scheme' and 'host' are always lowercase. The colons are not considered a part of the scheme and port specifications and the double slash is not a part of the host name. They are always stripped off. Also the leading question mark in the search and the hash mark in the label are deleted. If the object ends with '.' or '..', a terminating slash is added.
The function can also split other URLs such as mailto or news as defined in RFC 1738. These URLs contain only scheme and object.
The hash will also contain an element with the name 'url' which is the original URL before splitting.
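For example (from the test script at the end of this chapter), the resulting hash will contain the 'scheme', 'host', 'port', 'object' and 'url' keys as described above:

use IceBearSoft::ZWurl(split_url);
%u = split_url 'http://www.cz:10954/dir/subdir/';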
This function accepts a hash in the form returned by split_url and merges the parts into a URL. It will ignore the 'url' key if it is present. RFC 1738 does not allow usage of both a label and a search within a URL, but RFC 2396 says that the search parameters should appear before the label. This function builds the URL according to RFC 2396.
This function URL-decodes the string supplied as an argument.
This function URL-encodes the string supplied as an argument. It encodes only the main separators and spaces. RFC 1738 specifies other unsafe characters which are left unchanged by this function.
This function accepts a referrer and an absolute path and changes it to a relative path if possible. It works with HTTP and FTP URLs only. It may give wrong results if the second argument is a relative path. The best way to use it is:
$local_path = localize_url $referrer, (abs_url $referrer, $other_path);
This function returns the query string both for the GET and the POST method. It is not ready for form-based file upload. The function should be called only once; a second call may cause errors.
This function parses the supplied query string. If no string is supplied, the function will obtain the query string by a call to query_string. The result is returned in a hash. If a name contains more values, as with <select multiple> or <input type=checkbox>, the value is changed into a pointer to an array of strings. Remember that the function does not know in advance that the name can be associated with two or more values. Conversion of a string to a pointer to an array occurs only at the moment when the second value is found.
All keys and values are automatically URL decoded. Do not call url_decode yourself, it will spoil your data.
This function accepts a string with cookies and parses them into a hash. The cookie names as well as their values are automatically URL decoded. If no argument is specified, the function gets the string from the environment set by the WWW server.
This function is made for OS/2 but may be useful elsewhere. OS/2 does not allow the use of // within command line arguments. Therefore this function optionally prepends http://. It will do nothing if the URL supplied as an argument already contains a scheme. The function should only be used for the preparation of a URL for Zwebfun or similar functions or objects. The function may fail with lots of valid URLs of different kinds.
The function returns an English name of the day. The argument must have the form as returned by gmtime.
The function returns an English name of the month. The argument must have the form as returned by gmtime.
This function returns a short version of date and time. It requires an argument in the same form as a result from the time function.
This function returns a long version of date and time which is used in cookies. It requires an argument in the same form as a result from the time function.
This function accepts a pointer to a hash containing the following variables:
time
The function returns the string in the form usable in Set-Cookie. The name and value are automatically URL encoded.
This function accepts a single variable which may be a scalar or any type of pointer. The contents are then printed to the default output in HTML. Scalars are printed as they are, arrays are printed as enumerations of elements, hashes are printed as key/value pairs and other types of pointers are just displayed as the name of the type. The function examines the types of the elements of arrays and hashes. If the element is a pointer, the function is called recursively.
The following code was used for testing. The URLs used are just groups of characters; it is not guaranteed that such objects really exist somewhere on the Internet. Notice that localization of local (not absolute) paths may give wrong results.
#!perl5
use IceBearSoft::ZWurl(url_encode, url_decode, split_url, abs_url, localize_url, http_url);
use IceBearSoft::ZWdebug(list_hash);
# Test URL encoding/decoding
$a = "a + b?\015\012\011&=5%";
$b = url_encode $a;
$c = url_decode $b;
print $a, "\n";
print $b, "\n";
print $c, "\n";
print "OK\n\n" if $a eq $c;
print "Failed!\n\n" if $a ne $c;
# Test URL splitting
%u = split_url 'http://www.cz/dir/subdir/file.html#nic';
list_hash 'URL splitting', \%u;
%u = split_url 'http://www.cz/dir/subdir/file.html';
list_hash 'URL splitting', \%u;
%u = split_url 'http://www.cz/dir/subdir/file.html?a=b&c=d';
list_hash 'URL splitting', \%u;
%u = split_url 'http://www.cz:10954/dir/subdir/';
list_hash 'URL splitting', \%u;
%u = split_url 'http://www.cz:1095';
list_hash 'URL splitting', \%u;
%u = split_url 'http://www.cz';
list_hash 'URL splitting', \%u;
# Test abs path
$r = 'http://www.icpf.cas.cz/users/wagner/default.htm';
print "\nReferer = $r\n";
$a = 'index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = '../wagner/index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = '../index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = '../kocka/index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = '/index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = 'kocka/index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = '../../kocka/index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = '//www.cz/index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = 'http://www.cz/index.html'; print "$a => ", (abs_url $r, $a), "\n";
$a = './'; print "$a => ", (abs_url $r, $a), "\n";
$a = '.'; print "$a => ", (abs_url $r, $a), "\n";
$a = '..'; print "$a => ", (abs_url $r, $a), "\n";
$a = '../'; print "$a => ", (abs_url $r, $a), "\n";
# Test localization
sub loc($$);
print "\nReferer for loc test = $r\n";
loc $r, 'index.html';
loc $r, '../index.html';
loc $r, '../wagner/index.html';
loc $r, '../kocka/index.html';
loc $r, '/index.html';
loc $r, '//www.cz/index.html';
loc $r, 'http://www.icpf.cas.cz/';
loc $r, 'http://www.icpf.cas.cz';
loc $r, './';
loc $r, '.';
loc $r, '..';
loc $r, '../';
# Test http_url
sub ht($);
print "\nhttp_url test\n";
ht 'http://www.icpf.cas.cz';
ht 'ftp://www.icpf.cas.cz';
ht '//www.icpf.cas.cz/wagner/';
ht 'www.icpf.cas.cz/wagner/frame.html';
# Subroutines
sub loc {
  my ($r, $a) = @_;
  print "$a => ", (localize_url $r, $a), "\n";
  my $b = abs_url $r, $a;
  print "$b => ", (localize_url $r, $b), "\n";
}
sub ht {
  my $x = shift;
  print "$x -> ", (http_url $x), "\n";
}
Object-oriented languages such as C++ or Java support polymorphism via abstract classes and virtual methods. Perl, on the contrary, does not offer any type checking. However, sometimes you may need to verify that a reference corresponds to an object derived from a particular class.
This package defines a single function which gets either a package name or an object reference and returns a top-down list of all package names, usually ending with Exporter. This can be used if you wish to find out whether the object is derived from a particular class.
Suppose that you wish to check whether your object $obj is a MyPackage or derived from it by reblessing. You can use:
die 'Invalid object type' unless scalar(grep(/^MyPackage$/, isA $obj));
You can also supply the regular expression of valid types as the second optional parameter, e.g. isA $obj, '^My(First|Second)Package$'. This will check whether the object is MyFirstPackage or MySecondPackage. In list context the function will return all matches; in scalar context it returns the number of matches. You can thus write:
die 'Invalid object type' unless scalar(isA $obj, '^My(First|Second)Package$');
or simply
die 'Invalid object type' unless isA $obj, '^My(First|Second)Package$';
This script performs a single HTTP request. The result is displayed on screen but the body of the response may be stored in a file.
Requirements: Getopt::Long, Carp, IceBearSoft::ZWdebug(list_hash), IceBearSoft::ZWurl(http_url), IceBearSoft::Zwebfun
The script is invoked with the following command line options:
The URL must be specified either in --url or as a combination of --host and --object. The URL is then completed by a call to http_url. The --bytes option is used together with --file for partial download; this is, however, more comfortably achieved by http-retrieve.pl. Notice that the script sets the header Connection: close in order to prevent KeepAlive.
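An illustrative invocation; the option names come from the description above, while the URL and the file name are just placeholders:

perl httpl.pl --url http://localhost/ --file response.body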
This script serves for partial downloading of files over HTTP/1.1. It may even be used if a part of the file has already been downloaded.
Requirements: Getopt::Long, Carp, IceBearSoft::ZWdebug(list_hash), IceBearSoft::ZWurl(http_url), IceBearSoft::Zwebfun
The script accepts the following command line options:
The URL will be completed by a call to http_url. The file may already exist, in which case the script will immediately start in the append mode. You may specify the maximum number of retries; the default value is 50. If the script fails after the specified number of retries, it is still possible to run it again with the same URL and file name specification.
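An illustrative invocation; the option spellings (--url, --file, --retries) are assumptions based on the description above and on httpl.pl, so check the script itself:

perl http-retrieve.pl --url http://localhost/big.zip --file big.zip --retries 50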
Known bug: the script does not verify the Last-Modified response field. It is therefore possible that you will append a new object to an old file, which will result in a mess. However, the whole response is always displayed on screen, so you can verify it yourself.
This REXX script was developed for OS/2. It was necessary to specify the full path of the Perl script because OS/2 strictly requires a backslash as a directory separator while Perl needs a forward slash. The script takes all arguments preceded by the minus sign as options for perl. The first argument which does not start with a minus is the name of the Perl script. This REXX script optionally adds the .pl extension. The Perl script is then looked for in the current directory and then across %PATH%. The full path of the Perl script is then used. The remaining arguments are sent as options to the Perl script.
This script is used to verify links on web pages. It comes with its own documentation which is installed on your web server. You can also read it locally from this distribution.
Z. Wagner - Ice Bear Soft, http://icebearsoft.euweb.cz