=head1 NAME

lwpcook - libwww-perl cookbook

=head1 DESCRIPTION

This document contains some examples that show typical usage of the
libwww-perl library.  You should consult the documentation for the
individual modules for more detail.

All examples should be runnable programs. You can, in most cases, test
the code sections by piping the program text directly to perl.

=head1 GET

It is very easy to use this library to just fetch documents from the
net.  The LWP::Simple module provides the get() function that returns
the document specified by its URL argument:

  use LWP::Simple;
  $doc = get 'http://www.sn.no/libwww-perl/';

or, as a perl one-liner using the getprint() function:

  perl -MLWP::Simple -e 'getprint "http://www.sn.no/libwww-perl/"'

or, how about fetching the latest perl by running this command:

  perl -MLWP::Simple -e '
    getstore "ftp://ftp.sunet.se/pub/lang/perl/CPAN/src/latest.tar.gz",
             "perl.tar.gz"'

You will probably first want to find a CPAN site closer to you by
running something like the following command:

  perl -MLWP::Simple -e 'getprint "http://www.perl.com/perl/CPAN/CPAN.html"'

Enough of this simple stuff!  The LWP object-oriented interface gives
you more control over the request sent to the server.  Using this
interface you have full control over the headers sent and over how you
handle the response returned.

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->agent("$0/0.1 " . $ua->agent);
  # $ua->agent("Mozilla/8.0") # pretend we are a very capable browser

  $req = HTTP::Request->new(GET => 'http://www.sn.no/libwww-perl');
  $req->header('Accept' => 'text/html');

  # send request
  $res = $ua->request($req);

  # check the outcome
  if ($res->is_success) {
     print $res->content;
  } else {
     print "Error: " . $res->status_line . "\n";
  }

The lwp-request program (alias GET) that is distributed with the
library can also be used to fetch documents from WWW servers.

=head1 HEAD

If you just want to check if a document is present (i.e. the URL is
valid) try to run code that looks like this:

  use LWP::Simple;

  if (head($url)) {
     # ok document exists
  }

The head() function really returns a list of meta-information about
the document.  The first three values of the list returned are the
document type, the size of the document, and the time the document
was last modified.

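As a minimal sketch of unpacking that list, the following program calls
head() on a file: URL that points at a scratch file it creates itself, so
it runs without network access.  The scratch file and its contents are
assumptions made for illustration; with an http URL the same values would
come from the response headers sent by the server.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::Simple qw(head);
use URI::file;

# Create a small scratch file (an assumption for illustration) so that
# the example does not depend on a remote server.
my ($fh, $path) = tempfile(SUFFIX => '.txt', UNLINK => 1);
binmode $fh;
print {$fh} "hello lwp";
close $fh or die "close failed: $!";

# In list context head() returns the header values; the list is empty
# when the request fails.
my $url = URI::file->new_abs($path)->as_string;
my ($type, $length, $mtime) = head($url);

if (defined $length) {
    print "type:     ", (defined $type ? $type : 'unknown'), "\n";
    print "size:     $length bytes\n";
    print "modified: ", (defined $mtime ? scalar(localtime $mtime) : 'unknown'), "\n";
} else {
    print "document not found\n";
}
```

Any of the values may be undefined when the corresponding header is
missing from the response, which is why the sketch checks them before
printing.
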
More control over the request or access to all header values returned
requires that you use the object-oriented interface described for GET
above.  Just s/GET/HEAD/g.

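A sketch of that substitution, assuming the same URL and Accept header
as in the GET section.  It only builds and inspects the request, so it
runs offline; the lines that would actually send it are left as comments.

```perl
use strict;
use warnings;
use HTTP::Request;

# Build a HEAD request the same way a GET request is built.
my $req = HTTP::Request->new(HEAD => 'http://www.sn.no/libwww-perl');
$req->header('Accept' => 'text/html');

# Inspect the request without sending it.
print $req->method, " ", $req->uri, "\n";

# Sending it would look like this (needs LWP::UserAgent and network access):
#   my $res = LWP::UserAgent->new->request($req);
#   print $res->headers->as_string if $res->is_success;
```

The response to a HEAD request carries headers only, so the interesting
part of $res is its header set rather than its content.
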
=head1 POST

There is no simple procedural interface for posting data to a WWW server.  You
must use the object oriented interface for this. The most common POST
operation is to access a WWW form application:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(POST => 'http://www.perl.com/cgi-bin/BugGlimpse');
  $req->content_type('application/x-www-form-urlencoded');
  $req->content('match=www&errors=0');

  my $res = $ua->request($req);
  print $res->as_string;

Lazy people use the HTTP::Request::Common module to set up a suitable
POST request message.  It handles all the escaping issues and has a
suitable default for the content_type:

  use HTTP::Request::Common qw(POST);
  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                [ search => 'www', errors => 0 ];

  print $ua->request($req)->as_string;

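To see what the escaping amounts to, the following sketch builds a
request with a form value that contains a space and an ampersand and
prints the resulting body instead of sending it.  The field value
'a b&c' is a made-up example; the URL is the one used above.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical form value containing characters that must be escaped.
my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
               [ search => 'a b&c', errors => 0 ];

# Show what would be sent, without contacting the server.
print "Content-Type: ", scalar($req->content_type), "\n";
print "Body:         ", $req->content, "\n";
```

The body is a single urlencoded string in which the space and the
ampersand inside the value no longer clash with the field separators.
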
The lwp-request program (alias POST) that is distributed with the
library can also be used for posting data.

=head1 PROXIES

Some sites use proxies to go through firewall machines, or just as
a cache in order to improve performance.  Proxies can also be used for
accessing resources through protocols not supported directly (or
supported badly :-) by the libwww-perl library.

You should initialize your proxy settings before you start sending
requests:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $ua->env_proxy; # initialize from environment variables
  # or
  $ua->proxy(ftp  => 'http://proxy.myorg.com');
  $ua->proxy(wais => 'http://proxy.myorg.com');
  $ua->no_proxy(qw(no se fi));

  my $req = HTTP::Request->new(GET => 'wais://xxx.com/');
  print $ua->request($req)->as_string;

The LWP::Simple interface will call env_proxy() for you automatically.
Applications that use the $ua->env_proxy() method will normally not
use the $ua->proxy() and $ua->no_proxy() methods.

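A small sketch of the environment-driven setup.  The proxy URL, the port
and the no_proxy list are placeholders; in practice these variables are
set in the user's shell rather than in the program.  Nothing is sent
over the network.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder settings, set here only to keep the sketch self-contained.
local $ENV{http_proxy} = 'http://proxy.myorg.com:8080';
local $ENV{no_proxy}   = 'localhost,myorg.com';

my $ua = LWP::UserAgent->new;
$ua->env_proxy;    # pick up the *_proxy and no_proxy variables

# With only a scheme argument, proxy() reports the current setting.
my $http_proxy = $ua->proxy('http');
print "http proxy: ", (defined $http_proxy ? $http_proxy : '(none)'), "\n";
```

Keeping the settings in the environment means the same program works
unchanged at sites with and without a proxy.
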
Some proxies also require that you send them a username/password in
order to let requests through.  You should be able to add the
required header, with something like this:

  use LWP::UserAgent;

  $ua = LWP::UserAgent->new;
  $ua->proxy(['http', 'ftp'] => 'http://proxy.myorg.com');

  $req = HTTP::Request->new(GET => 'http://www.perl.com');
  $req->proxy_authorization_basic("proxy_user", "proxy_password");

  $res = $ua->request($req);
  print $res->content if $res->is_success;

Replace C<proxy.myorg.com>, C<proxy_user> and
C<proxy_password> with something suitable for your site.

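To check which header proxy_authorization_basic() actually adds, the
request can be inspected before it is sent.  This sketch uses the same
placeholder credentials and runs without a proxy or network access.

```perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder credentials; replace them with values valid for your site.
my $req = HTTP::Request->new(GET => 'http://www.perl.com');
$req->proxy_authorization_basic('proxy_user', 'proxy_password');

# The header value is "Basic " followed by the base64 encoding of
# "user:password".
my $header = $req->header('Proxy-Authorization');
print "Proxy-Authorization: $header\n";

# Called without arguments in list context, the method returns the
# user name and password decoded from the header.
my ($user, $pass) = $req->proxy_authorization_basic;
print "user=$user\n";
```

Basic authentication only encodes the credentials; it does not encrypt
them, so anyone who can read the request can recover the password.
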
=head1 ACCESS TO PROTECTED DOCUMENTS

Documents protected by basic authorization can easily be accessed
like this:

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $req = HTTP::Request->new(GET => 'http://www.sn.no/secret/');
  $req->authorization_basic('aas', 'mypassword');
  print $ua->request($req)->as_string;

The other alternative is to provide a subclass of I<LWP::UserAgent> that
overrides the get_basic_credentials() method. Study the I<lwp-request>
program for an example of this.

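A minimal sketch of such a subclass.  The class name MyAgent, the
realm name and the hard-coded credentials are hypothetical; a real
program would more likely prompt the user or read a configuration file,
as lwp-request does.  The method is called directly here so that the
sketch runs offline.

```perl
package MyAgent;    # hypothetical subclass name
use strict;
use warnings;
use parent 'LWP::UserAgent';

# LWP calls this method when a server or proxy answers with an
# authentication challenge.  Return (user, password), or an empty list
# to give up.
sub get_basic_credentials {
    my ($self, $realm, $uri, $isproxy) = @_;
    return if $isproxy;              # this sketch has no proxy credentials
    return ('aas', 'mypassword');    # placeholder credentials
}

package main;

my $ua = MyAgent->new;

# Call the method directly to show what LWP would receive.
my ($user, $pass) =
    $ua->get_basic_credentials('secret', 'http://www.sn.no/secret/', 0);
print "user=$user\n";
```

With this approach the credentials are supplied only after the server
asks for them, instead of being attached to every request up front.
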
=head1 HTTPS

URLs with the https scheme are accessed in exactly the same way as URLs
with the http scheme, provided that an SSL interface module for LWP has
been properly installed (see the F<README.SSL> file found in the
libwww-perl distribution for more details).  If no SSL interface is
installed for LWP to use, then you will get "501 Protocol scheme
'https' is not supported" errors when accessing such URLs.

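One way to find out in advance whether https is available is to ask
LWP::Protocol which class implements the scheme.  This is a sketch of
such a check, not something the distribution requires you to do; it
does not open any connection.

```perl
use strict;
use warnings;
use LWP::Protocol;

# implementor() returns the name of the class that handles a scheme,
# or a false value when no handler can be loaded.
my $class = LWP::Protocol::implementor('https');

if ($class) {
    print "https is handled by $class\n";
} else {
    print "https is not supported; install an SSL interface module\n";
}
```

A program can use a check like this to print a helpful message instead
of letting the user run into the 501 error.
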
Here's an example of fetching and printing a WWW page using SSL:

  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new;
  my $req = HTTP::Request->new(GET => 'https://www.helsinki.fi/');
  my $res = $ua->request($req);
  if ($res->is_success) {
      print $res->as_string;
  } else {
      print "Failed: ", $res->status_line, "\n";
  }

=head1 MIRRORING

If you want to mirror documents from a WWW server, then try to run
code similar to this at regular intervals:

  use LWP::Simple;

  %mirrors = (
     'http://www.sn.no/'             => 'sn.html',
     'http://www.perl.com/'          => 'perl.html',
     'http://www.sn.no/libwww-perl/' => 'lwp.html',
     'gopher://gopher.sn.no/'        => 'gopher.html',
  );

  while (($url, $localfile) = each(%mirrors)) {
     mirror($url, $localfile);
  }

Or, as a perl one-liner:

  perl -MLWP::Simple -e 'mirror("http://www.perl.com/", "perl.html")';

The document will not be transferred unless it has been updated.

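The status code returned by mirror() tells you which of these cases
occurred.  The helper below is a hypothetical convenience function, not
part of LWP::Simple; it is exercised with fixed status codes so that the
sketch runs without network access.

```perl
use strict;
use warnings;
use LWP::Simple;    # also exports is_success() and the RC_* constants

# Hypothetical helper: describe the status code returned by mirror().
sub describe_mirror_result {
    my ($rc) = @_;
    return 'unchanged' if $rc == RC_NOT_MODIFIED;
    return 'updated'   if is_success($rc);
    return "failed with status $rc";
}

# Typical use (needs network access):
#   my $rc = mirror('http://www.perl.com/', 'perl.html');
#   print 'perl.html: ', describe_mirror_result($rc), "\n";

# Offline demonstration with fixed status codes.
print describe_mirror_result($_), "\n"
    for (RC_NOT_MODIFIED, RC_OK, RC_NOT_FOUND);
```

Checking the result this way lets a periodic job log only the files
that actually changed.
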
=head1 LARGE DOCUMENTS

If the document you want to fetch is too large to be kept in memory,
then you have two alternatives.  You can instruct the library to write
the document content to a file (second $ua->request() argument is a file
name):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;

  my $req = HTTP::Request->new(GET =>
     'http://www.sn.no/~aas/perl/www/libwww-perl-5.00.tar.gz');
  $res = $ua->request($req, "libwww-perl.tar.gz");
  if ($res->is_success) {
     print "ok\n";
  }

Or you can process the document as it arrives (second $ua->request()
argument is a code reference):

  use LWP::UserAgent;
  $ua = LWP::UserAgent->new;
  $URL = 'ftp://ftp.unit.no/pub/rfc/rfc-index.txt';

  my $expected_length;
  my $bytes_received = 0;
  $ua->request(HTTP::Request->new('GET', $URL),
               sub {
                   my($chunk, $res) = @_;
                   $bytes_received += length($chunk);
                   unless (defined $expected_length) {
                       $expected_length = $res->content_length || 0;
                   }
                   if ($expected_length) {
                       printf STDERR "%d%% - ",
                              100 * $bytes_received / $expected_length;
                   }
                   print STDERR "$bytes_received bytes received\n";

                   # XXX Should really do something with the chunk itself
                   # print $chunk;
               });

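As a sketch of doing something with each chunk, the following program
copies a document to a file as it arrives.  To stay runnable without
network access it fetches a file: URL that points at a scratch file it
creates itself; the scratch files, the 10000 byte size and the 4096
byte chunk size hint are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::UserAgent;
use URI::file;

# Create a local source document (an assumption for illustration).
my ($src_fh, $src_path) = tempfile(UNLINK => 1);
binmode $src_fh;
print {$src_fh} "x" x 10_000;
close $src_fh or die "close failed: $!";

# Destination file that receives the chunks as they arrive.
my ($out_fh, $out_path) = tempfile(UNLINK => 1);
binmode $out_fh;

my $bytes_received = 0;
my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => URI::file->new_abs($src_path));
my $res = $ua->request($req, sub {
    my ($chunk, $response) = @_;
    $bytes_received += length $chunk;
    print {$out_fh} $chunk;    # store the chunk instead of dropping it
}, 4096);                      # third argument: preferred chunk size (a hint)
close $out_fh or die "close failed: $!";

print "status: ", $res->status_line, "\n";
print "copied $bytes_received bytes to $out_path\n";
```

Because each chunk is written out as soon as it is received, memory use
stays bounded by the chunk size rather than by the document size.
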
=head1 COPYRIGHT

Copyright 1996-1999, Gisle Aas

This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.