[Bps-public-commit] www-mechanize branch, master, updated. 1.69_01-5-g958f42b
Jason May
jasonmay at bestpractical.com
Thu Sep 22 15:42:07 EDT 2011
The branch, master has been updated
via 958f42b261e04bb8a3ab02746307ebccf2a3601e (commit)
via 917cc36b29b6bf891b7051b468147cfeab674376 (commit)
via ab6942520bab4726b911a7a424cc725099e9ea46 (commit)
via 23e7f7b67b6ee57e85916e762ebdc75aa7850dbf (commit)
from 97b70589277947e9aac162c66624cd641deb9cb6 (commit)
Summary of changes:
lib/WWW/Mechanize.pm | 10 +++++--
lib/WWW/Mechanize/FAQ.pod | 52 +++++++---------------------------------
t/local/{click.t => content.t} | 6 +++-
t/local/log-server | 6 ++++-
4 files changed, 25 insertions(+), 49 deletions(-)
copy t/local/{click.t => content.t} (78%)
- Log -----------------------------------------------------------------
commit 23e7f7b67b6ee57e85916e762ebdc75aa7850dbf
Author: Lars Dɪᴇᴄᴋᴏᴡ 迪拉斯 <daxim at cpan.org>
Date: Tue Jul 12 21:16:12 2011 +0200
Improve docs about support of JavaScript
diff --git a/lib/WWW/Mechanize.pm b/lib/WWW/Mechanize.pm
index bb45180..7b36e15 100644
--- a/lib/WWW/Mechanize.pm
+++ b/lib/WWW/Mechanize.pm
@@ -100,8 +100,8 @@ you can also use any of L<LWP::UserAgent>'s methods.
$mech->add_header($name => $value);
-Please note that Mech does NOT support JavaScript. Please check the
-FAQ in WWW::Mechanize::FAQ for more.
+Please note that Mech does NOT support JavaScript, you need additional software
+for that. Please check L<WWW::Mechanize::FAQ/"JavaScript"> for more.
=head1 IMPORTANT LINKS
diff --git a/lib/WWW/Mechanize/FAQ.pod b/lib/WWW/Mechanize/FAQ.pod
index 91541c5..8707876 100644
--- a/lib/WWW/Mechanize/FAQ.pod
+++ b/lib/WWW/Mechanize/FAQ.pod
@@ -28,48 +28,9 @@ It does pretty much, but it doesn't support JavaScript.
I added some basic attempts at picking up URLs in C<window.open()>
calls and return them in C<< $mech->links >>. They work sometimes.
-Beyond that, there's no support for JavaScript.
-
-=head2 Are you going to add JavaScript support?
-
-I will if anyone sends me the code to do it. I'm not going to write a
-JavaScript processor myself.
-
-=head2 Wouldn't that be a great thing to have in WWW::Mechanize?
-
-Yes.
-
-=head2 Would it be hard to do?
-
-Hard enough that I don't want to deal with it myself. Plus, I don't
-use JavaScript myself, so I don't have an itch to scratch.
-
-=head2 Is anyone working on it?
-
-I've heard noises from people every so often over the past couple of
-years, but nothing you'd pin your hopes on.
-
-=head2 It would really help me with a project I'm working on.
-
-I'm sure it would.
-
-=head2 Do you know when it might get added?
-
-I have no idea if or when such a thing will ever get done. I can
-guarantee that as soon as there's anything close to JavaScript support
-I will let everyone know.
-
-=head2 Maybe I'll ask around and see if anyone else knows of a solution.
-
-If you must, but I doubt that anyone's written JavaScript support for
-Mechanize and neglected to tell me about it.
-
-=head2 So what can I do?
Since Javascript is completely visible to the client, it cannot be used
-to prevent a scraper from following links. But it can make life difficult,
-and until someone writes a Javascript interpreter for Perl or a Mechanize
-clone to control Firefox, there will be no general solution. But if
+to prevent a scraper from following links. But it can make life difficult. If
you want to scrape specific pages, then a solution is always possible.
One typical use of Javascript is to perform argument checking before
@@ -88,8 +49,8 @@ before and after URLs and save them to files. Edit each file, converting
the the argument separators ('?', '&' or ';') into newlines. Now it is
easy to use diff or comm to find out what Javascript did to the URL.
Step 2 - find the function call which created the URL - you will need
-to parse and interpret its argument list. Using the Javascript Debugger
-Extension for Firefox may help with the analysis. At this point, it is
+to parse and interpret its argument list. The Javascript Debugger in the
+Firebug extension for Firefox helps with the analysis. At this point, it is
fairly trivial to write your own function which emulates the Javascript
for the pages you want to process.
@@ -118,6 +79,11 @@ then redirects and cookies should not be a problem, but are listed here
for completeness. If you are missing headers, C<< $mech->add_header >>
can be used to add the headers that you need.
+=head2 Which modules work like Mechanize and have JavaScript support?
+
+In no particular order: L<Gtk2::WebKit::Mechanize>, L<Win32::IE::Mechanize>,
+L<WWW::Mechanize::Firefox>, L<WWW::Scripter>, L<WWW::Selenium>
+
=head1 How do I do X?
=head2 Can I do [such-and-such] with WWW::Mechanize?
commit ab6942520bab4726b911a7a424cc725099e9ea46
Author: David Precious <davidp at preshweb.co.uk>
Date: Tue May 10 18:33:05 2011 +0100
Recognise application/xhtml+xml as HTML.
This fixes Issue 162[1], making e.g. $mech->title() work for XHTML pages which
were returned with the application/xhtml+xml content type.
[1] http://code.google.com/p/www-mechanize/issues/detail?id=162
diff --git a/lib/WWW/Mechanize.pm b/lib/WWW/Mechanize.pm
index 7b36e15..cb86e59 100644
--- a/lib/WWW/Mechanize.pm
+++ b/lib/WWW/Mechanize.pm
@@ -559,7 +559,11 @@ sub status { my $self = shift; return $self->{status}; }
sub ct { my $self = shift; return $self->{ct}; }
sub content_type { my $self = shift; return $self->{ct}; }
sub base { my $self = shift; return $self->{base}; }
-sub is_html { my $self = shift; return defined $self->ct && ($self->ct eq 'text/html'); }
+sub is_html {
+ my $self = shift;
+ return defined $self->ct &&
+ ($self->ct eq 'text/html' || $self->ct eq 'application/xhtml+xml');
+}
=head2 $mech->title()
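
Editor's note on the commit above: the patched is_html() accepts two exact content types. A minimal standalone sketch of that check follows, assuming a hypothetical helper named looks_like_html() that takes the content type as a plain string; it is not part of WWW::Mechanize and only mirrors the comparison shown in the diff.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): mirrors the patched
# is_html() comparison, but takes the content type as a plain string.
sub looks_like_html {
    my ($ct) = @_;
    return 0 unless defined $ct;    # no content type recorded yet
    return ( $ct eq 'text/html' || $ct eq 'application/xhtml+xml' ) ? 1 : 0;
}

# Show how a few content types are classified.
for my $ct ( 'text/html', 'application/xhtml+xml', 'text/plain', undef ) {
    printf "%-25s %s\n", $ct // '(undef)',
        looks_like_html($ct) ? 'HTML' : 'not HTML';
}
```

As in the diff, the comparison is an exact string match, so a value carrying parameters such as a charset would not match unless it is stripped first.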
commit 917cc36b29b6bf891b7051b468147cfeab674376
Author: Jason May <jasonmay at bestpractical.com>
Date: Wed Sep 21 11:26:58 2011 -0400
Test that the application/xhtml+xml content type is supported
diff --git a/t/local/content.t b/t/local/content.t
new file mode 100644
index 0000000..a5e829a
--- /dev/null
+++ b/t/local/content.t
@@ -0,0 +1,35 @@
+#!perl
+
+use warnings;
+use strict;
+use lib 't/local';
+use LocalServer;
+use Test::More tests => 10;
+
+BEGIN {
+ delete @ENV{ grep { lc eq 'http_proxy' } keys %ENV };
+ delete @ENV{ qw( IFS CDPATH ENV BASH_ENV ) };
+ use_ok( 'WWW::Mechanize' );
+}
+
+my $mech = WWW::Mechanize->new();
+isa_ok( $mech, 'WWW::Mechanize', 'Created the object' );
+
+my $server = LocalServer->spawn();
+isa_ok( $server, 'LocalServer' );
+
+diag('Running tests against ' . $server->url . '?xml=1');
+my $response = $mech->get( $server->url . '?xml=1' );
+isa_ok( $response, 'HTTP::Response', 'Got back a response' );
+ok( $response->is_success, 'Got URL' ) or die q{Can't even fetch local url};
+is( $response->content_type, 'application/xhtml+xml', 'Content type is application/xhtml+xml' );
+ok( $mech->is_html, 'Local page is HTML' );
+
+$mech->field(query => 'foo'); # Filled the 'q' field
+
+$response = $mech->click('submit');
+isa_ok( $response, 'HTTP::Response', 'Got back a response' );
+ok( $response->is_success, q{Can click 'Go' ('Google Search' button)} );
+
+is( $mech->field('query'),'foo', 'Filled field correctly');
+
diff --git a/t/local/log-server b/t/local/log-server
index dfde1c5..7cb21bc 100755
--- a/t/local/log-server
+++ b/t/local/log-server
@@ -113,7 +113,11 @@ SERVERLOOP: {
'Connection' => 'close',
'Content-Length' => length($rbody),
], $rbody);
- $res->content_type('text/html');
+
+ $res->content_type(
+ $q->param('xml') ? 'application/xhtml+xml' : 'text/html'
+ );
+
debug "Request " . ($r->uri->path || "/");
};
};
commit 958f42b261e04bb8a3ab02746307ebccf2a3601e
Author: Jason May <jasonmay at bestpractical.com>
Date: Thu Sep 22 15:39:07 2011 -0400
Typo correction: optionaly -> optionally
diff --git a/lib/WWW/Mechanize/FAQ.pod b/lib/WWW/Mechanize/FAQ.pod
index 8707876..d20ac19 100644
--- a/lib/WWW/Mechanize/FAQ.pod
+++ b/lib/WWW/Mechanize/FAQ.pod
@@ -110,7 +110,7 @@ or get the specs from the environment:
=head2 How can I see what fields are on the forms?
-Use the mech-dump utility, optionaly installed with Mechanize.
+Use the mech-dump utility, optionally installed with Mechanize.
$ mech-dump --forms http://search.cpan.org
Dumping forms
-----------------------------------------------------------------------