HTMLUnit: тонны устаревшего контента и не могут создавать предупреждения объектов на getPage (), а затем сбой с исключением вызова setOuterHTML на getByXPath()
Я пытаюсь HTMLUnit автоматизировать загрузку данных с веб-приложения. Тем не менее, я получаю целый беспорядок предупреждений на getPage() (большинство из которых, похоже, имеют дело со связанными скриптами, которые мне даже не нужны), а затем фатальный com.gargoylesoftware.htmlunit.ScriptException: исключение, вызывающее setOuterHTML при попытке запустить getByXPath для извлечения данных, которые я ищу. И из-за ошибок, которые я получаю, я не могу понять, что происходит. У вас у всех есть идеи?
вот мой код:
import java.util.List;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class ScrapperApp {
private static void go() throws Exception {
HtmlPage nextPage;
String url = "http://media.ethics.ga.gov/search/Campaign/Campaign_Name.aspx?NameID=5751&FilerID=C2009000085&Type=candidate";
final WebClient webclient = new WebClient();
final HtmlPage page = webclient.getPage(url);
System.out.println("PULLING LINKS:");
List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//div[@class='hform1']/a[@class='lblentrylink']");
/*for(int x=0; x<articles.size(); x++) {
nextPage = articles.get(x).click();
System.out.println(nextPage.getBody());
}*/
}
public static void main(String[] args) throws Exception {
go();
System.out.println("COMPLETE");
}
}
и вот мой вывод на консоль:
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[443] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[448] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/urchin.js] line=[456] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:51 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:52 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:53 PM com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument execCommand
WARNING: Nothing done for execCommand(BackgroundImageCache, ...) (feature not implemented)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/search/theethics.css' [1621:72] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/search/theethics.css' [1621:72] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/search/theethics.css' [1722:1] Error in style sheet. (Invalid token ".123". Was expecting one of: <EOF>, <S>, <IDENT>, "<!--", "-->", <HASH>, <IMPORT_SYM>, <PAGE_SYM>, <MEDIA_SYM>, ".", ":", "*", "[", <ATKEYWORD>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [4:1] Error in style rule. (Invalid token ".". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [4:1] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [538:16] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=12a7FOCbnwgUAwtiPjKWh6wDEhgkTfdV9_FCfkqzSp1sZ_YdcvnAj941ZFWBBPCjl5RQqmB3TVerNjIRqn-QyCUV4dFAyyOktFPBtLE-ETB9nE-rPiQp_RNPyuD-NYO58_ngCw2&t=634516122000000000' [538:16] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [6:1] Error in style rule. (Invalid token ".". Was expecting one of: <S>, <LBRACE>, <COMMA>.)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [6:1] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [105:17] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [105:17] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error
WARNING: CSS error: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [160:16] Error in style rule. (Invalid token ":". Was expecting one of: <EOF>, <S>, <NUMBER>, "inherit", <IDENT>, <STRING>, <PLUS>, <COMMA>, <HASH>, <IMPORTANT_SYM>, <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <URI>, <FUNCTION>, "}", ";", "/", "-".)
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning
WARNING: CSS warning: 'http://media.ethics.ga.gov/Search/WebResource.axd?d=P_qivaU1jkjGS6yiS47lVyoi52Pqy5e8DnncH3bigK8349gyQVvRTapoSdHm45oIHlJhLQAhH3tEXp29b5hNLTwX4AdAh7qPU9_lVIhmQjWu1Kvx6RDeUrTdN4UrhhDIdOIrpOYk5RJGCyYDSr8ky9HSOiU1&t=634516122000000000' [160:16] Ignoring the following declarations in this rule.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[http://media.ethics.ga.gov/Search/Telerik.Web.UI.WebResource.axd?_TSM_HiddenField_=ctl00_ContentPlaceHolder1_RadScriptManager1_TSM&compress=1&_TSM_CombinedScripts_=%3b%3bSystem.Web.Extensions%2c+Version%3d3.5.0.0%2c+Culture%3dneutral%2c+PublicKeyToken%3d31bf3856ad364e35%3aen-US%3a7263e9c6-5962-41bc-b839-88b704bfcf0d%3aea597d4b%3ab25378d2%3bTelerik.Web.UI%2c+Version%3d2011.2.915.35%2c+Culture%3dneutral%2c+PublicKeyToken%3d121fae78165ba3d4%3aen-US%3a168ec6eb-791b-4159-8a0f-6c601196f873%3a16e4e7cd%3af7645509%3a24ee1bba%3af46195d3%3a19620875%3a874f8ea2%3a490a9d4e%3abd8f85e4%3bAjaxControlToolkit%2c+Version%3d3.0.20820.16598%2c+Culture%3dneutral%2c+PublicKeyToken%3d28f01b0e84b6d53e%3aen-US%3a707835dd-fa4b-41d1-89e7-6df5d518ffb5%3ab14bb7d5%3a13f47f54%3a369ef9d0%3a1d056c78%3adc2d6e36%3a5acd2e8e%3af8a45328] line=[997] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:54 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.7'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash.6'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.host.ActiveXObject jsConstructor
WARNING: Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
SEVERE: runtimeError: message=[Automation server can't create object for 'ShockwaveFlash.ShockwaveFlash'.] sourceName=[http://www.google-analytics.com/ga.js] line=[24] lineSource=[null] lineOffset=[0]
Jul 2, 2013 6:19:55 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jul 2, 2013 6:19:56 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
PULLING LINKS:
Jul 2, 2013 6:19:56 PM com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl runSingleJob
SEVERE: Job run failed with unexpected RuntimeException: Exception invoking setOuterHTML
======= EXCEPTION START ========
Exception class=[java.lang.RuntimeException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking setOuterHTML
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:663)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:559)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:525)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:594)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:569)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:996)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:53)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:101)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:328)
at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:161)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.RuntimeException: Exception invoking setOuterHTML
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:163)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:287)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:359)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2659)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:509)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2364)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1601)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1595)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1248)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:415)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:274)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3132)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:107)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doRun(JavaScriptEngine.java:587)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:651)
... 10 more
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1023)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement$ProxyDomNode.appendChild(HTMLElement.java:1091)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.handleCharacters(HTMLParser.java:710)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endDocument(HTMLParser.java:718)
at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
at org.cyberneko.html.HTMLTagBalancer.endDocument(HTMLTagBalancer.java:510)
at org.cyberneko.html.filters.DefaultFilter.endDocument(DefaultFilter.java:213)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2116)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:818)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:162)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:121)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.parseHtmlSnippet(HTMLElement.java:1048)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1035)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:137)
... 26 more
Enclosed exception:
java.lang.RuntimeException: Exception invoking setOuterHTML
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:163)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$GetterSlot.setValue(ScriptableObject.java:287)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject$RelinkedSlot.setValue(ScriptableObject.java:359)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putImpl(ScriptableObject.java:2659)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.put(ScriptableObject.java:509)
at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.putProperty(ScriptableObject.java:2364)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1601)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.setObjectProp(ScriptRuntime.java:1595)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1248)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:109)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:415)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:274)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3132)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:107)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doRun(JavaScriptEngine.java:587)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:651)
at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:559)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:525)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:594)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:569)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:996)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptFunctionJob.runJavaScript(JavaScriptFunctionJob.java:53)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutionJob.run(JavaScriptExecutionJob.java:101)
at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManagerImpl.runSingleJob(JavaScriptJobManagerImpl.java:328)
at com.gargoylesoftware.htmlunit.javascript.background.DefaultJavaScriptExecutor.run(DefaultJavaScriptExecutor.java:161)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.IllegalStateException: Previous sibling for HtmlDivision[<div style="height: 0px; overflow: hidden; border-top: solid black; border-top-width: thick;">] is null.
at com.gargoylesoftware.htmlunit.html.DomNode.insertBefore(DomNode.java:1023)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement$ProxyDomNode.appendChild(HTMLElement.java:1091)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.handleCharacters(HTMLParser.java:710)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endDocument(HTMLParser.java:718)
at org.apache.xerces.parsers.AbstractSAXParser.endDocument(Unknown Source)
at org.cyberneko.html.HTMLTagBalancer.endDocument(HTMLTagBalancer.java:510)
at org.cyberneko.html.filters.DefaultFilter.endDocument(DefaultFilter.java:213)
at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2116)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:918)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:818)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:162)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseFragment(HTMLParser.java:121)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.parseHtmlSnippet(HTMLElement.java:1048)
at com.gargoylesoftware.htmlunit.javascript.host.html.HTMLElement.setOuterHTML(HTMLElement.java:1035)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at net.sourceforge.htmlunit.corejs.javascript.MemberBox.invoke(MemberBox.java:137)
... 26 more
== CALLING JAVASCRIPT ==
function () {
return b.apply(a, arguments);
}
======= EXCEPTION END ========
COMPLETE
1 ответов
ошибка происходит из . Попробуйте имитировать хром:
final WebClient webclient = new WebClient(BrowserVersion.CHROME);
добавлена ссылка для подавления предупреждений HtmlUnit.
кроме того, ваш XPath ничего не находит (я тестировал в Chrome). Я использовал другой для примера:
import java.util.List;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
public class ScrapperApp {
private static void go() throws Exception {
/* turn off annoying htmlunit warnings */
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);
HtmlPage nextPage;
String url = "http://media.ethics.ga.gov/search/Campaign/Campaign_Name.aspx?NameID=5751&FilerID=C2009000085&Type=candidate";
final WebClient webclient = new WebClient(BrowserVersion.CHROME);
final HtmlPage page = webclient.getPage(url);
System.out.println("PULLING LINKS:");
List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//a[@class='lblentrylink']");
//List<HtmlAnchor> articles = (List<HtmlAnchor>) page.getByXPath("//div[@class='hform1']/a[@class='lblentrylink']");
for(int x=0; x<articles.size(); x++) {
System.out.println("Clicking "+articles.get(x).asText());
//nextPage = articles.get(x).click();
//System.out.println(nextPage.getBody());
}
}
public static void main(String[] args) throws Exception {
go();
System.out.println("COMPLETE");
}
}