Can you scrape sites where pages use JavaScript to output forms?
I'm trying to scrape data from a website that consists of one web page. It has one URL, but it displays different forms. It displays a form, accepts the data in the form, then it displays another form onto the page. It uses JavaScript to display the forms. The JavaScript below is from the page that displays the forms. It appears to use a cookie to save the sessionID.
Is screen-scraper capable of scraping a site constructed in this way? If it is, what approach would you take?
The site name is - https://committing.efanniemae.com/eCommitting/eCommitting
Login info: ID: -------, PW: --------
[edited by admin]
function submitHandler(requestActionName,formName,submittedPageName) {
    if (requestActionName != null) {
        document[formName].actionResource.value=requestActionName;
    }
    if (submittedPageName != null) {
        document[formName].submittedPage.value=submittedPageName;
    }
    for( i=0; i
    }
    document[formName].submit();
}
Gary,
Because the whole world can see these posts I've gone ahead and removed the login information to your account.
I'll follow up with a comment on your question.
-Scott
Gary,
So, there's a rule in screen-scraping that holds about 90% of the time: "Don't try to understand what the JavaScript is doing." It holds because, regardless of what the JavaScript does to generate the request sent to the server, your job is simply to replicate that request. Let your proxy transactions be your guide.
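To illustrate why the script's internals matter so little: the submitHandler function in the question only sets two hidden fields (actionResource and submittedPage) and then submits the form, so the server just receives an ordinary form POST. Here is a minimal sketch of that idea. The helper name buildFormBody, the field someHiddenField, and all of the values are made up for illustration; the real names and values should come from the recorded proxy transaction.

```javascript
// Sketch only: rebuild the form body the browser would send after
// submitHandler() has run. The field names actionResource and submittedPage
// come from the page's script; everything else here is hypothetical.
function buildFormBody(recordedFields, requestActionName, submittedPageName) {
  const fields = { ...recordedFields };
  // Mirror the two assignments submitHandler() makes before submitting.
  if (requestActionName != null) fields.actionResource = requestActionName;
  if (submittedPageName != null) fields.submittedPage = submittedPageName;
  // URL-encode the fields the way a standard form POST would.
  return new URLSearchParams(fields).toString();
}

// Hypothetical values, standing in for whatever the proxy session recorded.
const body = buildFormBody({ someHiddenField: "abc" }, "browsePrices", "allProducts");
console.log(body);
```

Whatever the script does before the submit, the end result is a string like this one, and that string is what needs to be reproduced.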
In your case, let's say you needed to iterate over the different options at
Browse Prices > All Products
The following is a sample combination of selections. The items in parentheses are the keys/values being passed in the POST payload.
Product Family (filterCriteria_selectedFamily_value): Flex (Flex)
Product (filterCriteria_selectedProduct_value): 10-Year Fixed Rate (4)
Remittance Type (filterCriteria_remittanceType_value): Actual/Actual (1)
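The selections above can be sketched as a POST payload, along with one way to iterate over several options. The three parameter names and values are the ones listed above; the helper name buildPayloads is hypothetical, and the real request will most likely carry additional fields (visible in the proxy transaction) that this sketch leaves out.

```javascript
// Option values per POST parameter. Only the sample combination quoted
// above is filled in; more values would be appended to each array.
const options = {
  filterCriteria_selectedFamily_value: ["Flex"],
  filterCriteria_selectedProduct_value: ["4"],
  filterCriteria_remittanceType_value: ["1"],
};

// Build one URL-encoded payload per combination of option values
// (a Cartesian product over the parameters).
function buildPayloads(optionsByParam) {
  let combos = [{}];
  for (const [param, values] of Object.entries(optionsByParam)) {
    combos = combos.flatMap((combo) =>
      values.map((value) => ({ ...combo, [param]: value }))
    );
  }
  return combos.map((combo) => new URLSearchParams(combo).toString());
}

const payloads = buildPayloads(options);
console.log(payloads[0]);
```

Each string in payloads corresponds to one request to replay; with more values in each array, the function yields every combination in turn.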
To scrape the values before passing them, you'll need to find them on the originating/requesting page.
For example, if you search the last response of the originating page for "filterCriteria_selectedFamily_value", you'll find a select tag with a bunch of JavaScript in it...
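As a rough sketch of pulling those values out of the response, the snippet below finds a select tag by name and collects its option values. The sample HTML is invented for illustration (the real markup was not posted, and the "Sample" option is made up), extractOptions is a hypothetical helper, and the regular expressions assume quoted attributes with no ">" character inside them.

```javascript
// Return the { value, label } pairs of the <select> with the given name.
// Assumes selectName contains no regular-expression metacharacters.
function extractOptions(html, selectName) {
  const selectRe = new RegExp(
    `<select[^>]*name=["']${selectName}["'][^>]*>([\\s\\S]*?)</select>`,
    "i"
  );
  const selectMatch = html.match(selectRe);
  if (!selectMatch) return [];
  // Capture each option's value attribute and the text that follows the tag.
  const optionRe = /<option[^>]*value=["']([^"']*)["'][^>]*>([^<]*)/gi;
  return Array.from(selectMatch[1].matchAll(optionRe), (m) => ({
    value: m[1],
    label: m[2].trim(),
  }));
}

// Invented markup standing in for the originating page's last response.
const sampleHtml = `
<select name="filterCriteria_selectedFamily_value" onchange="submitHandler('x','y','z')">
  <option value="Flex">Flex</option>
  <option value="Sample">Sample Family</option>
</select>`;

const families = extractOptions(sampleHtml, "filterCriteria_selectedFamily_value");
console.log(families);
```

The extracted values can then be fed into the POST parameters for each request. In screen-scraper itself the same job would normally be done with an extractor pattern on the originating page rather than hand-written code.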