Multiple Sessions
I am trying to scrape data from a website using VBA. The website requires me to log in and perform a search; the results are then displayed, limited to a fixed number of records per page. I have set up three scraping sessions to handle the login, search and logout. A unique session reference is generated on each visit and forms part of the URL, and I don't want to log in every time I need to scrape the next page of results. However, I am having no luck.
The following is my code:
Dim strSessionID
Dim strTotalRecords
Dim rs As ADODB.Recordset
Dim i As Integer
'On Error GoTo Err_Handler
'Login
Call objSession.Initialize("Login")
Call objSession.Scrape
strSessionID = objSession.GetVariable("SESSION_ID")
MsgBox strSessionID
'Get the number of records returned from the search
Call objSession.Initialize("Search")
Call objSession.SetVariable("SESSION_ID", strSessionID)
Call objSession.Scrape
strTotalRecords = objSession.GetVariable("TOTAL_RECORDS")
MsgBox strTotalRecords
'Now do the search proper
Call objSession.StoreVariable("RESULTS")
MsgBox objSession.GetNumDataRecordsForDataSet("RESULTS")
If objSession.GetNumDataRecordsForDataSet("RESULTS") > 0 Then
    Do Until i = objSession.GetNumDataRecordsForDataSet("RESULTS")
        MsgBox objSession.GetDataSetValue("RESULTS", i, "ID")
        MsgBox objSession.GetDataSetValue("RESULTS", i, "CASE_DESCRIPTION")
        MsgBox objSession.GetDataSetValue("RESULTS", i, "KEYWORDS")
        MsgBox objSession.GetDataSetValue("RESULTS", i, "CITATION")
        MsgBox objSession.GetDataSetValue("RESULTS", i, "DOC_NO")
        i = i + 1
    Loop
End If
Call objSession.Disconnect
'objSession is still needed for the logout below, so it is not released here
'Logout
Call objSession.Initialize("LogOut")
Call objSession.SetVariable("SESSION_ID", strSessionID)
Call objSession.Scrape
Call objSession.Disconnect
Set objSession = Nothing
If I can get this code to work, I will use the variable strTotalRecords to loop through the pages of search results.
Can anyone help?
Multiple Sessions
Hi,
My best guess is that this is happening because you're converting the values to strings rather than ints, so they are compared character by character instead of numerically. Try changing this line in your script:
If strNextPage < strTotalRecords Then
to this:
If CInt(strNextPage) < CInt(strTotalRecords) Then
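To see why the original test fails, here is a minimal standalone sketch. It assumes the two values reach the comparison as strings; the literal values and the file name compare.vbs are made up for illustration, and it runs under Windows Script Host rather than inside screen-scraper.

```vbscript
' Standalone illustration; run with: cscript //nologo compare.vbs
' (compare.vbs is a hypothetical file name, and the values are invented.)
Option Explicit

Dim strNextPage, strTotalRecords
strNextPage = "6"        ' assumed to arrive as strings
strTotalRecords = "25"

' Both operands are strings, so they are compared character by
' character: "6" sorts after "2", and the test is False.
WScript.Echo "String compare:  " & (strNextPage < strTotalRecords)

' Converting first gives a numeric comparison: 6 < 25 is True.
WScript.Echo "Numeric compare: " & (CLng(strNextPage) < CLng(strTotalRecords))
```

CInt works the same way for small values. CLng is used in this sketch only because CInt overflows above 32767, which could matter if the total record count is large.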
If that still doesn't seem to solve it, feel free to post back.
Best wishes,
Todd
Multiple Sessions
Thanks Todd. I have done as suggested, and in principle everything seems to work fine. However, I am having problems looping through the search results pages. I now have a script that is called after a file is scraped. It tests the value of the current page and, if it is less than the total number of pages, it calls the scrape file again. For some reason the conditional statement always resolves to false, even though I am sure it should resolve to true:
Dim strEndPage
Dim strNextPage
Dim strTotalRecords
Session.Log "Extracting the data and writing to a file"
strEndPage = Session.getVariable( "END_PAGE" )
strTotalRecords = Session.getVariable("TOTAL_RECORDS")
strNextPage = CStr(strEndPage+1)
Session.Log "End Page " & CStr(strNextPage)
Session.Log "Total records " & CStr(strTotalRecords)
If strNextPage < strTotalRecords Then
    Call Session.SetVariable( "START_PAGE", CStr(strNextPage) )
    Call Session.scrapeFile( "Search" )
    Session.Log "Now scraping another page..."
Else
    Session.Log "There's no more pages to scrape!"
End If
Can you help?
Multiple Sessions
Hi,
It's possible that, in addition to the session ID you refer to in your scripts, the site is using a cookie. If you check the "Last Request" and "Last Response" tabs in screen-scraper, you should be able to see "Cookie" and "Set-Cookie" HTTP headers that would indicate this. Unfortunately, if the site is using cookies, you wouldn't be able to transfer them across scraping sessions. Would it be viable for you to merge everything into a single scraping session, perhaps using a bit of scripting within screen-scraper to handle your existing logic?
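As a rough sketch of what the calling side might then look like, using only the calls already shown in this thread: "Combined" is a hypothetical name for a single scraping session containing the login, search and logout files, and objSession is assumed to be created the same way as in your existing code.

```vbscript
' Sketch only, not a tested implementation.
' "Combined" is a hypothetical scraping session that would hold the
' login, search and logout scrapeable files, so that any cookies set
' at login are still available for the later requests.
Sub RunCombinedSession(objSession)
    Call objSession.Initialize("Combined")
    Call objSession.Scrape
    ' TOTAL_RECORDS is the variable name used earlier in this thread.
    MsgBox objSession.GetVariable("TOTAL_RECORDS")
    Call objSession.Disconnect
End Sub
```

The paging logic would then live in a script inside that one session rather than in the calling code, so the login happens once per run.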
Kind regards,
Todd Wilson