Logging in to Google Analytics & Youtube
Hello
I am demoing the pro version of screen-scraper for a project that requires logging in to my Google Analytics and YouTube accounts in order to scrape a bunch of data. However, after working through tutorial #2 and reviewing the forums and other documentation, I am still unable to figure out how to get screen-scraper to log in to these sites. Any help you can provide is greatly appreciated.
Google Analytics Login Page:
https://accounts.google.com/ServiceLogin?service=analytics&passive=true&nui=1&hl=en&continue=https://www.google.com/analytics/settings/&followup=https://www.google.com/analytics/settings/
Youtube Login Page:
https://accounts.google.com/ServiceLogin?uilel=3&service=youtube&passive=true&continue=http%3A%2F%2Fwww.youtube.com%2Fsignin%3Faction_handle_signin%3Dtrue%26nomobiletemp%3D1%26hl%3Den_US%26next%3D%252Fopenmichigan&hl=en_US&ltmpl=sso
Can screen-scraper log in to these sites?
Also, can screen-scraper log in to slideshare.net?
Thank you
Screen-scraper can log in to a Google account. It looks like you need to scrape a couple of values from the login form page and pass them to the page that logs in. They are 'GALX' and 'dsh'; once those are passed along, it should work fine.
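To make the idea concrete outside of screen-scraper, here is a rough Python sketch of pulling the two hidden values out of the login form's HTML. The helper name `extract_hidden_value`, the sample markup, and the sample token values are all made up for illustration; the real page markup may differ, so treat the pattern as an assumption.

```python
import re

def extract_hidden_value(html, field_name):
    """Return the value attribute that follows "field_name", or None.

    Assumes the markup looks like ..."GALX" value="...", i.e. the quoted
    field name is immediately followed by the value attribute.
    """
    pattern = r'"' + re.escape(field_name) + r'"\s+value="([^"]*)"'
    match = re.search(pattern, html)
    return match.group(1) if match else None

# Hypothetical login-form fragment; the token values are invented.
sample_html = '''
<input type="hidden" name="dsh" id="dsh" value="-1234567890">
<input type="hidden" name="GALX" value="AbCdEfGh123">
'''

# Collect both values so they can be sent along with the login request.
tokens = {name: extract_hidden_value(sample_html, name) for name in ("GALX", "dsh")}
print(tokens)
```

In screen-scraper itself the same job would be done by two extractor patterns on the login form page, each saving its token to a session variable.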
Thanks, Jason. However, I am still unsuccessful.
Can you share your settings/process for scraping the login form page? I am unable to scrape it. I have tried deleting all cookies and under Progress > I have unchecked both "Filter out less useful transactions" and "Don't record binary files" boxes. However, the proxy is still not capturing the Google login page found here:
https://accounts.google.com/ServiceLogin?hl=en&continue=https://www.google.com/
I can see the values 'GALX' and 'dsh' by viewing the source code of the login form page in my browser, but it appears that the screen-scraper proxy is not capturing this page.
Any ideas? I am working on a Mac, using Opera.
Thank you
Copy the below into a text editor, and save it as 'analytics.sss', and then import the file into screen-scraper. Once done, you need to edit the init script to have your credentials, but it should work.
<scraping-session use-strict-mode="true"><script-instances><script-instances when-to-run="10" sequence="1" enabled="true"><script><script-text>session.setVariable("USER", "username");
session.setVariable("PASS", "password");</script-text><name>Analytics--init</name><language>Interpreted Java</language></script></script-instances><owner-type>ScrapingSession</owner-type><owner-name>Analytics</owner-name></script-instances><name>Analytics</name><notes></notes><cookiePolicy>0</cookiePolicy><maxHTTPRequests>1</maxHTTPRequests><external_proxy_username></external_proxy_username><external_proxy_password></external_proxy_password><external_proxy_host></external_proxy_host><external_proxy_port></external_proxy_port><external_nt_proxy_username></external_nt_proxy_username><external_nt_proxy_password></external_nt_proxy_password><external_nt_proxy_domain></external_nt_proxy_domain><external_nt_proxy_host></external_nt_proxy_host><anonymize>false</anonymize><terminate_proxies_on_completion>false</terminate_proxies_on_completion><number_of_required_proxies>5</number_of_required_proxies><originator_edition>2</originator_edition><logging_level>1</logging_level><date_exported>January 10, 2012 14:51:35</date_exported><character_set>UTF-8</character_set><created_by_version>5.5.34a</created_by_version><scrapeable-files sequence="2" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>https://accounts.google.com/ServiceLoginAuth</URL><last-request></last-request><name>Login</name><HTTPParameters sequence="10"><key>Email</key><type>POST</type><value>~#USER#~</value></HTTPParameters><HTTPParameters sequence="11"><key>Passwd</key><type>POST</type><value>~#PASS#~</value></HTTPParameters><HTTPParameters sequence="2"><key>followup</key><type>POST</type><value>https://www.google.com/analytics/settings/</value></HTTPParameters><HTTPParameters sequence="3"><key>service</key><type>POST</type><value>analytics</value></HTTPParameters><HTTPParameters sequence="9"><key>secTok</key><type>POST</type><value></value></HTTPParameters><HTTPParameters sequence="12"><key>signIn</key><type>POST</type><value>Sign 
in</value></HTTPParameters><HTTPParameters sequence="5"><key>dsh</key><type>POST</type><value>~#DSH#~</value></HTTPParameters><HTTPParameters sequence="4"><key>nui</key><type>POST</type><value>1</value></HTTPParameters><HTTPParameters sequence="7"><key>GALX</key><type>POST</type><value>~#GALX#~</value></HTTPParameters><HTTPParameters sequence="14"><key>rmShown</key><type>POST</type><value>1</value></HTTPParameters><HTTPParameters sequence="13"><key>PersistentCookie</key><type>POST</type><value>yes</value></HTTPParameters><HTTPParameters sequence="1"><key>continue</key><type>POST</type><value>https://www.google.com/analytics/settings/</value></HTTPParameters><HTTPParameters sequence="6"><key>hl</key><type>POST</type><value>en</value></HTTPParameters><HTTPParameters sequence="8"><key>timeStmp</key><type>POST</type><value></value></HTTPParameters><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>Login</owner-name></script-instances></scrapeable-files><scrapeable-files sequence="1" will-be-invoked-manually="false" tidy-html="jtidy"><last-scraped-data></last-scraped-data><URL>https://accounts.google.com/ServiceLogin</URL><last-request></last-request><name>Login page</name><extractor-patterns sequence="2" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>"GALX" value="~@GALX@~"</pattern-text><identifier>galx</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="true" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[^"]*</regular-expression><identifier>GALX</identifier></extractor-pattern-tokens><script-instances><owner-type>ExtractorPattern</owner-type><owner-name>galx</owner-name></script-instances></extractor-patterns><extractor-patterns 
sequence="1" automatically-save-in-session-variable="false" if-saved-in-session-variable="0" filter-duplicates="false" cache-data-set="false" will-be-invoked-manually="false"><pattern-text>"dsh" value="~@DSH@~"</pattern-text><identifier>dsh</identifier><extractor-pattern-tokens optional="false" save-in-session-variable="true" compound-key="true" strip-html="false" resolve-relative-url="false" replace-html-entities="false" trim-white-space="false" exclude-from-data="false" null-session-variable="false" sequence="1"><regular-expression>[^"]*</regular-expression><identifier>DSH</identifier></extractor-pattern-tokens><script-instances><owner-type>ExtractorPattern</owner-type><owner-name>dsh</owner-name></script-instances></extractor-patterns><HTTPParameters sequence="4"><key>hl</key><type>GET</type><value>en</value></HTTPParameters><HTTPParameters sequence="2"><key>passive</key><type>GET</type><value>true</value></HTTPParameters><HTTPParameters sequence="5"><key>continue</key><type>GET</type><value>https://www.google.com/analytics/settings/</value></HTTPParameters><HTTPParameters sequence="6"><key>followup</key><type>GET</type><value>https://www.google.com/analytics/settings/</value></HTTPParameters><HTTPParameters sequence="3"><key>nui</key><type>GET</type><value>1</value></HTTPParameters><HTTPParameters sequence="1"><key>service</key><type>GET</type><value>analytics</value></HTTPParameters><script-instances><owner-type>ScrapeableFile</owner-type><owner-name>Login page</owner-name></script-instances></scrapeable-files></scraping-session>
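For anyone reading the export rather than importing it: the session requests the "Login page" file first, saves GALX and DSH from it, and then POSTs to the "Login" file with those values filled in. A rough Python sketch of the POST body that second request would carry is below. The function name `build_login_body` and the placeholder token values are hypothetical; the parameter names, fixed values, and ordering are copied from the export, and the "Sign in" value assumes the line break inside that parameter is an export artifact.

```python
from urllib.parse import urlencode, parse_qs

SETTINGS_URL = "https://www.google.com/analytics/settings/"

def build_login_body(user, password, galx, dsh):
    """Assemble the form-encoded body, in the sequence order of the export."""
    params = [
        ("continue", SETTINGS_URL),
        ("followup", SETTINGS_URL),
        ("service", "analytics"),
        ("nui", "1"),
        ("dsh", dsh),            # scraped from the login form page
        ("hl", "en"),
        ("GALX", galx),          # scraped from the login form page
        ("timeStmp", ""),
        ("secTok", ""),
        ("Email", user),
        ("Passwd", password),
        ("signIn", "Sign in"),
        ("PersistentCookie", "yes"),
        ("rmShown", "1"),
    ]
    return urlencode(params)

# Placeholder credentials and tokens, for illustration only.
body = build_login_body("username", "password", "galx-token", "dsh-token")
decoded = parse_qs(body, keep_blank_values=True)
print(decoded["GALX"], decoded["dsh"])
```

The sketch only builds the body; it does not send anything, and a real login would also need the cookies set by the first request to be carried over to the second.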