How to test if a URL exists from a .NET program?

How can you tell if a URL exists from a .NET program? I tried using the WebBrowser control with the Navigate method. It works OK except when the file is a PDF: in that case, it invokes Adobe Reader to open the PDF. I don't actually want to open the PDF; I just want to know whether it exists, and having PDF files keep popping up won't work. Do you just remove the file type association under Internet Options / Programs? Have you guys done this with straight .NET? If not, how would you do it with screen-scraper and a .NET program?

I'm sorry that I don't understand the question.

Do you want to confirm that a URL is valid by calling a scrape through VB.NET?

Gary,

If you're talking about the same scenario as in your earlier post, and if I understand what you need, I believe screen-scraper could handle this. Are you constructing the URL manually rather than extracting the link from the page? Is that why you want to verify that the URL is valid before requesting it? If so, you have to do a bit of extra work to find out whether you'll get a 404 without downloading the entire file: send an HTTP HEAD request, which returns only the response headers.
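For readers not running this inside screen-scraper, here is a minimal sketch of the same HEAD-request idea in plain Java, using the JDK's HttpURLConnection. Assumptions: only 200 OK counts as "exists", the timeout is 5 seconds, and the local HttpServer with its /report.pdf path is a hypothetical stand-in for a real site so the example runs offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

class HeadCheck {

    // Sends an HTTP HEAD request and returns the status code. HEAD asks the
    // server for the headers only, so the body (e.g. a large PDF) is not sent.
    static int headStatus(String url, int timeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            return conn.getResponseCode(); // returns 404 etc. without throwing
        } finally {
            conn.disconnect();
        }
    }

    // Assumption: only 200 OK counts as "exists".
    static boolean exists(String url) {
        try {
            return headStatus(url, 5000) == HttpURLConnection.HTTP_OK;
        } catch (IOException | IllegalArgumentException e) {
            return false; // malformed URL, unreachable host, timeout, ...
        }
    }

    // Hypothetical local server standing in for a real site:
    // /report.pdf answers 200, every other path answers 404.
    static HttpServer startDemoServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                boolean found = exchange.getRequestURI().getPath().equals("/report.pdf");
                if (found) exchange.getResponseHeaders().set("Content-Type", "application/pdf");
                exchange.sendResponseHeaders(found ? 200 : 404, -1); // -1: no body
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startDemoServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        System.out.println(exists(base + "/report.pdf"));  // true
        System.out.println(exists(base + "/missing.pdf")); // false
        server.stop(0);
    }
}
```

A 404 shows up as a status code rather than an exception, so the caller can tell "missing" apart from "server unreachable" by calling headStatus directly.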

The script below first checks whether the URL is valid (it throws an exception for any status other than 200 OK) and, if it is, returns the content type. The content type is what your browser looks at to decide, for example, whether to load Acrobat or prompt you to download a file.
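Servers often append parameters to the Content-Type header (for example "application/pdf; charset=binary"), so comparing the raw CONTENT_TYPE value with a fixed string is fragile. A minimal plain-Java sketch, with hypothetical helper names mediaType and isPdf, strips the parameters before comparing. Treat it as a heuristic: some servers label PDFs as application/octet-stream, which it would not recognize.

```java
import java.util.Locale;

class ContentTypes {

    // Extracts the media type from a Content-Type header value, dropping any
    // parameters: "Application/PDF; charset=binary" -> "application/pdf".
    // Media types are case-insensitive, so the result is lower-cased.
    // Returns null when the header is missing or blank.
    static String mediaType(String header) {
        if (header == null) return null;
        int semi = header.indexOf(';');
        String type = (semi >= 0 ? header.substring(0, semi) : header).trim();
        return type.isEmpty() ? null : type.toLowerCase(Locale.ROOT);
    }

    // Heuristic: true only when the server labels the resource application/pdf.
    static boolean isPdf(String header) {
        return "application/pdf".equals(mediaType(header));
    }

    public static void main(String[] args) {
        System.out.println(mediaType("Application/PDF; charset=binary")); // application/pdf
        System.out.println(isPdf("text/html; charset=UTF-8"));          // false
        System.out.println(isPdf(null));                                 // false
    }
}
```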

It's likely that .NET has a less verbose way to do this, but if you create a script from the following code, pass it your URL in the session variable "URL" and then check the value of the session variable "CONTENT_TYPE".

import org.apache.commons.httpclient.*;
import org.apache.commons.httpclient.methods.*;
import org.apache.commons.httpclient.params.HttpMethodParams;
import org.apache.commons.httpclient.protocol.Protocol;
import org.apache.commons.httpclient.contrib.ssl.EasySSLProtocolSocketFactory;

session.log("URL: " + session.getVariable( "URL" ));
if (session.getVariable( "URL" )!=null)
{
	session.log("No, not null");
	
	String urlString = (String) session.getVariable( "URL" );
	// Clear the variable so the same URL isn't checked again on a later run.
	session.setVariable("URL", null);
	
	session.log("Checking content-type for: " + urlString);
	
	// For HTTPS URLs, use a socket factory that also accepts self-signed
	// certificates. It must be registered (before the method is created and
	// executed) to take effect; note that this replaces the default "https"
	// handler for the whole JVM.
	if( urlString.toLowerCase().startsWith( "https" ) )
	{
		Protocol easyHTTPS = new Protocol( "https", new EasySSLProtocolSocketFactory(), 443 );
		Protocol.registerProtocol( "https", easyHTTPS );
	}
	
	// Create a HEAD request: the server returns only the headers, not the file.
	HeadMethod method = new HeadMethod( urlString );
	
	// Provide a custom retry handler if necessary (up to 3 retries).
	method.getParams().setParameter
	(
	  HttpMethodParams.RETRY_HANDLER,
	  new DefaultHttpMethodRetryHandler( 3, false )
	);
	
	try
	{
		HttpClient client = new HttpClient();
		
		// Apply screen-scraper's proxy settings to the client.
		session.setProxySettingsOnHttpClient( client, client.getHostConfiguration() );
		
		// Execute the method.
		int statusCode = client.executeMethod( method );
		
		if( statusCode!=HttpStatus.SC_OK )
		{
			throw new Exception( "Received status code: " + statusCode );
		}
		
		// Retrieve just the Content-Type header value (the header may be absent).
		Header header = method.getResponseHeader( "Content-Type" );
		String contentType = ( header != null ) ? header.getValue() : null;
	
		session.log( "contentType: " + contentType );
		session.setVariable( "CONTENT_TYPE", contentType );
	}
	finally
	{
	  // Release the connection.
	  method.releaseConnection();
	}
}

.NET code that checks if a URL exists

Thanks for the code. Is that interpreted Java or JavaScript? I checked .NET forums for code that might work within a .NET 3.5 program. I found a couple of things and pieced them together into something that appears to work. I'll test it further to make sure that it indeed works for all cases. Here it is:

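    ' Check whether the passed web page EXISTS.
    ' Needs Imports System.Net (HttpWebRequest) and System.Windows.Forms (MessageBox).
    Private Function CheckPageExists(ByVal url As String) As Boolean

        If String.IsNullOrEmpty(url) Then Return False
        If url.Equals("about:blank") Then Return False
        If Not url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) AndAlso _
            Not url.StartsWith("https://", StringComparison.OrdinalIgnoreCase) Then
            url = "http://" & url
        End If

        Try
            Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
            ' HEAD asks for the headers only, so a PDF is not downloaded.
            request.Method = "HEAD"
            request.Timeout = 5000

            ' GetResponse throws a WebException for 404 and other error codes.
            ' Disposing the response releases the connection; otherwise later
            ' calls can hang once the per-host connection limit is reached.
            Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            End Using
        Catch ex As Exception
            ' Also covers malformed URLs (UriFormatException) and timeouts.
            MessageBox.Show(url & vbCrLf & vbCrLf & "NO - web page NOT found")
            Return False
        End Try

        MessageBox.Show(url & vbCrLf & vbCrLf & "YES - web page found")
        Return True
    End Function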
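The screen-scraper script above relies on a HEAD request, and any HEAD-based version of this function has the same caveat: some servers answer HEAD with 405 (Method Not Allowed) or 501 (Not Implemented) even when the file exists. A minimal sketch of one workaround, written in Java to match the other examples and assuming the server honors a Range header, falls back to a GET that asks for just the first byte; 206 (Partial Content) or 200 then counts as "exists". The class name, the /no-head.pdf path, and the demo server are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

class ExistsWithFallback {

    // Sends a request with the given method and returns the status code
    // without reading the body. rangeZero asks the server for one byte only.
    static int status(String url, String method, boolean rangeZero) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod(method);
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            if (rangeZero) conn.setRequestProperty("Range", "bytes=0-0");
            return conn.getResponseCode();
        } finally {
            conn.disconnect(); // closes the socket even if a body was sent
        }
    }

    static boolean exists(String url) {
        try {
            int code = status(url, "HEAD", false);
            if (code == HttpURLConnection.HTTP_BAD_METHOD
                    || code == HttpURLConnection.HTTP_NOT_IMPLEMENTED) {
                // Server refuses HEAD: retry with a GET limited to the first byte.
                code = status(url, "GET", true);
            }
            // 206 means the Range header was honored; 200 means it was ignored.
            return code == HttpURLConnection.HTTP_OK || code == HttpURLConnection.HTTP_PARTIAL;
        } catch (IOException | IllegalArgumentException e) {
            return false;
        }
    }

    // Hypothetical server that rejects HEAD on /no-head.pdf but serves a
    // one-byte partial response to GET; every other path answers 404.
    static HttpServer startDemoServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String path = exchange.getRequestURI().getPath();
                boolean head = exchange.getRequestMethod().equals("HEAD");
                if (!path.equals("/no-head.pdf")) {
                    exchange.sendResponseHeaders(404, -1);
                } else if (head) {
                    exchange.sendResponseHeaders(405, -1);
                } else {
                    byte[] firstByte = {'%'};
                    exchange.sendResponseHeaders(206, firstByte.length);
                    exchange.getResponseBody().write(firstByte);
                }
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = startDemoServer();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        System.out.println(exists(base + "/no-head.pdf")); // true, via the GET fallback
        System.out.println(exists(base + "/missing.pdf")); // false
        server.stop(0);
    }
}
```

Only 405 and 501 trigger the fallback, so an ordinary 404 still costs a single HEAD request.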

Gary,

That looks like it would work. If you have .NET do the URL verification, you would need to pass the URL back and forth between screen-scraper and your app. That can be done. If the site requires you to maintain session state, you may need to account for that in the exchange as well (for example, by passing along the session cookies).

-Scott