Leon's Weblog

February 27, 2007

ASP.NET Impersonation and Screen-Scraping

Filed under: Software Dev — Leon @ 10:13 am

If you have ever tried to download the content of a secured web page from another page, then you know just how easy it is to have something go wrong. But, as I’ve found out recently, if you see it through, you will learn a great deal about IIS and ASP.NET security (while developing a newfound appreciation for Apache at the same time). Here are some highlights.

You want a user to press a button on an Intranet page and download the content of several other remote sites that use NTLM. The connection to the remote resource should be made using the credentials of the user, not a default account. (In my case the remote resources were actually web services and network files, but it doesn’t really matter.)

The Basics:
First configure your site to use Integrated Windows Authentication (a.k.a. NTLM) and disable anonymous access. This can all be done under the Security tab of the IIS settings. At this point, it doesn’t really matter what authentication method is configured for ASP.NET in the web.config file because IIS will still pass the identity of the network user on the domain to the page. However, to be a bit more explicit, we can deny anonymous users in the web.config file as well.
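
As a rough sketch, the web.config side of this setup could look like the fragment below. It assumes the standard system.web section of an ASP.NET site; the IIS settings themselves still have to be made in the IIS manager.

```xml
<configuration>
  <system.web>
    <!-- Use the Windows identity that IIS hands to ASP.NET -->
    <authentication mode="Windows" />
    <authorization>
      <!-- "?" stands for anonymous (unauthenticated) users -->
      <deny users="?" />
    </authorization>
  </system.web>
</configuration>
```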

The easiest way to download a file in ASP.NET is to use the System.Net.WebClient class. Just create an instance of the class and use the DownloadFile method. What if you are downloading from a secured site that doesn’t allow anonymous users?

First Problem:
One would think that passing the credentials of the currently logged-in user to a remote resource should be easy, especially when the WebClient class has a Credentials property which we can set to CredentialCache.DefaultCredentials. Easy enough. Too bad that doesn’t work. Maybe we should set the UseDefaultCredentials property of the WebClient class to True as well. Still doesn’t work.

Checking the logs on the remote web server will show that no credentials were passed at all and the server returned the typical 401-unauthorized error. There are plenty of other properties that you can set, or you can try to work with the lower level HTTP Client Request and Response classes, but they will all lead you to the same place… i.e. nowhere. You may see a small beacon of hope if you place both websites on the same host because this configuration works (although it is not the way you intended).

By default, IIS runs and processes each user request in threads owned by ASP.NET’s local machine account (which is typically ASPNET). As a consequence, even if we enable NTLM security on the page, those credentials will only last one hop, meaning that your web server authenticates the client over NTLM but cannot spawn new web requests on the user’s behalf to remote network resources. No wonder we couldn’t log in before. ASPNET is not a network account. Also, this explains why you can connect to other websites on the same host… IIS was using the same ASP.NET account on each site.

Now we can configure IIS to enable impersonation (which is disabled by default) by setting the impersonate attribute of the identity element in the web.config file. This will tell IIS to spawn each new request thread under the credentials of the client, thus giving the thread access to network resources that are available to the user. Are we home free now? Well, you can test that this scenario does work and that we have reached our goal, but at what cost?
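
For reference, a minimal sketch of the site-wide impersonation setting might look like this. It assumes Windows authentication is already configured as described under The Basics.

```xml
<configuration>
  <system.web>
    <authentication mode="Windows" />
    <!-- Run each request under the identity of the authenticated client -->
    <identity impersonate="true" />
  </system.web>
</configuration>
```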

Next Problem:
According to Microsoft (and a few simple performance tests), enabling impersonation reduces the scalability of your site because each request incurs a small performance hit. Do you want to impose this overhead on the entire site if you only need to use this functionality in a few places? There are also a number of security problems with this solution. The one that I particularly dislike is that, if you have pages that write files on the web server, every network user will now need write access to the output folders (whereas, without impersonation, only the local ASP.NET account needs write access to the output folders).

Impersonation at Runtime:
What if you can disable impersonation for the site as a whole but enable it dynamically on the pages that need it? Well, it turns out that you can. I would have preferred setting a page directive to enable this option as needed but, after going through this ordeal, writing a few extra lines of code doesn’t seem so bad. The last caveat is that the web.config settings now matter. If the ASP.NET authentication method is not set to “Windows”, you will not be able to impersonate the user at runtime because there will not be enough information in the User.Identity object.

The final code:

' Impersonate the current network user
Dim impersonationContext As System.Security.Principal.WindowsImpersonationContext
Dim currentIdentity As System.Security.Principal.WindowsIdentity = _
    CType(User.Identity, System.Security.Principal.WindowsIdentity)
impersonationContext = currentIdentity.Impersonate()

Try
    ' Download file from a network resource
    ' (note: DownloadFile expects the second argument to be the full path of the destination file)
    Dim myWebClient As New WebClient
    myWebClient.Credentials = CredentialCache.DefaultCredentials
    myWebClient.DownloadFile("http://network_resource", Server.MapPath("~\inbox"))
Finally
    ' Revert back to original context of the ASP.NET process
    impersonationContext.Undo()
End Try

February 19, 2007

.NET Regular Expressions

Filed under: Software Dev — Leon @ 7:11 pm

Regular Expressions are great for parsing text files. If only all languages used the same conventions… Oh well. Here are some tips and caveats that I’ve picked up from working with Regular Expressions in .NET.

  • Comments: If your RegExp is going to be longer than just a few symbols, make sure to include comments. The comments are placed in tags like this: (?# Your Comment Here ). When doing this, make sure to set the option to ignore pattern white space so that the pattern can be spread out over several lines. While you are at it, enable the Multiline option so that ^ and $ match at the start and end of every line (rather than only at the start and end of the whole input), and ignore case to make the pattern simpler. Combine the options using the Or operator like so: RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace
  • Named Captures: If you are parsing the file with Regular Expressions then you are looking for tokens to extract. These tokens are saved in the Groups collection of the Match object. By default, you can access the captured values using an index; however, you can greatly improve readability by assigning a name to any matched token like this: (?<Name>(Pattern)).
  • Non-Greedy Matching: Ever wonder why .* sometimes gobbles up your entire file? By default, regular expression quantifiers are greedy: they match as much text as possible, so the match ends at the last possible token. To stop at the first token that matches, use the question mark operator like so: .*?. As you would expect, this works on other patterns such as .+? and .{4,8}?. At least this behavior is consistent on most platforms that claim to implement regular expressions.
  • Caveat: Did you expect the period operator to match absolutely any character? Well, not in .NET. Here, a new line character (which, might I add, is pretty common in text files) will not be matched even if the Multiline option is set, because Multiline only changes the meaning of ^ and $. If you want to match absolutely anything, either set RegexOptions.Singleline, which makes the period match new lines too, or use a character class like so: [\s\S]. This basically means match anything that is white space or is not white space… i.e. everything.
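
To see these tips in action, here is a small console sketch. The module name and the two-line sample string are made up for illustration; only the Regex calls come from the framework.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RegexTips
    Sub Main()
        ' Hypothetical two-line input, separated by a line feed
        Dim html As String = "<b>one</b>" & vbLf & "<b>two</b>"

        ' Named capture with a greedy quantifier: runs to the LAST </b>
        Dim greedyMatch As Match = Regex.Match(html, "<b>(?<text>[\s\S]*)</b>")
        Console.WriteLine(greedyMatch.Groups("text").Value.Length)   ' 14

        ' Same pattern with a non-greedy quantifier: stops at the FIRST </b>
        Dim lazyMatch As Match = Regex.Match(html, "<b>(?<text>[\s\S]*?)</b>")
        Console.WriteLine(lazyMatch.Groups("text").Value)            ' one

        ' The period does not cross the line feed unless Singleline is set
        Console.WriteLine(Regex.IsMatch(html, "one.*two"))                          ' False
        Console.WriteLine(Regex.IsMatch(html, "one.*two", RegexOptions.Singleline)) ' True
        Console.WriteLine(Regex.IsMatch(html, "one[\s\S]*two"))                     ' True
    End Sub
End Module
```

The greedy capture swallows everything between the first opening tag and the last closing tag (14 characters, including the line feed), while the non-greedy one returns just the first token.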

In my last project, I had to parse an HTML document to extract the path to a particular image and the associated image map (good old-fashioned screen-scraping). Maybe this example will help someone else…

Dim re As New Regex( _
   "<img\s+                     (?# First find an image ) " & _
   "id=""Chart_Image""\s+       (?# With this ID ) " & _
   "usemap=                     (?# Then find the image map name ) " & _
   """\#([\s\S]+?)""\s+         (?# Save the image map name. Use [\s\S] because . does not match \n ) " & _
   "src=""(?<file>([\s\S]+?))"" (?# Then get the file path ) " & _
   "[\s\S]*?                    (?# Now wait for the image map. Non-Greedy capture ) " & _
   "<map \s+ name=""\1"" \s*>   (?# Capture the image map content with specified name ) " & _
   "(?<imagemap>([\s\S]+))      (?# This is the image map content) " & _
   "<\/map>                     (?# End of Image Map) ", _
   RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
Dim myMatch As Match = re.Match(file)

For more .NET specific help on Regular Expressions, check out this Article.