Leon’s Weblog

July 22, 2008

File Synchronization with Unison

Filed under: Software Dev — leon @ 8:51 am

Unison is a universal tool for synchronizing files. Although the program is no longer actively developed, it has enough useful features to make it my tool of choice for many tasks and projects. Here are a few scenarios for which I find Unison to be particularly useful:

Application Deployment
While Unison is in no way a replacement for version control, it can be used to release (web/intranet) applications from staging to production environments. This approach has several advantages. First, it is faster (and safer) than doing a full copy of a large site. Before the changes are committed, the program displays a summary of changed files and allows you to use diff to view and confirm the changes that were made. Since platforms like ASP.NET compile pages on the fly (in memory), synchronizing only the changed files avoids recompiling untouched pages, which saves the server processing time and improves the users’ experience. Also, synchronization is bidirectional (unlike rsync), so changes made directly on the production copy can be detected (just don’t ask who made them). Of course, all of this can be achieved by writing custom deployment scripts, but running Unison is far easier (especially if you have a frequent release schedule).

Synchronizing Documents with Mobile Gadgets
I run a central file server that hosts all of my documents. Although I can access the documents remotely, I often create replicas for my laptop, PDA, and flash drive (as needed) for times when I am not connected or the internet connection is too slow. Unison is particularly useful here because it is available for Windows, Linux, and Mac and can synchronize local files (for the flash drive), network shares, and remote hosts over SSH. This was the only tool I found that can safely and securely synchronize files from my Linux server to my Windows laptop without compromising any functionality on either platform. Furthermore, if you have more than two replicas of the same files, you can safely synchronize the replicas two at a time to propagate changes.

Backup
There are many backup and disaster recovery solutions; however, on Unix/Linux, everything is just a file. It’s often easier and more useful to just make a copy of everything to an external disk and maintain it by synchronizing. To recover, just copy the files back. I wouldn’t recommend this approach on a critical corporate server, but for a personal server I find it good enough.

Unison is free. Give it a shot.

December 10, 2007

WordPress Auto-Login

Filed under: Software Dev — leon @ 9:31 pm

WordPress is a great blogging engine. It’s flexible, scalable, and easy to tweak/configure to integrate into an existing PHP site. However, if you have an existing site with its own user authentication and management capabilities, getting WordPress to accept those credentials (in a single sign-on fashion) can be a bit of a challenge.

Before we proceed, I should note that there are a number of available plugins that enable WordPress to integrate with some of the popular content management systems out there. Our requirement is a bit different, however. We want to bypass WordPress’ authentication mechanism altogether and have users log in through the main portion of the site. In fact, in a well-integrated site, the interface should make navigating between WordPress pages and the rest of the site seamless to the user. Our goal is to write a WordPress plugin that will automatically authenticate a user who is already logged into the parent site (and, consequently, grant the user access to edit the blog’s content). All other users will have the rights of an unregistered visitor.

In my setup, the main site has role-based permissions and the WordPress setup only has one account for each role (e.g. admin, editor, user). The plugin first checks the role of the user logged in to the main site and then simulates a WordPress login any time the user navigates to the blog. You should be able to customize this method for your own needs.

function auto_login() {
    if (!is_user_logged_in()) {
        //determine WordPress user account to impersonate
        $user_login = 'guest'; 

        //get the user's stored password hash and hash it again (see the notes below)
        $user = new WP_User(0, $user_login);
        $user_pass = md5($user->user_pass); 

        //login, set cookies, and set current user
        wp_login($user_login, $user_pass, true);
        wp_setcookie($user_login, $user_pass, true);
        wp_set_current_user($user->ID, $user_login);
    }
}
add_action('init', 'auto_login');

Additional notes and caveats for the attentive reader

  • There is a wp-includes/pluggable.php file that defines all the functions that you can override and hook into. The WordPress API documentation is not very thorough, so you may need to review the actual code.
  • WordPress uses a double MD5 hash of the password to authenticate the user. In the database, the password is stored as a single hash. We need to hash that password again before passing it into the wp_login() function (and set the third parameter to indicate that the password is already hashed). Obviously, hard-coding the actual password would be a big no-no.
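
The double-hash relationship described in the second note can be sketched outside of WordPress. This is only an illustration in Python (the plugin itself is PHP); the helper name md5_hex and the sample password are hypothetical, and the scheme shown is the one described in this post for the WordPress versions it targets.

```python
import hashlib


def md5_hex(value):
    """Return the hexadecimal MD5 digest of a string (like PHP's md5())."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


plain_password = "example-password"    # hypothetical value, for illustration only
stored_hash = md5_hex(plain_password)  # single hash, as kept in the database
login_hash = md5_hex(stored_hash)      # double hash, as passed to wp_login()
```

The plugin never sees plain_password: it starts from the stored hash and only performs the second step.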

We did all this work to log in, but what about logging out? We have several options. First, we can call WordPress’ logout method, wp_clearcookie(), from the main site. The drawback to this approach is that we need to include all the WordPress libraries in our main site for this to work (too much unnecessary overhead IMHO). The other approach is to not use cookies at all, thus eliminating the need to log out. To do this, we simply remove the call to wp_setcookie() in our plugin and override the auth_redirect() function to do nothing. This works because we impersonate the user on every page load, and the only WordPress code that checked the cookie was in auth_redirect() until we got rid of it. Another side effect of this is that unauthenticated WordPress users will no longer be taken to the WordPress login page (but we didn’t want that anyway).

Update 6/4/08: There were a few changes to the WordPress API as of version 2.5, and some of the functions I used above became deprecated. The API documentation has also improved. A better way to implement the auto_login() function above is as follows.

function auto_login() {
    if (!is_user_logged_in()) {
        //determine WordPress user account to impersonate
        $user_login = 'guest';

        //get user's ID
        $user = get_userdatabylogin($user_login);
        $user_id = $user->ID;
  
        //login
        wp_set_current_user($user_id, $user_login);
        wp_set_auth_cookie($user_id);
        do_action('wp_login', $user_login);
    }
} 
add_action('init', 'auto_login');

November 30, 2007

Configuring Website on a 1and1 Shared Host

Filed under: Software Dev — leon @ 2:39 pm

Recently, I was working on a project to set up a new website on a 1and1 shared host. Shared hosts are a cheap alternative to VPS and managed servers, but they come with a mixed bag of restrictions that limit your ability to configure the server. I was looking for a host for under $10/month that offered SSH access and had a typical LAMP setup. This is how I configured the rest. (more…)

February 27, 2007

ASP.NET Impersonation and Screen-Scraping

Filed under: Software Dev — leon @ 10:13 am

If you have ever tried to download the content of a secured web page from another page, then you know just how easy it is to have something go wrong. But, as I’ve found out recently, if you see it through, you will learn a great deal about IIS and ASP.NET security (while developing a newfound appreciation for Apache at the same time). Here are some highlights.

Background:
You want a user to press a button on an Intranet page and download the content of several other remote sites that use NTLM. The connection to the remote resource should be made using the credentials of the user, not a default account. (In my case the remote resources were actually web services and network files, but it doesn’t really matter.)

The Basics:
First, configure your site to use Integrated Windows Authentication (a.k.a. NTLM) and disable anonymous login. This can all be done under the Security tab of the IIS settings. At this point, it doesn’t really matter what authentication method is configured for ASP.NET in the web.config file because IIS will still pass the identity of the network user on the domain to the page. However, to be a bit more explicit, we can disable guest login in the web.config file as well.

The easiest way to download a file in ASP.NET is to use the System.Net.WebClient class. Just create an instance of the class and use the DownloadFile method. But what if you are downloading from a secured site that doesn’t allow anonymous users?

First Problem:
One would think that passing the credentials of the currently logged-in user to a remote resource should be easy, especially when the WebClient class has a Credentials property which we can set to CredentialCache.DefaultCredentials. Easy enough. Too bad that doesn’t work. Maybe we should set the UseDefaultCredentials property of the WebClient class to True as well. Still doesn’t work.

Checking the logs on the remote web server will show that no credentials were passed at all and the server returned the typical 401-unauthorized error. There are plenty of other properties that you can set, or you can try to work with the lower-level HTTP client request and response classes, but they will all lead you to the same place… i.e. nowhere. You may see a small beacon of hope if you place both websites on the same host because this configuration works (although it is not the way you intended).

Impersonation:
By default, IIS runs and processes each user request in threads owned by ASP.NET’s local machine account (which is typically ASPNET). As a consequence, even if we enable NTLM security on the page, those credentials will only last one hop: your web server authenticates the client over NTLM but cannot spawn new web requests on the user’s behalf to remote network resources. No wonder we couldn’t log in before. ASPNET is not a network account. This also explains why you can connect to other websites on the same host… IIS was using the same ASP.NET account on each site.

Now we can configure IIS to enable impersonation (which is disabled by default) by setting <identity impersonate="true" /> in the web.config file. This will tell IIS to spawn each new request thread under the credentials of the client, thus giving the thread access to network resources that are available to the user. Are we home free now? Well, you can test that this scenario does work and that we have reached our goal, but at what cost?

Next Problem:
According to Microsoft (and a few simple performance tests), enabling impersonation reduces the scalability of your site because each request incurs a small performance hit. Do you want to impose this overhead on the entire site if you only need to use this functionality in a few places? There are also a number of security problems with this solution. The one that I particularly dislike is that, if you have pages that write files on the web server, every network user will now need write access to the output folders (whereas, without impersonation, only the local ASP.NET account needs write access to the output folders).

Impersonation at Runtime:
What if you could disable impersonation for the site as a whole but enable it dynamically on the pages that need it? Well, it turns out that you can. I would have preferred setting a page directive to enable this option as needed but, after going through this ordeal, writing a few extra lines of code doesn’t seem so bad. The last caveat is that the web.config settings now matter. If the ASP.NET authentication method is not set to “Windows”, you will not be able to impersonate the user at runtime because there will not be enough information in the User.Identity object.

The final code:

' Requires: Imports System.Net

' Impersonate the current network user
Dim currentIdentity As System.Security.Principal.WindowsIdentity = _
    CType(User.Identity, System.Security.Principal.WindowsIdentity)
Dim impersonationContext As System.Security.Principal.WindowsImpersonationContext = _
    currentIdentity.Impersonate()

Try
    ' Download file from a network resource
    ' (the second argument is the local destination file path)
    Dim myWebClient As New WebClient
    myWebClient.Credentials = CredentialCache.DefaultCredentials
    myWebClient.DownloadFile("http://network_resource", Server.MapPath("~\inbox"))
Finally
    ' Always revert back to the original context of the ASP.NET process,
    ' even if the download throws an exception
    impersonationContext.Undo()
End Try

February 19, 2007

.NET Regular Expressions

Filed under: Software Dev — leon @ 7:11 pm

Regular Expressions are great for parsing text files. If only all languages used the same conventions… Oh well. Here are some tips and caveats that I’ve picked up from working with Regular Expressions in .NET:

  • Comments: If your RegExp is going to be longer than just a few symbols, make sure to include comments. The comments are placed in tags like this: (?# Your Comment Here ). When doing this, make sure to set the option to ignore pattern white space so that the spacing around the comments is not treated as part of the pattern. While you are at it, enable the Multiline option (which makes ^ and $ match at the start and end of every line rather than of the whole input), since the file that you are parsing is probably on multiple lines, and ignore case to make the pattern simpler. Combine the options using the Or operator like so: RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace
  • Named Captures: If you are parsing the file with Regular Expressions, then you are looking for tokens to extract. These tokens are saved in the Groups collection of the Match object. By default, you can access the captured values using an index; however, you can greatly improve readability by assigning a name to any matched token like this: (?<Name>(Pattern)).
  • Non-Greedy Matching: Ever wonder why .* sometimes gobbles up your entire file? By default, regular expression quantifiers are greedy: they match as much text as possible, so the match runs up to the last possible token. To stop at the first token that matches, use the question mark operator like so: .*?. As you would expect, this works on other patterns such as .+? and .{4,8}?. At least this behavior is consistent on most platforms that claim to implement regular expressions.
  • Caveat: Did you expect the period operator to match absolutely any character? Well, not in .NET. Here, a new line character (which, might I add, is pretty common in text files) will not be matched even when the Multiline option is set. If you want to match absolutely anything, use a character class like so: [\s\S]. This basically means match anything that is white space or is not white space… i.e. everything. (Setting RegexOptions.Singleline also makes the period match new lines.)
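
The last two points are easy to try out in isolation. Below is a minimal sketch using Python's re module rather than .NET (an assumption made purely for illustration; the sample text is made up), which happens to share both behaviors: quantifiers are greedy unless followed by ?, and the period does not match a new line by default.

```python
import re

# Hypothetical two-line input, for illustration only
text = "<b>one</b>\n<b>two</b>"

# Greedy: [\s\S]* takes as much as possible, so the capture runs to the LAST </b>
greedy = re.search(r"<b>([\s\S]*)</b>", text).group(1)

# Non-greedy: the trailing ? makes the capture stop at the FIRST </b>
lazy = re.search(r"<b>([\s\S]*?)</b>", text).group(1)

# The period does not match a new line, so this pattern cannot span both lines
dot_across_lines = re.search(r"one.*two", text)

# The character class [\s\S] matches any character, new lines included
any_across_lines = re.search(r"one[\s\S]*two", text)
```

Here greedy captures everything between the first <b> and the last </b>, lazy captures only "one", and dot_across_lines is None because the new line blocks the match.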

In my last project, I had to parse an HTML document to extract the path to a particular image and the associated image map (good old-fashioned screen-scraping). Maybe this example will help someone else…

' Requires: Imports System.Text.RegularExpressions
' Note: VB.NET strings do not treat \ as an escape character,
' so regex escapes such as \s are written with a single backslash.
Dim re As New Regex( _
   "<img\s+                      (?# First find an image ) " & _
   "id=""Chart_Image""\s+        (?# With this ID ) " & _
   "usemap=                     (?# Then find the image map name ) " & _
   """\#([\s\S]+?)""\s+            (?# Save the image map name. Use [\s\S] because . does not match \n ) " & _
   "src=""(?<file>([\s\S]+?))""   (?# Then get the file path ) " & _
   "[\s\S]*?                     (?# Now wait for the image map. Non-Greedy capture ) " & _
   "<map \s+ name=""\1"" \s*>     (?# Capture the image map content with specified name ) " & _
   "(?<imagemap>([\s\S]+))       (?# This is the image map content) " & _
   "<\/map>                      (?# End of Image Map) ", _
   RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
Dim myMatch As Match = re.Match(file)

For more .NET specific help on Regular Expressions, check out this Article.

January 21, 2007

Maps for GPS Tuner

Filed under: Gadgets, Software Dev — leon @ 10:34 pm

I had been looking for an off-road navigation solution when I stumbled across GPS Tuner. While TomTom is great for car navigation, it lacks many features such as track recording and support for custom maps. Although we now have the option of using the newly released mobile Google Maps and mobile Virtual Earth, these programs require a constant Internet connection and can be slow to use (especially when hiking in remote locations with poor cell phone reception). It’s often much more convenient to have the needed maps pre-loaded and configured on the hand-held.

When I gave GPS Tuner a try, I quickly realized that I would have to spend a lot of time making my own maps. Luckily, there are a number of free online mapping systems (tile servers) such as Google Earth and Virtual Earth that can provide the base images for maps. The problem now is downloading the maps (in fine resolution) and piecing them together. Since I’m too lazy to do this manually, it was time for a little scripting to automate the process.

Google Maps and Microsoft Virtual Earth work by asynchronously downloading tiles of the map depending on the user’s desired map zoom level. With a little hacking, I figured that I could put together a script that would download any section of the map at any available zoom level and automatically put all the tiles together into one large image. The biggest challenge there is finding out the indexing scheme used for the tiles (i.e. given a lat/long coordinate and a zoom level, determine the corresponding tile on the map and the URL to fetch that tile). The following articles on Via Virtual Earth gave me a great head start and even some sample code. All that was left to do was write a loop to download all the tiles between two lat/long coordinates and save them into one continuous image that can be loaded into GPS Tuner.
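
To give an idea of what such an indexing scheme looks like, here is a minimal sketch in Python. It is not the script linked from this post; it assumes the Web Mercator tiling and "quadkey" naming that Microsoft documents for Virtual Earth, and the function names are made up for illustration. The tile URL pattern is deliberately left out.

```python
import math


def latlong_to_tile(lat, lon, zoom):
    """Return the (x, y) indices of the tile containing a lat/long point."""
    # Clamp latitude to the range the Mercator projection can represent
    lat = max(min(lat, 85.05112878), -85.05112878)
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    # Keep points on the far edges inside the last tile
    return min(int(x * n), n - 1), min(int(y * n), n - 1)


def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of x and y into a base-4 quadkey string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def tiles_between(lat1, lon1, lat2, lon2, zoom):
    """List every (x, y) tile in the rectangle spanned by two corner points."""
    x1, y1 = latlong_to_tile(lat1, lon1, zoom)
    x2, y2 = latlong_to_tile(lat2, lon2, zoom)
    return [(x, y)
            for y in range(min(y1, y2), max(y1, y2) + 1)
            for x in range(min(x1, x2), max(x1, x2) + 1)]
```

The list returned by tiles_between is in row-major order (left to right, top to bottom), which is the order in which the downloaded tiles would be pasted into the final image.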

Here is the code and a sample image of Manhattan made from about 100 tiles.

Happy Navigating.

October 13, 2005

PHP Application Framework Design: 4 - Forms and Events

Filed under: Software Dev — leon @ 9:41 am

This is part 4 of a multi-part series on the design of a complete application framework written in PHP. In part 1, we covered the basic class structure of the framework and laid out the scope of the project. The second part described methods for managing users and session data. The third part described a practical implementation of page templates. In this fourth and final section, we will expand on our implementation of page templates to provide the web application with persistent form data capabilities. We will also apply the framework to build an error page and a page dispatcher.

(more…)

PHP Application Framework Design: 3 - Page Templates

Filed under: Software Dev — leon @ 9:16 am

This is part 3 of a multi-part series on the design of a complete application framework written in PHP. In part 1, we covered the basic class structure of the framework and laid out the scope of the project. The second part described methods for managing users and session data. This part describes a practical implementation of page templates and the separation of application logic from the presentation layer.

(more…)

PHP Application Framework Design: 2 - Managing Users

Filed under: Software Dev — leon @ 9:13 am

This is part 2 of a multi-part series on the design of a complete application framework written in PHP. In part 1, we covered the basic class structure of the framework and laid out the scope of the project. This part adds session handling to our application and illustrates ways of managing users.

(more…)

PHP Application Framework Design: 1 - Getting Started

Filed under: Software Dev — leon @ 9:10 am

This article describes the design of a complete application framework written in PHP. The reader is assumed to have a working knowledge of PHP. This part of the series describes the scope of the framework and goes over the initial class hierarchy. The following parts cover everything from session handling to creating page templates.

(more…)