November 19, 2013

Multi-File Upload Using Plupload in ASP.NET

After a failed attempt at getting multi-file upload functionality on an intranet site using the AjaxFileUpload control from the Ajax Control Toolkit, I was forced to look for alternatives. The AjaxFileUpload control was designed for ASP.NET, so I originally thought that it would be the ideal solution. Unfortunately, I kept getting authentication errors in certain usage scenarios on legacy browsers. As an alternative, I ended up using Plupload, which is used in content management systems like WordPress. While not designed specifically for ASP.NET, Plupload is flexible enough to work well with different kinds of server-side technologies as well as most browsers. Plupload does not interfere with ASP.NET’s partial page rendering and can be used on sites with master pages. For legacy browsers, it can be configured to use its Flash interface or an even more basic HTML 4 interface (instead of the HTML 5 interface, which I used by default). Here is how I set it up.

Plupload requires setting up two files to work. On the client side (in the ASPX page), we need to load the appropriate JavaScript libraries and configure the plugin. On the server side, we need to handle the data stream submitted from the client. There are many examples online of configuring the client-side script, so I won’t go into great detail here. I found the multipart_params parameter very useful for passing server-side attributes along with the files being uploaded. I defined some of these parameters (e.g. AllowedFileTypes) on the server side so that the same values are available later, when it comes time to validate the data stream submitted from the client. I also found it simpler to implement the FileUploaded event handler on the client side to show the user the status of each upload (otherwise all uploads look successful unless something crashes on the server side).
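To make the client-side pieces a little more concrete, here is a minimal TypeScript sketch. It assumes the server renders its settings into the page as a plain object. The `FolderId` parameter, the handler URL, the browse button id and the JSON reply shape `{ success, message }` are hypothetical names for illustration, not part of Plupload. Only the option names `runtimes`, `browse_button`, `url` and `multipart_params` are Plupload’s own, and the interfaces model just the subset used here.

```typescript
// Server-defined upload settings, rendered into the page by ASP.NET.
// Field names are hypothetical.
interface UploadSettings {
  handlerUrl: string;          // e.g. a hypothetical "Upload.ashx" handler
  allowedFileTypes: string[];  // extensions without the dot, e.g. ["pdf", "docx"]
  folderId: string;            // an example server-side attribute
}

// Subset of the options object accepted by new plupload.Uploader(...).
interface UploaderOptions {
  runtimes: string;
  browse_button: string;
  url: string;
  multipart_params: Record<string, string>;
}

// Builds the Plupload options from the server-defined settings.
function buildUploaderOptions(settings: UploadSettings, browseButtonId: string): UploaderOptions {
  return {
    // Try HTML 5 first, then fall back to Flash and finally plain HTML 4.
    runtimes: "html5,flash,html4",
    browse_button: browseButtonId,
    url: settings.handlerUrl,
    // Posted as extra form fields with every file, so the handler
    // receives the same attributes the page was configured with.
    multipart_params: {
      AllowedFileTypes: settings.allowedFileTypes.join(","),
      FolderId: settings.folderId,
    },
  };
}

// JSON body the (hypothetical) server-side handler writes back for each file.
interface UploadResult {
  success: boolean;
  message: string;
}

// Subset of the third argument Plupload passes to the FileUploaded event.
interface FileUploadedInfo {
  response: string; // raw response body returned by the handler
}

// Interprets the handler's reply so a rejected file is not shown as a success.
function describeUpload(fileName: string, info: FileUploadedInfo): string {
  try {
    const result = JSON.parse(info.response) as UploadResult;
    return result.success
      ? `${fileName}: uploaded`
      : `${fileName}: rejected (${result.message})`;
  } catch {
    // Non-JSON output (e.g. an HTML error page) or an unexpected shape.
    return `${fileName}: unexpected server response`;
  }
}
```

In the page, the options would be passed to `new plupload.Uploader(...)`, followed by `uploader.init()` and a handler such as `uploader.bind("FileUploaded", (up, file, info) => showStatus(describeUpload(file.name, info)))`, where `showStatus` is a hypothetical UI helper. Because `multipart_params` travel through the browser, the server-side handler should still check each file against its own copy of AllowedFileTypes rather than trusting the posted value.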

November 16, 2013

Document Indexer Library for .NET

I recently wrote about using the Full-Text search feature built into SQL Server to allow users to search through documents (and the challenge of displaying a summary of the search results). Configuring Full-Text search was a fairly easy process; however, populating the table containing the data to be searched turned out to be trickier. I wanted to avoid the overhead of using SQL Server FileStream and FileTables, and I needed control over the text that was extracted from documents. My only option was to implement a custom indexer that extracts the text I want from files and then stores it in the database.

My first attempt at extracting text from documents used iFilters. This is the interface that Windows Search uses to index files, and SQL Server uses it as well to search through FileTables. I liked the universality of this approach because any file type registered on the server would be parseable without requiring file-type-specific code. After hours of browsing PInvoke.net and looking at working projects online, I finally got a library that compiled and parsed all of the file types that I needed. Unfortunately, I had to give up this approach because of several limitations. First, the code was just not stable enough for my liking. It required reading the registry, loading COM objects, and a great deal of unmanaged code. As a result, the library was hard to set up on various servers due to issues with 32-bit vs. 64-bit iFilter libraries. Furthermore, because of the unmanaged code, I could not invoke this library from a web page and ended up implementing a Windows Service that indexed the documents offline. The final straw was that Adobe’s iFilter for PDF files sucks. It parses the entire document into one string with no way to discern pages, lines or sections of the document. It was time to try something else.
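The iFilter limitations above hint at what a custom indexer needs: file-type-specific extractors and output that keeps page boundaries. The TypeScript sketch below shows one hypothetical way to organize that. `TextChunk`, `ExtractorRegistry` and `plainTextExtractor` are illustrative names, not the API of the library this post introduces, and treating form-feed characters as page breaks is an assumption made only for the plain-text example.

```typescript
// One unit of extracted text. Keeping the page number lets a search hit
// be traced back to (and summarized from) a specific page.
interface TextChunk {
  page: number;
  text: string;
}

// A file-type-specific extractor (hypothetical signature).
type Extractor = (content: Uint8Array) => TextChunk[];

// Registry keyed by lower-case file extension, so unsupported types fail
// explicitly instead of silently falling back to a generic parser.
class ExtractorRegistry {
  private readonly extractors = new Map<string, Extractor>();

  register(extension: string, extractor: Extractor): void {
    this.extractors.set(extension.toLowerCase(), extractor);
  }

  extract(fileName: string, content: Uint8Array): TextChunk[] {
    const dot = fileName.lastIndexOf(".");
    const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
    const extractor = this.extractors.get(ext);
    if (!extractor) {
      throw new Error(`No extractor registered for "${ext}"`);
    }
    return extractor(content);
  }
}

// Example extractor for UTF-8 text files; assumes form feeds ("\f") mark page breaks.
function plainTextExtractor(content: Uint8Array): TextChunk[] {
  const text = new TextDecoder("utf-8").decode(content);
  return text.split("\f").map((pageText, i) => ({ page: i + 1, text: pageText.trim() }));
}
```

Each returned chunk could then be written as one row (for example document id, page number and text) to the table covered by the Full-Text index. Extractors for PDF or Office formats would wrap whichever parsing library you choose; the registry only fixes the shape of their output.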