Thursday, February 17, 2005

HTML parsing

While working on my new personal project today, I came across an unexpected problem - how to parse an HTML file using C#. My business partner Shawn had previously done something similar using VB.NET, but this was my first stab at it. Luckily, a number of people have already dived into this area, so it wasn't hard to find some answers. My first find was the best - a guy named Jeff Heaton had already created a number of classes in C# to address this problem and released them under the GNU Lesser General Public License (LGPL). Although these were easy to implement and worked without any difficulties, I quickly ran into a limitation - he had designed his classes to be used in a spider, so they were only interested in the attributes of the HTML tags (like the 'href' attribute of the 'a' tag). Unfortunately, I need the descriptive contents of the tag, i.e. the words between '<a>' and '</a>'. My first attempts to modify his code have left me getting the URLs and descriptions of only half of the links - it skips every second one... So tomorrow I will be trying to re-engineer the whole thing to get what I need.
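To make the goal concrete, here is a minimal sketch in C# of pulling out both the 'href' and the words between the opening and closing 'a' tags. It is a simplified stand-in written from scratch, not Jeff Heaton's classes and not a fix for the skipped-link problem. The names LinkExtractor, Extract, IsTag and GetAttribute are hypothetical, and the scanner assumes reasonably tidy markup: no '>' inside attribute values, no spaces around the '=' in 'href=', and no nested links.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: collects (href, link text) pairs from <a> elements.
public static class LinkExtractor
{
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        string href = null;              // non-null only while inside an <a> element
        var text = new StringBuilder();  // text collected since the opening <a>
        int pos = 0;

        while (pos < html.Length)
        {
            int open = html.IndexOf('<', pos);
            if (open < 0) open = html.Length;

            // Everything before the next tag is plain text; keep it only inside a link.
            if (href != null) text.Append(html, pos, open - pos);
            if (open == html.Length) break;

            int close = html.IndexOf('>', open);
            if (close < 0) break;        // unterminated tag: stop rather than guess

            string tag = html.Substring(open + 1, close - open - 1).Trim();
            pos = close + 1;

            if (IsTag(tag, "a"))
            {
                href = GetAttribute(tag, "href") ?? "";
                text.Length = 0;         // start collecting fresh text for this link
            }
            else if (IsTag(tag, "/a") && href != null)
            {
                links.Add(new KeyValuePair<string, string>(href, text.ToString().Trim()));
                href = null;
            }
            // Any other tag (e.g. <b>) is dropped, but the text around it is still kept.
        }
        return links;
    }

    // True when the tag body is exactly `name` or `name` followed by whitespace,
    // so "a href=..." matches "a" but "abbr" and "address" do not.
    private static bool IsTag(string tag, string name)
    {
        if (!tag.StartsWith(name, StringComparison.OrdinalIgnoreCase)) return false;
        return tag.Length == name.Length || char.IsWhiteSpace(tag[name.Length]);
    }

    // Naive attribute lookup: handles name="v", name='v' and name=v.
    // Assumption: no attribute such as "data-href" appears before the real one.
    private static string GetAttribute(string tag, string name)
    {
        int i = tag.IndexOf(name + "=", StringComparison.OrdinalIgnoreCase);
        if (i < 0) return null;
        i += name.Length + 1;
        if (i >= tag.Length) return "";

        char quote = tag[i];
        if (quote == '"' || quote == '\'')
        {
            int end = tag.IndexOf(quote, i + 1);
            return end < 0 ? tag.Substring(i + 1) : tag.Substring(i + 1, end - i - 1);
        }
        int space = tag.IndexOfAny(new[] { ' ', '\t', '\r', '\n' }, i);
        return space < 0 ? tag.Substring(i) : tag.Substring(i, space - i);
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<p><a href=\"one.html\">First link</a> and "
                    + "<a href='two.html'>Second <b>link</b></a></p>";

        foreach (var link in LinkExtractor.Extract(html))
            Console.WriteLine(link.Key + " -> " + link.Value);
        // Prints:
        // one.html -> First link
        // two.html -> Second link
    }
}
```

The one design point worth noting is that the text buffer is reset on every opening 'a' tag and flushed on every closing one, so each link gets its own description, even when other tags such as 'b' sit inside it.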

Well, in my continuing search for employment as a developer, I am noticing some sites using Macromedia's Flex technology. First, there is my past client Launch Vision and a site they have developed, Stem Cell Therapeutics. Nice job guys - both of them look great! Also, I'm pretty sure 5By5 Software is using it at least partially for their product. Looks like it is definitely a growing technology, at least in Calgary...

By the way, I found one of the most annoying things I've ever seen in Windows tonight - I accidentally held down the Shift key for 8 seconds (my thoughts were drifting as I was listening to some tunes) and this dialog popped up! I pressed Cancel, but it was too late - the "Filter Keys" option had been engaged through the Windows Accessibility Options. The most immediate effect was that I could no longer click a new point on a page to position the I-beam cursor there - instead it would highlight everything from the first point to the new point! I was trying to code and this was infuriating! Well, after checking Google, I found a page that helped me get around it (I unplugged my keyboard and plugged it back in), but now this stupid little stopwatch icon is sitting among the icons in my taskbar and I can't get rid of it! I've hidden it and hopefully it will go away when I reboot the computer... But who thought of this?!
