At first glance this one appears much more difficult than the one I solved earlier (from d2jsp.org), but after a few lines of pre-processing the image cleans up very nicely.
After this pre-processing we are left with a single artifact to deal with: the centered vertical line. Since this artifact was in the same place every time, I simply trained the OCR engine to recognize the letters with the line through them. I used a data set of 333 images to do the initial training.
I ran another data set of 333 new images through to see what kind of recognition rates I was getting. I needed to train 28 of the 333 images, or 8.408% of them.
For the third and final data set I ran an additional 333 new CAPTCHAs through. I had to train 19 of these, giving me a 5.706% training rate for the third data set.
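For anyone who wants to check the percentages, here is a quick sketch of the arithmetic in Python. The function name `training_rate` is just something made up for this illustration, and the "recognized" figure assumes that every image that did not need training was read correctly, which is an assumption rather than something measured separately.

```python
def training_rate(trained: int, total: int) -> float:
    """Return the percentage of images that had to be trained by hand."""
    return 100.0 * trained / total


# (images trained, images in the set) for the second and third data sets
runs = {"second set": (28, 333), "third set": (19, 333)}

for name, (trained, total) in runs.items():
    rate = training_rate(trained, total)
    # Assumption: anything that did not need training was recognized correctly.
    print(f"{name}: {rate:.3f}% trained, {100 - rate:.3f}% recognized")
```

Under that assumption, the drop from 28 to 19 trained images means the share of CAPTCHAs needing correction fell by roughly a third between the second and third sets.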
More to come later...
PS: I'm new to this whole blogging thing, does anyone know if it is possible to use LaTeX in these posts? A plugin of some sort?
Interesting, you should post some CAPTCHA art!