chris mikkelson [Thu, 11 Mar 2010 21:34:56 +0000 (15:34 -0600)]
URL extraction fixes -- Add more robust text/plain URL regexp,
use html-specific regexp for html module, and extract both
anchor (<a href=) and image (<img src=) URLs.
chris mikkelson [Thu, 29 Jan 2009 03:53:02 +0000 (21:53 -0600)]
Added code to split multipart media into parts. At a boundary, data
is flushed downstream, the downstream processors are terminated, and
if more parts are expected, a new downstream processor is started.
The downstream processor is of the "part" type, which looks for headers
to decide content type and encoding.
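The boundary handling described above can be sketched roughly as follows. This is a minimal illustration, not the committed code: the names classify_line, struct splitter, and splitter_line are hypothetical, input is assumed to arrive one line at a time without its trailing newline, and the downstream "part" processor is reduced to counters. Trailing whitespace after a boundary is not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; they only illustrate the idea. */
enum line_kind { LINE_DATA, LINE_BOUNDARY, LINE_FINAL_BOUNDARY };

/* Classify one line (no trailing newline) against a MIME boundary string. */
static enum line_kind classify_line(const char *line, const char *boundary)
{
    size_t blen = strlen(boundary);
    const char *rest;

    if (strncmp(line, "--", 2) != 0 || strncmp(line + 2, boundary, blen) != 0)
        return LINE_DATA;
    rest = line + 2 + blen;
    if (*rest == '\0')
        return LINE_BOUNDARY;        /* "--boundary": another part follows */
    if (strcmp(rest, "--") == 0)
        return LINE_FINAL_BOUNDARY;  /* "--boundary--": no more parts */
    return LINE_DATA;
}

struct splitter {
    const char *boundary;
    int part_open;       /* nonzero while a downstream "part" is active */
    int parts_started;
    int parts_finished;
    int done;            /* set once the final boundary has been seen */
};

/* Feed one line to the splitter. */
static void splitter_line(struct splitter *s, const char *line)
{
    assert(s != NULL && line != NULL);
    if (s->done)
        return;                      /* epilogue text is ignored */

    switch (classify_line(line, s->boundary)) {
    case LINE_DATA:
        /* Real code would pass the data to the downstream processor;
         * lines seen before the first boundary (preamble) are dropped. */
        break;
    case LINE_BOUNDARY:
        if (s->part_open)
            s->parts_finished++;     /* flush and terminate downstream */
        s->part_open = 1;            /* start a new "part" processor */
        s->parts_started++;
        break;
    case LINE_FINAL_BOUNDARY:
        if (s->part_open) {
            s->parts_finished++;     /* flush and terminate downstream */
            s->part_open = 0;
        }
        s->done = 1;                 /* no further parts are expected */
        break;
    }
}
```

A caller would initialize the struct with the boundary taken from the Content-Type header and call splitter_line for each input line.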
chris mikkelson [Tue, 27 Jan 2009 04:49:54 +0000 (22:49 -0600)]
Implemented new module (msgproc_module) and module instance (msgproc)
interface. This allows each processing module to consume/advertise only
one global symbol, as opposed to 4-5 previously.
The HTML, text, and base64 modules have been moved from the old interface
to the new. Base64 has furthermore become self-contained.
Once the quoted-printable code is converted to a module, the decoders.h
file can go away.
parser.h and parser.c were removed: they were specific to the old interface,
where they collected the growing number of public symbols for each module.
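One way to get down to a single global symbol per module is a descriptor struct of function pointers, with everything else declared static. The sketch below assumes that layout; the field names (create, feed, destroy), the struct members, and the byte-counting "count" module are all hypothetical and are not taken from the actual msgproc_module definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct msgproc_module;

/* A module instance (hypothetical layout). */
struct msgproc {
    const struct msgproc_module *module;
    void *state;                     /* private per-instance data */
};

/* The module descriptor: the only symbol a module exports. */
struct msgproc_module {
    const char *name;
    struct msgproc *(*create)(void);
    int (*feed)(struct msgproc *p, const char *buf, size_t len);
    void (*destroy)(struct msgproc *p);
};

/* Example module that counts the bytes it is fed. */
extern const struct msgproc_module count_module;

static struct msgproc *count_create(void)
{
    struct msgproc *p = malloc(sizeof *p);
    size_t *total = malloc(sizeof *total);

    if (p == NULL || total == NULL) {
        free(p);
        free(total);
        return NULL;
    }
    *total = 0;
    p->module = &count_module;
    p->state = total;
    return p;
}

static int count_feed(struct msgproc *p, const char *buf, size_t len)
{
    assert(p != NULL && p->state != NULL);
    (void)buf;                       /* contents are not inspected here */
    *(size_t *)p->state += len;
    return 0;
}

static void count_destroy(struct msgproc *p)
{
    if (p != NULL) {
        free(p->state);
        free(p);
    }
}

/* The helpers above are static, so this is the module's one global symbol. */
const struct msgproc_module count_module = {
    "count", count_create, count_feed, count_destroy
};
```

Callers then go through the descriptor only, e.g. count_module.create() followed by count_module.feed(p, buf, len), so adding a module adds exactly one name to the global namespace.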
chris mikkelson [Mon, 26 Jan 2009 05:44:14 +0000 (23:44 -0600)]
Moved html and text parsers into separate files, and fleshed
out their implementation in the process. Other parsers will also
be in their own files, eventually.