Minimal code
Using the default pipeline
require_once('pipeline.class.php');
parse_config_file('./.html2ps.config');
$g_config = array(
'cssmedia' => 'screen',
'renderimages' => true,
'renderforms' => false,
'renderlinks' => true,
'mode' => 'html',
'debugbox' => false,
'draw_page_border' => false
);
$media = Media::predefined('A4');
$media->set_landscape(false);
$media->set_margins(array('left' => 0,
'right' => 0,
'top' => 0,
'bottom' => 0));
$media->set_pixels(1024);
$g_px_scale = mm2pt($media->width() - $media->margins['left'] - $media->margins['right']) / $media->pixels;
$g_pt_scale = $g_px_scale * 1.43;
$pipeline = PipelineFactory::create_default_pipeline("","");
$pipeline->process('http://www.google.com', $media);
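The two scale factors in this example can be checked by hand. The following is a minimal sketch that assumes mm2pt converts millimetres to points at 72 points per inch and that an A4 page is 210 mm wide. mm_to_pt_sketch is a hypothetical stand-in written for this illustration, not the html2ps function itself.

```php
<?php
// Hypothetical stand-in for html2ps's mm2pt(): assumes 72 points per inch, 25.4 mm per inch.
function mm_to_pt_sketch(float $mm): float
{
    return $mm * 72.0 / 25.4;
}

// Values mirroring the example: A4 portrait, zero margins, 1024-pixel virtual screen.
$page_width_mm   = 210.0;
$margin_left_mm  = 0.0;
$margin_right_mm = 0.0;
$pixels          = 1024;

// Printable width in points divided by the virtual screen width in pixels.
$px_scale = mm_to_pt_sketch($page_width_mm - $margin_left_mm - $margin_right_mm) / $pixels;
// The example derives the point scale with a fixed factor of 1.43.
$pt_scale = $px_scale * 1.43;

printf("px scale: %.4f pt per px\n", $px_scale);
printf("pt scale: %.4f\n", $pt_scale);
```

With these values one pixel of the 1024-pixel virtual screen corresponds to roughly 0.58 pt on paper; a larger value passed to set_pixels makes the scale, and therefore the rendered content, proportionally smaller.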
Building your own conversion pipeline
require_once('pipeline.class.php');
parse_config_file('./.html2ps.config');
$g_config = array(
'cssmedia' => 'screen',
'renderimages' => true,
'renderforms' => false,
'renderlinks' => true,
'mode' => 'html',
'debugbox' => false,
'draw_page_border' => false
);
$media = Media::predefined('A4');
$media->set_landscape(false);
$media->set_margins(array('left' => 0,
'right' => 0,
'top' => 0,
'bottom' => 0));
$media->set_pixels(1024);
$g_px_scale = mm2pt($media->width() - $media->margins['left'] - $media->margins['right']) / $media->pixels;
$g_pt_scale = $g_px_scale * 1.43;
$pipeline = new Pipeline;
$pipeline->fetchers[] = new FetcherURL;
$pipeline->fetchers[] = new FetcherLocalFile('./input');
$pipeline->data_filters[] = new DataFilterHTML2XHTML;
$pipeline->parser = new ParserXHTML;
$pipeline->layout_engine = new LayoutEngineDefault;
$pipeline->output_driver = new OutputDriverFPDF($media);
$pipeline->destination = new DestinationBrowser;
$pipeline->process('http://www.yahoo.com');
Conversion pipeline
PipelineFactory is a simple factory class that simplifies building
Pipeline instances; its
create_default_pipeline() method builds a simple, ready-to-run conversion pipeline. Using
PipelineFactory is not required: you may create the
Pipeline object and fill in
the appropriate fields manually.
class PipelineFactory {
function create_default_pipeline();
}
The Pipeline class describes the conversion process as a whole; it holds references to the objects described
below and is responsible for calling them in the correct order and for handling errors.
class Pipeline {
var $fetchers;
var $data_filters;
var $parser;
var $pre_tree_filters;
var $layout_engine;
var $post_tree_filters;
var $output_driver;
var $output_filter;
var $destination;
function Pipeline();
function process($data_id);
function error_message();
}
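To make the calling order concrete, here is a toy model (PHP 7.4+). PipelineSketch and its closure-based stages are hypothetical: they only mirror the field names of Pipeline, use an array of words as the "document tree", reduce fetching to a single callable and leave out the output filter and destination. It is not the html2ps implementation.

```php
<?php
// Toy model of the stage order: fetch -> data filters -> parser -> pre-tree filters
// -> layout engine -> post-tree filters -> output driver. All names are hypothetical.
class PipelineSketch
{
    public $fetcher;                 // callable(string $data_id): ?string
    public $data_filters = [];       // list of callable(string): string
    public $parser;                  // callable(string): array (toy "tree")
    public $pre_tree_filters = [];   // list of callable(array): array
    public $layout_engine;           // callable(array): array
    public $post_tree_filters = [];  // list of callable(array): array
    public $output_driver;           // callable(array): string
    private $error = '';

    public function process(string $data_id): ?string
    {
        $data = ($this->fetcher)($data_id);
        if ($data === null) {
            $this->error = "Cannot fetch '$data_id'";
            return null;
        }
        foreach ($this->data_filters as $filter) { $data = $filter($data); }
        $tree = ($this->parser)($data);
        foreach ($this->pre_tree_filters as $filter) { $tree = $filter($tree); }
        $tree = ($this->layout_engine)($tree);
        foreach ($this->post_tree_filters as $filter) { $tree = $filter($tree); }
        return ($this->output_driver)($tree);
    }

    public function error_message(): string { return $this->error; }
}

$p = new PipelineSketch();
$p->fetcher = fn($id) => $id === 'demo' ? '  Hello   pipeline  world ' : null;
$p->data_filters[] = fn($s) => trim($s);                      // "fix" the raw data
$p->parser = fn($s) => preg_split('/\s+/', $s);                 // build the toy tree
$p->layout_engine = fn($tree) => $tree;                         // no-op "layout"
$p->post_tree_filters[] = fn($tree) => array_map('strtoupper', $tree);
$p->output_driver = fn($tree) => implode('|', $tree);
echo $p->process('demo'), "\n";   // HELLO|PIPELINE|WORLD
```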
Description of interfaces and classes
Almost all interfaces described below include an
error_message method.
It should return a user-readable description of
the error. This description MAY contain HTML tags, but should remain
readable if the tags are removed.
The Fetcher interface provides a method of
fetching the data required
to build a document tree. Normally, classes implementing this interface
fetch an HTML/XHTML string from somewhere (e.g. from a remote HTTP server,
a local file or a database). Nevertheless, a fetcher MAY fetch ANY data, provided that
the parser understands it. The pipeline object may contain
several fetcher objects; in this case they are tried one by one until
one of them returns a non-null value.
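The "try fetchers one by one" rule can be sketched as follows (PHP 7.4+). FetcherSketch, ArrayFetcher and fetch_first are hypothetical names; to keep the example self-contained, get_data returns a plain string or null rather than the FetchedData object the real interface uses.

```php
<?php
// Simplified, hypothetical version of the Fetcher interface.
interface FetcherSketch
{
    public function get_data(string $data_id): ?string;
    public function error_message(): string;
}

// Fetcher backed by an in-memory array of pages.
class ArrayFetcher implements FetcherSketch
{
    private array $pages;
    private string $error = '';

    public function __construct(array $pages) { $this->pages = $pages; }

    public function get_data(string $data_id): ?string
    {
        if (!array_key_exists($data_id, $this->pages)) {
            $this->error = "ArrayFetcher: no entry for '$data_id'";
            return null;
        }
        return $this->pages[$data_id];
    }

    public function error_message(): string { return $this->error; }
}

// Try each fetcher in order and stop at the first non-null result.
// Error messages of the fetchers that failed are collected in $errors.
function fetch_first(array $fetchers, string $data_id, array &$errors): ?string
{
    $errors = [];
    foreach ($fetchers as $fetcher) {
        $data = $fetcher->get_data($data_id);
        if ($data !== null) {
            return $data;
        }
        $errors[] = $fetcher->error_message();
    }
    return null;
}

$fetchers = [
    new ArrayFetcher(['a.html' => '<p>A</p>']),
    new ArrayFetcher(['b.html' => '<p>B</p>']),
];
$errors = [];
echo fetch_first($fetchers, 'b.html', $errors), "\n";   // <p>B</p> (the first fetcher failed)
```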
If you need to get data from non-standard places (e.g. from a template engine or a database), you
are expected to implement Fetcher in your own class.
Note that the get_data method returns a FetchedData object (or one of its descendants), not an
HTML string!
class Fetcher {
function get_data($data_id);
function error_message();
}
FetcherURL is an implementation of the
Fetcher interface.
It takes a URL and fetches the HTML page using the HTTP or HTTPS protocol.
Any other protocol is treated as an error.
class FetcherURL {
function FetcherURL();
function get_data($url); // fetches the URL and returns a FetchedData object holding the HTML/XHTML content
function error_message();
}
FetcherLocalFile is an implementation of the
Fetcher interface;
it reads the contents of a local file. As showing the contents of local
files to the user is generally insecure, it uses a simple security measure:
it can fetch only files inside a predefined directory.
class FetcherLocalFile {
function FetcherLocalFile($restrict_path);
function get_data($path);
function error_message();
}
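One common way to implement such a directory restriction is to resolve both paths with realpath() and compare prefixes. This is a hedged sketch of that technique, not the FetcherLocalFile code: is_inside_directory and read_restricted are hypothetical helpers, and the sketch takes $path as a full path rather than one relative to $restrict_path.

```php
<?php
// realpath() resolves "..", "." and symlinks, so "input/../secret.txt"
// cannot escape the allowed directory through path tricks alone.
function is_inside_directory(string $restrict_path, string $path): bool
{
    $root = realpath($restrict_path);
    $target = realpath($path);
    if ($root === false || $target === false) {
        return false;   // missing directory or missing file
    }
    // Trailing separator so that "/data/input2" does not match "/data/input".
    $root = rtrim($root, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    return strncmp($target, $root, strlen($root)) === 0;
}

// Returns the file contents, or null if the file is outside the directory or unreadable.
function read_restricted(string $restrict_path, string $path): ?string
{
    if (!is_inside_directory($restrict_path, $path)) {
        return null;
    }
    $data = file_get_contents($path);
    return $data === false ? null : $data;
}

// Demo: a temporary "input" directory plus a file just outside it.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'h2ps_' . uniqid();
mkdir($base . '/input', 0777, true);
file_put_contents($base . '/input/page.html', '<p>ok</p>');
file_put_contents($base . '/secret.txt', 'secret');

var_dump(read_restricted($base . '/input', $base . '/input/page.html'));     // string(9) "<p>ok</p>"
var_dump(read_restricted($base . '/input', $base . '/input/../secret.txt')); // NULL
```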
The DataFilter interface describes filters that modify the raw input data.
Their main purpose is to fix the raw data so that the parser can
process it without errors.
class DataFilter {
function process($data); // returns modified ("filtered") data
function error_message();
}
DataFilterHTML2PSCommands is an implementation of
DataFilter.
It converts the special HTML2PS commands (strictly speaking, there is only one: the
page-break command, which can be written as <!--NewPage--> or
) into the pseudo-tag <pagebreak/> so that the
XML parser adds this command to the document tree.
class DataFilterHTML2PSCommands {
function DataFilterHTML2PSCommands();
function process($data); // returns modified ("filtered") data
function error_message();
}
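The core of this conversion can be sketched with a single regular expression. replace_page_break_comments is a hypothetical helper; it handles only the <!--NewPage--> spelling, and accepting any letter case and surrounding whitespace inside the comment is an assumption, not documented html2ps behaviour.

```php
<?php
// Replace <!--NewPage--> comments with a <pagebreak/> pseudo-tag, so that the
// command survives XML parsing as an element instead of being dropped as a comment.
function replace_page_break_comments(string $html): string
{
    return preg_replace('/<!--\s*NewPage\s*-->/i', '<pagebreak/>', $html);
}

echo replace_page_break_comments("<p>One</p><!--NewPage--><p>Two</p>"), "\n";
// <p>One</p><pagebreak/><p>Two</p>
```

Ordinary comments are left untouched, so a later filter or the parser can still decide what to do with them.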
DataFilterHTML2XHTML is an implementation of
DataFilter.
A precise description of this filter's actions is beyond the scope of this
document. In general, it turns the input document into a well-formed XML document
(possibly discarding invalid parts along the way). Note that this is achieved
through extensive use of regular expressions; no XML/HTML parser is involved
at this stage of the conversion.
class DataFilterHTML2XHTML {
function DataFilterHTML2XHTML();
function process($data); // returns modified ("filtered") data
function error_message();
}
Parser interface provides a method of building the DOM tree from the
filtered data.
class Parser {
function process($data); // returns DOM tree object
function error_message();
}
ParserXHTML is an implementation of the
Parser interface; it takes an XHTML string as input
and returns the DOM tree object.
class ParserXHTML {
function ParserXHTML();
function process($xhtml); // returns a reference to an object implementing the
// DOMTree interface
}
The PreTreeFilter interface describes a document tree transformation executed before
the layout engine starts.
No classes implementing
PreTreeFilter are included in the distribution.
class PreTreeFilter {
function process(&$tree); // Processes tree IN-PLACE
}
PostTreeFilterHTML2PSFields implements
PostTreeFilter and handles the processing
of special fields (such as date, page count, page number, etc.).
class PostTreeFilterHTML2PSFields {
function PostTreeFilterHTML2PSFields($filename, $filesize, $timestamp);
function process(&$tree); // Processes tree IN-PLACE
}
The LayoutEngine interface describes a class that processes
the document tree and calculates the positions of page elements. In theory, different implementations
of this interface allow "lightweight" layout engines to be used when
full HTML/CSS support is not needed.
class LayoutEngine {
function process(&$tree, &$media); // Processes tree IN-PLACE
}
LayoutEngineDefault is the standard layout engine used by HTML2PS.
class LayoutEngineDefault {
function LayoutEngineDefault();
function process(&$tree, &$media); // Processes tree IN-PLACE
}
PostTreeFilter interface describes a procedure of document tree transformation executed after
the layout engine completes.
class PostTreeFilter {
function process(&$tree); // Processes tree IN-PLACE
}
The OutputDriver interface contains device-specific functions: drawing, movement, font selection, etc.
In general, a description of this interface is beyond the scope of this document, as users are not expected
to implement it themselves. Instead, they use the predefined output drivers described below.
class OutputDriver {
...
}
OutputDriverPDFLIB implements
OutputDriver using PDFLIB.
class OutputDriverPDFLIB {
function OutputDriverPDFLIB(&$media, $pdf_version);
...
}
OutputDriverFPDF implements
OutputDriver using FPDF.
class OutputDriverFPDF {
function OutputDriverFPDF(&$media, $pdf_version);
...
}
OutputDriverCompactPS implements
OutputDriver for PostScript output.
class OutputDriverCompactPS {
function OutputDriverCompactPS(&$media);
...
}
OutputFilter interface describes the filter applied to generated PS or PDF file.
class OutputFilter {
function process($temp_filename); // Possibly creates new file and returns its name
}
OutputFilterPS2PDF implements
OutputFilter. It runs the ps2pdf utility on the generated file.
class OutputFilterPS2PDF {
function OutputFilterPS2PDF();
function process($temp_filename);
}
OutputFilterGZIP implements
OutputFilter. It compresses the generated file using zlib.
class OutputFilterGZIP {
function OutputFilterGZIP();
function process($temp_filename);
}
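A process() step in the spirit of OutputFilterGZIP might look like the following. gzip_output_filter is a hypothetical function, writing the result next to the input with a .gz suffix is an assumption, and the sketch relies on PHP's zlib extension for gzencode().

```php
<?php
// Read the generated file, compress it in gzip format, write the result to a
// new file and return that file's name (the pipeline would hand it on to the Destination).
function gzip_output_filter(string $temp_filename): string
{
    $data = file_get_contents($temp_filename);
    if ($data === false) {
        throw new RuntimeException("cannot read $temp_filename");
    }
    $gz_name = $temp_filename . '.gz';
    if (file_put_contents($gz_name, gzencode($data, 9)) === false) {
        throw new RuntimeException("cannot write $gz_name");
    }
    return $gz_name;
}

// Demo with a repetitive temporary file, which compresses well.
$src = tempnam(sys_get_temp_dir(), 'h2ps');
file_put_contents($src, str_repeat("generated output line\n", 100));
$gz = gzip_output_filter($src);
echo filesize($src), " -> ", filesize($gz), " bytes\n";
```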
Destination interface describes the "channel" object which determines where the final output file
should be placed.
class Destination {
function process($temp_file_name);
}
DestinationBrowser implements
Destination and outputs the generated file directly to the browser.
class DestinationBrowser {
function DestinationBrowser($filename = "");
function process($temp_file_name);
}
DestinationDownload implements
Destination and also sends the generated file to the browser.
Unlike
DestinationBrowser, this class sends headers that prevent the file from being opened directly
in the browser window.
class DestinationDownload {
function DestinationDownload($filename = "");
function process($temp_file_name);
}
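In HTTP terms, the difference between these two destinations usually comes down to the Content-Disposition header: "inline" asks the browser to display the file, "attachment" asks it to save the file (RFC 6266). The hypothetical destination_headers helper below only builds the header lines; which exact headers DestinationBrowser and DestinationDownload send is not stated in this document.

```php
<?php
// Build the header lines for sending a generated file to the browser.
// $download = false -> display inline; $download = true -> offer as a download.
function destination_headers(string $filename, string $mime, bool $download): array
{
    $disposition = $download ? 'attachment' : 'inline';
    return [
        "Content-Type: $mime",
        // Escape quotes and backslashes inside the quoted filename.
        sprintf('Content-Disposition: %s; filename="%s"', $disposition, addcslashes($filename, '"\\')),
    ];
}

// In a web request each line would be passed to header(); here they are just printed.
echo implode("\n", destination_headers('report.pdf', 'application/pdf', true)), "\n";
```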
DestinationLocalFile implements
Destination and saves generated file on the server side.
class DestinationLocalFile {
function DestinationLocalFile($dir, $filename = "");
function process($temp_file_name);
}
Implementing your own fetcher class
Sometimes you may need to convert HTML code taken from a database or another non-standard source.
In this case you should implement the Fetcher interface yourself, returning the content to be converted
from the get_data method (wrapped in a FetchedData object). Additional parameters (like database connection settings,
template variables, etc.) may be supplied as globals (not recommended, though), passed as parameters
to the constructor of the fetcher object, or encoded in the $data_id parameter of get_data.
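class TestFetcher extends Fetcher {
var $content;
var $url;
// $data_id is ignored: this fetcher always returns the content set via set_content()
function get_data($data_id) {
// wrap the stored HTML; no HTTP headers, no source URL
return new FetchedDataURL($this->content, array(), "");
}
function set_content($content) {
$this->content = $content;
}
// part of the Fetcher interface; nothing here can fail, so there is no error to report
function error_message() {
return "";
}
}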
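As an illustration of the constructor-parameter approach, here is a hypothetical fetcher that renders in-memory templates (PHP 7.4+). FetchedDataSketch is a simplified stand-in for FetchedData that keeps only the content; TemplateFetcherSketch and its {name} placeholder syntax are invented for this example, and the class does not extend the real Fetcher so that it runs without html2ps.

```php
<?php
// Simplified stand-in for html2ps's FetchedData (only the content is kept).
class FetchedDataSketch
{
    public string $content;
    public function __construct(string $content) { $this->content = $content; }
}

// Template variables come in through the constructor; $data_id selects the template.
class TemplateFetcherSketch
{
    private array $templates;
    private array $vars;
    private string $error = '';

    public function __construct(array $templates, array $vars)
    {
        $this->templates = $templates;
        $this->vars = $vars;
    }

    public function get_data(string $data_id): ?FetchedDataSketch
    {
        if (!isset($this->templates[$data_id])) {
            // The message may be shown as HTML, so escape the user-supplied id.
            $this->error = 'Unknown template: ' . htmlspecialchars($data_id);
            return null;   // lets the next fetcher in the chain try
        }
        // strtr() with an array replaces every placeholder in a single pass.
        return new FetchedDataSketch(strtr($this->templates[$data_id], $this->vars));
    }

    public function error_message(): string { return $this->error; }
}

$fetcher = new TemplateFetcherSketch(
    ['invoice' => '<h1>Invoice {number}</h1><p>{customer}</p>'],
    ['{number}' => '42', '{customer}' => 'ACME']
);
echo $fetcher->get_data('invoice')->content, "\n";   // <h1>Invoice 42</h1><p>ACME</p>
```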
Class dependencies
The pipeline object contains the following:
- one or more objects implementing Fetcher interface;
- zero or more objects implementing DataFilter interface;
- one object implementing Parser interface;
- zero or more objects implementing PreTreeFilter interface;
- one object implementing LayoutEngine interface;
- zero or more objects implementing PostTreeFilter interface;
- one object implementing OutputDriver interface;
- one object implementing Destination interface.
There are no other dependencies between classes and interfaces (apart from "implements" relations).
Note that the order of filters is important: imagine you are using some kind of tree filter that adds a header block
containing HTML2PS-specific fields. In this case you must add this filter before PostTreeFilterHTML2PSFields, or
you will get raw field codes in the generated output.
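The ordering problem can be reproduced with a toy filter chain (PHP 7.4+). The "tree" here is a list of strings and {PAGE} is an invented placeholder, not the real HTML2PS field syntax; $add_header plays the role of your header-adding filter and $fill_fields the role of PostTreeFilterHTML2PSFields.

```php
<?php
// Hypothetical filters operating on a toy "tree" (a list of text strings).
$add_header  = fn(array $tree): array => array_merge(['Page {PAGE}'], $tree);
$fill_fields = fn(array $tree): array => str_replace('{PAGE}', '1', $tree);

// Apply the filters in the given order, as a pipeline would.
function run_filters(array $tree, array $filters): array
{
    foreach ($filters as $filter) {
        $tree = $filter($tree);
    }
    return $tree;
}

$tree = ['Body text'];
print_r(run_filters($tree, [$add_header, $fill_fields]));  // header reads "Page 1"
print_r(run_filters($tree, [$fill_fields, $add_header]));  // header keeps the raw "Page {PAGE}"
```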