B.E.R.T.

Inside the GenericProcessor


    <generic-processor processImages="true">
      <contents-selector><![CDATA[div.articolo]]></contents-selector>
      <title-selector><![CDATA[h1]]></title-selector>
      <sub-title-selector><![CDATA[]]></sub-title-selector>
      <meta-info-selector><![CDATA[small]]></meta-info-selector>
      <body-paragraphs-selector><![CDATA[div.contenuto > p]]></body-paragraphs-selector>
      <comment-selector><![CDATA[li[id^=comment-]]]></comment-selector>
      <comment-author-selector>
        <![CDATA[div.comment_head_left > small]]>
      </comment-author-selector>
      <comment-meta-info-selector>
        <![CDATA[div.comment_head_left > h4]]>
      </comment-meta-info-selector>
      <comment-body-paragraphs-selector>
        <![CDATA[div.comment_text]]>
      </comment-body-paragraphs-selector>
    </generic-processor>

The main prerequisite here is an understanding of CSS selectors: without this know-how you can't develop any Processor.

Selectors are strings used to locate, in the original post, the pieces of information needed to compose the epub pages.

contents-selector: selector indicating the post's content, typically without comments

title-selector: selector used inside contents-selector to select the post's title (stored in the epub under an H1 tag)

sub-title-selector: optional selector used inside contents-selector to select the post's subtitle (stored in the epub under an H3 tag)

meta-info-selector: selector used inside contents-selector to select the post's meta info, such as author, publishing date, etc. (stored in the epub under an H2 tag)

body-paragraphs-selector: selector used inside contents-selector to collect ALL paragraphs of the post body. For each of them BERT creates a P element and inserts it into the output epub, preserving balanced inner HTML tags

comment-selector: selector used to find comments; with it BERT picks up ALL comments attached to the post

comment-author-selector: selector used inside comment-selector to get the comment's author

comment-meta-info-selector: selector used inside comment-selector to get the comment's meta info (publication time, etc.)

comment-body-paragraphs-selector: selector used inside comment-selector to collect ALL body paragraphs of a single comment. For each of them BERT creates a P element and inserts it into the output epub, preserving balanced inner HTML tags
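To make the scoping of these selectors concrete, here is a minimal Python sketch. It is not BERT's implementation: the sample page, the `extract` function and the output dictionary are all hypothetical, and because Python's standard library has no CSS selector engine, each CSS selector is replaced by a rough ElementTree XPath equivalent (noted in the comments). It only illustrates that post selectors are evaluated inside the element matched by contents-selector, and comment selectors inside each element matched by comment-selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a blog post page.
PAGE = """
<html><body>
  <div class="articolo">
    <h1>Post title</h1>
    <small>by Author, 1 Jan 2012</small>
    <div class="contenuto">
      <p>First <b>paragraph</b>.</p>
      <p>Second paragraph.</p>
    </div>
  </div>
  <ul>
    <li id="comment-1">
      <div class="comment_head_left"><small>Alice</small><h4>2 Jan 2012</h4></div>
      <div class="comment_text">Nice post!</div>
    </li>
    <li id="sidebar-1">Not a comment</li>
  </ul>
</body></html>
"""


def text(el):
    """Return the flattened text of an element, or None if it is missing.

    Inner tags are dropped here for brevity; the page above describes
    BERT as preserving them.
    """
    return "".join(el.itertext()).strip() if el is not None else None


def extract(page):
    root = ET.fromstring(page.strip())

    # contents-selector: div.articolo
    contents = root.find(".//div[@class='articolo']")

    post = {
        # title-selector: h1 (searched inside contents only)
        "h1": text(contents.find(".//h1")),
        # meta-info-selector: small (searched inside contents only)
        "h2": text(contents.find(".//small")),
        # body-paragraphs-selector: div.contenuto > p
        "p": [text(p) for p in contents.findall(".//div[@class='contenuto']/p")],
        "comments": [],
    }
    # sub-title-selector is empty in this configuration, so no H3 is produced.

    for li in root.iter("li"):
        # comment-selector: li[id^=comment-]
        if not li.get("id", "").startswith("comment-"):
            continue
        post["comments"].append({
            # comment-author-selector: div.comment_head_left > small
            "author": text(li.find(".//div[@class='comment_head_left']/small")),
            # comment-meta-info-selector: div.comment_head_left > h4
            "meta": text(li.find(".//div[@class='comment_head_left']/h4")),
            # comment-body-paragraphs-selector: div.comment_text
            "p": [text(d) for d in li.findall(".//div[@class='comment_text']")],
        })
    return post


if __name__ == "__main__":
    print(extract(PAGE))
```

Note that the `li` with id `sidebar-1` is skipped: only elements matched by comment-selector are treated as comments, and the author, meta-info and body selectors never look outside them.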

Examples

Look HERE for a real-world XML configuration covering many supported blogs


<generic-processor processImages="true">
	<contents-selector><![CDATA[div.articolo]]></contents-selector>
	<title-selector><![CDATA[h1]]></title-selector>
	<sub-title-selector><![CDATA[]]></sub-title-selector>
	<meta-info-selector><![CDATA[small]]></meta-info-selector>
	<body-paragraphs-selector><![CDATA[div.contenuto > p]]></body-paragraphs-selector>
	<comment-selector><![CDATA[li[id^=comment-]]]></comment-selector>
	<comment-author-selector><![CDATA[div.comment_head_left > small]]></comment-author-selector>
	<comment-meta-info-selector><![CDATA[div.comment_head_left > h4]]></comment-meta-info-selector>
	<comment-body-paragraphs-selector><![CDATA[div.comment_text]]></comment-body-paragraphs-selector>
</generic-processor>

This is a generic-processor definition for a post on the "BooksBlog" blog ( http://www.booksblog.it/ ).
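When writing or checking a definition like this, it can help to load it and look at the selectors as a program would see them. The following Python sketch does that with the standard library only. The `load_selectors` function and the shortened `CONFIG` sample are hypothetical and say nothing about how BERT itself reads its configuration; `processImages` is simply read as a boolean flag, since its exact meaning is not described here. The sketch shows two things: CDATA values may carry surrounding whitespace or line breaks, which can be collapsed safely for CSS selectors, and an empty selector (such as the optional sub-title-selector) can be treated as "not set".

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample based on the definition above.
CONFIG = """
<generic-processor processImages="true">
  <contents-selector><![CDATA[div.articolo]]></contents-selector>
  <title-selector><![CDATA[h1]]></title-selector>
  <sub-title-selector><![CDATA[]]></sub-title-selector>
  <comment-author-selector>
    <![CDATA[div.comment_head_left >
        small]]>
  </comment-author-selector>
</generic-processor>
"""


def load_selectors(xml_text):
    """Return (process_images, {element name: selector or None})."""
    root = ET.fromstring(xml_text.strip())
    selectors = {}
    for child in root:
        # Collapse runs of whitespace (including line breaks) to single spaces.
        value = " ".join((child.text or "").split())
        # An empty selector is reported as None, meaning "not set".
        selectors[child.tag] = value or None
    return root.get("processImages") == "true", selectors


if __name__ == "__main__":
    process_images, selectors = load_selectors(CONFIG)
    print("processImages:", process_images)
    for name, selector in selectors.items():
        print(f"{name}: {selector!r}")
```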

B.E.R.T.: Blog to Epub Ripping Tool
