Inside the GenericProcessor
<generic-processor processImages="true">
<contents-selector><![CDATA[div.articolo]]></contents-selector>
<title-selector><![CDATA[h1]]></title-selector>
<sub-title-selector><![CDATA[]]></sub-title-selector>
<meta-info-selector><![CDATA[small]]></meta-info-selector>
<body-paragraphs-selector><![CDATA[div.contenuto > p]]></body-paragraphs-selector>
<comment-selector><![CDATA[li[id^=comment-]]]></comment-selector>
<comment-author-selector>
<![CDATA[div.comment_head_left > small]]>
</comment-author-selector>
<comment-meta-info-selector>
<![CDATA[div.comment_head_left > h4]]>
</comment-meta-info-selector>
<comment-body-paragraphs-selector>
<![CDATA[div.comment_text]]>
</comment-body-paragraphs-selector>
</generic-processor>
The main prerequisite here is a good understanding of CSS selectors: without this know-how you cannot develop any Processor.
Selectors are the strings used to locate, in the original post, the pieces of information needed to compose the epub pages.
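
To make the mapping concrete, here is a minimal, hypothetical page fragment annotated with the selector from the configuration above that would match each element. The markup is invented for illustration (it is not the actual BooksBlog HTML); only the selector strings come from the configuration.

```xml
<!-- Hypothetical post markup, invented for illustration only -->
<body>
  <div class="articolo">                    <!-- contents-selector: div.articolo -->
    <h1>Post title</h1>                     <!-- title-selector: h1 -->
    <small>Author name, publishing date</small>  <!-- meta-info-selector: small -->
    <div class="contenuto">
      <p>First paragraph with <em>inline</em> markup.</p>  <!-- body-paragraphs-selector: div.contenuto > p -->
      <p>Second paragraph.</p>              <!-- also matched by div.contenuto > p -->
    </div>
  </div>
  <ol>
    <li id="comment-101">                   <!-- comment-selector: li[id^=comment-] -->
      <div class="comment_head_left">
        <small>Commenter name</small>       <!-- comment-author-selector: div.comment_head_left > small -->
        <h4>Publication time</h4>           <!-- comment-meta-info-selector: div.comment_head_left > h4 -->
      </div>
      <div class="comment_text">Comment text.</div>  <!-- comment-body-paragraphs-selector: div.comment_text -->
    </li>
  </ol>
</body>
```

Note that `>` is the CSS child combinator, so `div.contenuto > p` matches only paragraphs that are direct children of `div.contenuto`, and `[id^=comment-]` matches any `id` attribute that starts with `comment-`.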
contents-selector
: selector indicating the post's content, typically without comments
title-selector
: selector used inside contents-selector
to select the post's title (stored in the epub under an H1 tag)
sub-title-selector
: optional selector used inside contents-selector
to select the post's subtitle (stored in the epub under an H3 tag)
meta-info-selector
: selector used inside contents-selector
to select the post's meta info such as author, publishing date, etc. (stored in the epub under an H2 tag)
body-paragraphs-selector
: selector used inside contents-selector
to collect ALL the paragraphs of the post body. For each of them BERT creates a P element and inserts it into the output epub, preserving its inner HTML with balanced tags
comment-selector
: selector used to find comments; with it BERT picks up ALL the comments attached to the post.
comment-author-selector
: selector used inside comment-selector
to get the comment's author
comment-meta-info-selector
: selector used inside comment-selector
to get the comment's meta info (publication time, etc.)
comment-body-paragraphs-selector
: selector used inside comment-selector
to collect ALL the paragraphs of a single comment's body. For each of them BERT creates a P element and inserts it into the output epub, preserving its inner HTML with balanced tags
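
As a sketch of how the optional sub-title-selector might be filled in, the following configuration targets a hypothetical blog whose posts carry a subtitle. Every selector value (`div.post`, `h2.entry-subtitle`, and so on) is an assumption made up for this example; only the element names and the processImages attribute come from the configuration shown at the top of this page.

```xml
<!-- Hypothetical configuration: every selector value below is invented for illustration -->
<generic-processor processImages="true">
<contents-selector><![CDATA[div.post]]></contents-selector>
<title-selector><![CDATA[h1.entry-title]]></title-selector>
<sub-title-selector><![CDATA[h2.entry-subtitle]]></sub-title-selector>
<meta-info-selector><![CDATA[p.byline]]></meta-info-selector>
<body-paragraphs-selector><![CDATA[div.entry-content > p]]></body-paragraphs-selector>
<comment-selector><![CDATA[div[id^=comment-]]]></comment-selector>
<comment-author-selector><![CDATA[span.comment-author]]></comment-author-selector>
<comment-meta-info-selector><![CDATA[span.comment-date]]></comment-meta-info-selector>
<comment-body-paragraphs-selector><![CDATA[div.comment-body > p]]></comment-body-paragraphs-selector>
</generic-processor>
```

Wrapping each selector in CDATA, as the original configuration does, lets characters such as `>` appear without XML escaping.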
Examples
Look HERE for a real-world XML configuration covering many supported blogs
<generic-processor processImages="true">
<contents-selector><![CDATA[div.articolo]]></contents-selector>
<title-selector><![CDATA[h1]]></title-selector>
<sub-title-selector><![CDATA[]]></sub-title-selector>
<meta-info-selector><![CDATA[small]]></meta-info-selector>
<body-paragraphs-selector><![CDATA[div.contenuto > p]]></body-paragraphs-selector>
<comment-selector><![CDATA[li[id^=comment-]]]></comment-selector>
<comment-author-selector><![CDATA[div.comment_head_left > small]]></comment-author-selector>
<comment-meta-info-selector><![CDATA[div.comment_head_left > h4]]></comment-meta-info-selector>
<comment-body-paragraphs-selector><![CDATA[div.comment_text]]></comment-body-paragraphs-selector>
</generic-processor>
This is a generic-processor
definition for a generic post on the "BooksBlog" blog ( http://www.booksblog.it/ )