B.E.R.T.

Let's have a look at a config file!


<ripping-session>
  <epub-title><![CDATA[File epub title]]></epub-title>
  <epub-language>it</epub-language>
  <epub-filename>BERTFilename.epub</epub-filename>
  <!-- Optional: proxy settings, needed only if you are behind a network proxy -->
  
  <net-proxy-host>10.1.1.1</net-proxy-host>
  <net-proxy-port>8080</net-proxy-port>
  <net-proxy-username><![CDATA[proxyuser]]></net-proxy-username>
  <net-proxy-password><![CDATA[proxypwd]]></net-proxy-password>
  
  <list-provider  id="Booksblog">
    <provider-title><![CDATA[Booksblog]]></provider-title>
    <list>
      <page-url>
        <![CDATA[http://www.booksblog.it/post/6560/confessioni-da-lettore]]>
      </page-url>
      <page-url>
        <![CDATA[http://www.booksblog.it/post/6562/quali-sono-i-vostri-guru-della-lettura]]>
      </page-url>
    </list>
    <generic-processor processImages="true">
      <contents-selector><![CDATA[div.articolo]]></contents-selector>
      <title-selector><![CDATA[h1]]></title-selector>
      <sub-title-selector><![CDATA[]]></sub-title-selector>
      <meta-info-selector><![CDATA[small]]></meta-info-selector>
      <body-paragraphs-selector><![CDATA[div.contenuto > p]]></body-paragraphs-selector>
      <comment-selector><![CDATA[li[id^=comment-]]]></comment-selector>
      <comment-author-selector>
        <![CDATA[div.comment_head_left > small]]>
      </comment-author-selector>
      <comment-meta-info-selector>
        <![CDATA[div.comment_head_left > h4]]>
      </comment-meta-info-selector>
      <comment-body-paragraphs-selector>
        <![CDATA[div.comment_text]]>
      </comment-body-paragraphs-selector>
    </generic-processor>
  </list-provider>
  <feed-provider  id="NazioneIndiana" maxEntries="5">
    <provider-title><![CDATA[[BLOG] Nazione Indiana]]></provider-title>
    <feed-url><![CDATA[http://feeds2.feedburner.com/NazioneIndiana]]></feed-url>
    <processor processImages="true" className="org.bert.ebooks.processors.NazioneIndiana"/>
  </feed-provider>
  
</ripping-session>

This is an exhaustive example of a BERT XML config file.

The epub-* tags set the EPUB metadata (title and language) and the output filename.

The net-proxy-* tags are useful if you want to use BERT behind a network proxy. If you are not behind a proxy, remove or comment out those lines: if you leave them in, the ripping session will be very slow.
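On a machine with a direct connection, one way to disable the proxy settings without losing them is to wrap the whole block in an XML comment. The sketch below reuses the placeholder values from the example above. The provider definitions are left out for brevity, so it is not a complete session as written.

```xml
<ripping-session>
  <epub-title><![CDATA[File epub title]]></epub-title>
  <epub-language>it</epub-language>
  <epub-filename>BERTFilename.epub</epub-filename>

  <!-- Proxy settings disabled: this network has no proxy -->
  <!--
  <net-proxy-host>10.1.1.1</net-proxy-host>
  <net-proxy-port>8080</net-proxy-port>
  <net-proxy-username><![CDATA[proxyuser]]></net-proxy-username>
  <net-proxy-password><![CDATA[proxypwd]]></net-proxy-password>
  -->

  <!-- Provider definitions go here -->
</ripping-session>
```

XML comments cannot be nested, so the explanatory comment is kept separate from the commented-out block.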

Next come the Provider definitions: one or more, as many as you want. This example has two. The first is a ListProvider, in which you list, under page-url tags, one or more (as many as you wish) single post URLs that you want to rip. The second is a FeedProvider, in which you specify, under the feed-url tag, the RSS resource from which the post URLs are read. In a FeedProvider you should limit the number of URLs to process (an RSS source can potentially produce thousands of URLs) by setting the maxEntries attribute. Each Provider must have an ID (the id attribute), and it should be unique within a single ripping session. If you repeat the same ID in two (or more) different Providers, the last one defined wins (they are stored in a LinkedHashMap).
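The duplicate-ID rule is easy to trip over when copying a provider block. The fragment below is a sketch of that pitfall, built from the NazioneIndiana provider in the example. The second maxEntries value of 20 is made up for illustration.

```xml
<ripping-session>
  <!-- epub-* tags omitted for brevity -->

  <!-- First definition with id "NazioneIndiana" -->
  <feed-provider id="NazioneIndiana" maxEntries="5">
    <provider-title><![CDATA[[BLOG] Nazione Indiana]]></provider-title>
    <feed-url><![CDATA[http://feeds2.feedburner.com/NazioneIndiana]]></feed-url>
    <processor processImages="true" className="org.bert.ebooks.processors.NazioneIndiana"/>
  </feed-provider>

  <!-- Same id again: the last definition wins, so only this one
       (with maxEntries="20") should take effect -->
  <feed-provider id="NazioneIndiana" maxEntries="20">
    <provider-title><![CDATA[[BLOG] Nazione Indiana]]></provider-title>
    <feed-url><![CDATA[http://feeds2.feedburner.com/NazioneIndiana]]></feed-url>
    <processor processImages="true" className="org.bert.ebooks.processors.NazioneIndiana"/>
  </feed-provider>
</ripping-session>
```

If both providers are meant to run, give the second one a different id.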

Each Provider has its own Processor. In our example the NazioneIndiana Provider has its own Java class implementing org.bert.ebooks.BlogEntryProcessor: all the processing logic is encapsulated in the class org.bert.ebooks.processors.NazioneIndiana. In the Booksblog Provider the Processor is an instance of org.bert.ebooks.processors.GenericProcessor (see the detail section), which is less powerful but much easier to set up: no Java know-how is required.

The CDATA section is required for those tags whose content could contain special XML characters (for example, the "&" in a URL).
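To illustrate, here are two alternative ways of writing the same feed-url value when the URL contains an ampersand. The URL is a hypothetical placeholder, not a real feed. Only one of the two forms would appear in an actual provider.

```xml
<!-- With CDATA the URL can be pasted as-is -->
<feed-url><![CDATA[http://www.example.com/feed?format=rss&lang=it]]></feed-url>

<!-- Without CDATA the ampersand must be written as the &amp; entity -->
<feed-url>http://www.example.com/feed?format=rss&amp;lang=it</feed-url>
```

An XML parser yields the same text for both forms. A bare "&" outside a CDATA section, however, makes the file malformed.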

B.E.R.T. stands for Blog to Epub Ripping Tool.