emacs-orgmode

[O] html to org-mode


From: John Kitchin
Subject: [O] html to org-mode
Date: Fri, 3 Jan 2014 21:40:14 -0500

Hi everyone,

I was playing around with org-rss today, and it is pretty cool. I would like to customize how the subheading bodies look, though: primarily to unescape HTML entities such as &lt;, strip the remaining HTML tags, convert <a ...> elements to org-mode links, download <img ...> targets so they can be displayed inline, and so on.

For example, the body of an RSS entry looks like:

     <title>Philip Herron: Cython Book</title>
     <guid>http://redbrain.co.uk/?p=147</guid>
     <link>http://redbrain.co.uk/cython-book/</link>
     <description><p>Hey all i thought i should really share that i actually wrote a book on Cython. The book has detailed examples and even shows you how you can extend native C/C++ applications in python by doing it for Tmux. <a href="http://bit.ly/195ahQs">http://bit.ly/195ahQs</a></p> <p><a href="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg"><img class="aligncenter size-full wp-image-148" alt="photo" src="http://redbrain.co.uk/wp-content/uploads/2013/12/photo.jpg" width="640" height="480" /></a>The code can be found: <a href="https://github.com/redbrain/cython-book">https://github.com/redbrain/cython-book</a></p></description>
     <pubDate>Tue, 10 Dec 2013 14:45:08 +0000</pubDate>

I would like this simplified to something like:
Philip Herron: Cython Book

http://redbrain.co.uk/?p=147

http://redbrain.co.uk/cython-book/
Hey all i thought i should really share that i actually wrote a book on Cython. The book has detailed examples and even shows you how you can extend native C/C++ applications in python by doing it for Tmux. http://bit.ly/195ahQs

[[feed-images/photo.jpg]]

The code can be found: https://github.com/redbrain/cython-book

Basically, I want to get the HTML as close to Org as is reasonable. I found a way to get an HTML parse tree, (libxml-parse-html-region start end), but I can't figure out how to convert that tree into the text I want.
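
To make the target concrete, here is a rough sketch of the conversion rules described above, written in Python with the standard html.parser module instead of Emacs Lisp. It is only an illustration of the idea, not a working org-rss customization. The names OrgConverter, html_to_org, and image_dir are made up for this example. It assumes links are not nested, and it only rewrites <img> sources to a local path without downloading anything.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath
import re


class OrgConverter(HTMLParser):
    """Turn a small HTML fragment into Org-flavored plain text (sketch)."""

    def __init__(self, image_dir="feed-images"):
        # convert_charrefs=True means entities such as &lt; arrive unescaped.
        super().__init__(convert_charrefs=True)
        self.image_dir = image_dir  # hypothetical folder for local images
        self.out = []               # finished pieces of output
        self.href = None            # target of the <a> being read, if any
        self.link_text = []         # pieces collected inside that <a>

    def emit(self, text):
        # Text inside a link is held back until the closing </a> is seen.
        (self.link_text if self.href is not None else self.out).append(text)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.href, self.link_text = attrs["href"], []
        elif tag == "img" and attrs.get("src"):
            # Point at a local copy; fetching the file is left out here.
            name = posixpath.basename(urlparse(attrs["src"]).path)
            self.emit(f"\n\n[[{self.image_dir}/{name}]]\n\n")
        elif tag == "p":
            self.emit("\n\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            href, text = self.href, "".join(self.link_text)
            self.href = None
            if "[[" in text:
                # The link wraps an image: keep only the image link.
                self.out.append(text)
            elif text.strip() in ("", href):
                # The link text is just the URL: Org links bare URLs itself.
                self.out.append(href)
            else:
                self.out.append(f"[[{href}][{text.strip()}]]")
        elif tag == "p":
            self.emit("\n\n")

    def handle_data(self, data):
        self.emit(data)


def html_to_org(html, image_dir="feed-images"):
    """Convert an HTML fragment to text with Org links and paragraphs."""
    parser = OrgConverter(image_dir)
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    # Collapse any whitespace run that spans a blank line into one blank line.
    return re.sub(r"\s*\n\s*\n\s*", "\n\n", text).strip()
```

Calling html_to_org on the contents of the <description> element above should give roughly the simplified body shown earlier. The same three rules (text passes through, <a> becomes a link, <img> becomes a local file link) could presumably be applied by a recursive walk over the list that libxml-parse-html-region returns, where each node has the shape (tag attributes children...).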

Has anyone done anything like this?

John

-----------------------------------
John Kitchin
Associate Professor
Doherty Hall A207F
Department of Chemical Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
412-268-7803
http://kitchingroup.cheme.cmu.edu

