[Members] wiki.xmpp.org data recovery
Guus der Kinderen
guus.der.kinderen at gmail.com
Fri Jun 23 10:07:37 UTC 2017
I've manually restored my application pages, and all pages from Tobi's
archive that started with Summer_of_Code.
From that, I've learned that these manual modifications are needed for a
page that is transformed using the xidel / pandoc combination mentioned
below:
- The table of contents needs to be removed (Mediawiki will add one
automatically)
- Everything that matches this regex needs to be removed: <span [^>]*>
(these were used to create anchors for the old ToC, I think)
- Everything that matches </span> needs to be removed (closing tags for
the anchors mentioned above)
- The old context root of the wiki was /web/, while the new one is
/index.php/ - search the text for web/, which gives you some old references
to pages and/or user profiles
- Some pages start with a level 2 header - for these pages, you'll have to
reduce all header levels by one.
- Generally, get rid of <div> and <br> tags
- Images that are used on some pages are lost
- When images were used, there now is a table of two columns, each
column having a fixed width of 50%. You should drop that fixed 50% width.
After that, Mediawiki's preview can be used for smell-testing your changes.
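The mechanical steps in the list above (stripping the span, div and br tags, and rewriting the /web/ context root) could be scripted roughly as follows. This is only a sketch: the function name is mine, and the ToC removal and header-level shift still need manual attention.

```shell
#!/bin/sh
# clean_page: apply the mechanical cleanup steps to a converted page on stdin.
# Not covered here: removing the old ToC and demoting header levels.
clean_page() {
  sed -e 's/<span [^>]*>//g' \
      -e 's|</span>||g' \
      -e 's|/web/|/index.php/|g' \
      -e 's|</\{0,1\}div[^>]*>||g' \
      -e 's|<br */\{0,1\}>||g'
}

# Example:
#   clean_page < page.mediawiki > page.cleaned.mediawiki
```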
On 22 June 2017 at 17:03, Goffi <goffi at goffi.org> wrote:
> On Thursday, 22 June 2017 at 10:06:05 CEST, Guus der Kinderen wrote:
> > Oh, that's actually handy. I'm not much of a bash scripter, but by
> > combining xidel (to select the part of the HTML that is the article
> > content) and pandoc (for conversion to the Mediawiki format), I'm getting
> > something that is pretty close. Example:
> > $ xidel --html Edwin_Mons_Application_2011.html --css "#mw-content-text" |
> >     pandoc --from html --to mediawiki
> > Can someone improve on that?
> We can also use weboob with webcontentedit to automate publishing on the
> wiki, something like:
> $ xidel --html Edwin_Mons_Application_2011.html --css "#mw-content-text" |
> pandoc --from html --to mediawiki |
> webcontentedit edit Edwin_Mons_Application_2011
> Add curl or wget to the game, and I think we can make a script that handles
> it not too badly; we can fix remaining issues by hand afterwards.
> I'm too busy right now to work on a script, but it should not be really
> complicated to do.
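Putting Goffi's suggestion together, a batch script might look like the sketch below. It only prints the commands for each page (a dry run), because the archive URL here is a placeholder and the webcontentedit invocation is taken verbatim from the quote above, untested.

```shell
#!/bin/sh
# Dry-run sketch of the fetch/convert/publish loop suggested above.
# plan_conversion prints the commands for one page instead of running them.
ARCHIVE_BASE="https://example.org/archive"   # placeholder: actual location of Tobi's archive

plan_conversion() {
  page="$1"
  echo "curl -fsSL -o $page.html $ARCHIVE_BASE/$page.html"
  echo "xidel --html $page.html --css '#mw-content-text' | pandoc --from html --to mediawiki > $page.mediawiki"
  echo "webcontentedit edit $page < $page.mediawiki"
}

plan_conversion Edwin_Mons_Application_2011
```

Swapping echo for eval (after reviewing the printed commands) would turn this into a working batch run, with the manual cleanup from the list above applied between conversion and publishing.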