[Members] wiki.xmpp.org data recovery

Guus der Kinderen guus.der.kinderen at gmail.com
Sat Jun 24 06:47:55 UTC 2017


True to my nature as a software developer, I never considered the simple,
pragmatic approach :) Could you restore a couple of pages that way, Arc (for
instance, your past membership applications)?

On 23 June 2017 at 19:37, Arc Riley <arcriley at gmail.com> wrote:

> Thanks guys for all this work.
>
> I'm somewhat amused by it; most of these pages have minimal formatting, so
> it'd be easy enough to copy/paste.
>
> On Fri, Jun 23, 2017 at 8:07 AM, Guus der Kinderen <
> guus.der.kinderen at gmail.com> wrote:
>
>> I've taken Tobias' http://ayena.de/files/wiki.xmpp.org.zip archive and
>> pulled every HTML file there through the Xidel / Pandoc wringer. The
>> resulting content is stored in a file with a different extension. I've
>> created a new archive of everything here:
>> http://goodbytes.nl/with-conversion.tar.gz (same as Tobias' archive, but
>> with additional files). With
>> these files, and the list of manual modifications that I mentioned in my
>> last message, content is restored with relative ease.
>>
>> On 23 June 2017 at 16:59, Kevin Smith <kevin.smith at isode.com> wrote:
>>
>>> On 23 Jun 2017, at 11:07, Guus der Kinderen <guus.der.kinderen at gmail.com>
>>> wrote:
>>> >
>>> > I've manually restored my application pages and all pages from Tobi's
>>> archive that started with Summer_of_Code.
>>> >
>>> > From that, I've learned that these manual modifications are needed for
>>> a page that is transformed using the xidel / pandoc combination mentioned
>>> earlier:
>>> >       • The table of contents needs to be removed (MediaWiki will add
>>> one automatically)
>>> >       • Everything that matches this regex needs to be removed: <span
>>> [^>]*> (these were used to create anchors for the old ToC, I think)
>>> >       • Everything that matches </span> needs to be removed (closing
>>> tags for the anchors mentioned above)
>>> >       • The old context root of the wiki was /web/, while the new one
>>> is /index.php/ - search the text for web/, which gives you some old
>>> references to pages and/or user profiles
>>> >       • Some pages start with a level 2 header - you'll have to reduce
>>> all header levels by one for these pages.
>>> >       • Generally, get rid of <div> and <br> tags
>>> >       • Images that are used on some pages are lost
>>> >       • When images were used, there now is a table of two columns,
>>> each column having a fixed width of 50%. You should drop that 50% fixation.
>>> > After that, MediaWiki's preview can be used for smell-testing your
>>> resulting page.
>>>
>>> Thanks very very much, Guus.
>>>
>>> /K
>>
>>
>>
>
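
For anyone who would rather script the mechanical parts of the clean-up
listed in the quoted message above, here is a minimal Python sketch. It
assumes the converted pages are MediaWiki text with "== Title ==" style
headings. The function names are made up for illustration and are not part
of any existing tool. Removing the old table of contents, restoring images
and dropping the 50% column widths are left as manual steps.

```python
import re

# Regex quoted in the message: opening <span ...> tags used as ToC anchors.
SPAN_OPEN = re.compile(r"<span [^>]*>")
# Closing tags for those anchors.
SPAN_CLOSE = re.compile(r"</span>")
# Assumption: <div> and <br> show up as plain HTML tags, with or without attributes.
DIV_OR_BR = re.compile(r"</?(?:div|br)\b[^>]*>", re.IGNORECASE)
# Assumption: headings are MediaWiki-style, e.g. "== Title ==".
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def promote_headings(text: str) -> str:
    """If the first heading is level 2, reduce every heading level by one."""
    first = HEADING.search(text)
    if first is None or len(first.group(1)) != 2:
        return text

    def shift(match: re.Match) -> str:
        marks = "=" * max(1, len(match.group(1)) - 1)
        return f"{marks} {match.group(2)} {marks}"

    return HEADING.sub(shift, text)


def clean_page(text: str) -> str:
    """Strip span/div/br tags, then fix heading levels where needed."""
    text = SPAN_OPEN.sub("", text)
    text = SPAN_CLOSE.sub("", text)
    text = DIV_OR_BR.sub("", text)
    return promote_headings(text)


def lines_with_old_root(text: str) -> list[int]:
    """Return 1-based line numbers that still mention the old web/ root."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if "web/" in line]


if __name__ == "__main__":
    # Hypothetical converted page, only for demonstration.
    page = '== <span id="Intro"></span>Intro ==\n<div>See /web/Some_Page<br />\n</div>\n'
    cleaned = clean_page(page)
    print(cleaned)
    print(lines_with_old_root(cleaned))
```

Running clean_page over a converted file and then checking
lines_with_old_root gives a list of lines to review by hand. MediaWiki's
preview is still the final check.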