[Members] wiki.xmpp.org data recovery
Kevin Smith
kevin.smith at isode.com
Tue Jun 27 14:50:07 UTC 2017
On 27 Jun 2017, at 13:05, Alexander Gnauck <gnauck at gmail.com> wrote:
> Kevin, you ROCK.
> Thanks a lot *CLAP*
I’d like to take credit, but it was mostly thanks to archive.org having a dump in a mostly restorable format.
Regardless, we’ve even got the page history back.
/K
>
> Alex
>
> ------ Original Message ------
> From: "Kevin Smith" <kevin.smith at isode.com>
> To: "XSF Members" <members at xmpp.org>
> Sent: 27.06.2017 13:11:07
> Subject: Re: [Members] wiki.xmpp.org data recovery
>
>> I believe that I have now recovered all the page content (not yet images) as of October 2015. If any part of Tobi’s dump is newer than that, please continue to import it.
>>
>> /K
>>
>>> On 24 Jun 2017, at 07:47, Guus der Kinderen <guus.der.kinderen at gmail.com> wrote:
>>>
>>> True to my nature as a software developer, I never considered the simple, pragmatic approach :) Could you restore a couple of pages that way, Arc (for instance, your past membership applications)?
>>>
>>> On 23 June 2017 at 19:37, Arc Riley <arcriley at gmail.com> wrote:
>>> Thanks guys for all this work.
>>>
>>> I'm somewhat amused by it; most of these pages have minimal formatting, so it'd be easy enough to copy and paste them.
>>>
>>> On Fri, Jun 23, 2017 at 8:07 AM, Guus der Kinderen <guus.der.kinderen at gmail.com> wrote:
>>> I've taken Tobias' http://ayena.de/files/wiki.xmpp.org.zip archive and pulled every HTML file in it through the Xidel / Pandoc wringer. The converted content of each page is stored in a file with a different extension. I've created a new archive of everything here: http://goodbytes.nl/with-conversion.tar.gz (the same as Tobias' archive, but with these additional files). With these files and the list of manual modifications that I mentioned in my last message, content can be restored with relative ease.
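A batch conversion like the one described above could be scripted roughly as follows. This is a minimal Python sketch, not the script Guus used: it skips the Xidel extraction step entirely (the exact Xidel invocation isn't given) and feeds each whole HTML file to Pandoc, and the `.wiki` output extension and the helper names `pandoc_argv` and `convert_tree` are hypothetical. Only Pandoc's standard `-f`, `-t` and `-o` options are used.

```python
import subprocess
from pathlib import Path

# Hypothetical output extension; the email only says "a different extension".
OUT_SUFFIX = ".wiki"


def pandoc_argv(html_file: Path) -> list[str]:
    """Build the pandoc command that converts one HTML page to MediaWiki markup."""
    out_file = html_file.with_suffix(OUT_SUFFIX)
    return ["pandoc", "-f", "html", "-t", "mediawiki",
            str(html_file), "-o", str(out_file)]


def convert_tree(root: Path) -> None:
    """Run pandoc on every .html file below root, writing the result next to it."""
    for html_file in sorted(root.rglob("*.html")):
        # check=True stops the batch on the first page pandoc fails to convert.
        subprocess.run(pandoc_argv(html_file), check=True)
```

Calling `convert_tree(Path("wiki.xmpp.org"))` on the unpacked archive would leave a `.wiki` file beside each `.html` file, mirroring the "same archive, but with additional files" layout.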
>>>
>>> On 23 June 2017 at 16:59, Kevin Smith <kevin.smith at isode.com> wrote:
>>> On 23 Jun 2017, at 11:07, Guus der Kinderen <guus.der.kinderen at gmail.com> wrote:
>>> >
>>> > I've manually restored my application pages and all pages from Tobi's archive that start with Summer_of_Code.
>>> >
>>> > From that, I've learned that these manual modifications are needed for a page that is transformed using the xidel / pandoc combination mentioned earlier:
>>> > • The table of contents needs to be removed (MediaWiki will add one automatically).
>>> > • Everything that matches the regex <span [^>]*> needs to be removed (I think these were used to create anchors for the old ToC).
>>> > • Everything that matches </span> needs to be removed (the closing tags for the anchors mentioned above).
>>> > • The old context root of the wiki was /web/, while the new one is /index.php/; search the text for web/ to find old references to pages and/or user profiles.
>>> > • Some pages start with a level 2 header; for these pages, you'll have to reduce every header level by one.
>>> > • Generally, get rid of <div> and <br> tags.
>>> > • Images used on some pages are lost.
>>> > • Where images were used, there is now a table of two columns, each with a fixed width of 50%. You should drop that 50% width.
>>> > After that, MediaWiki's preview can be used to smell-test the resulting page.
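The mechanical part of the checklist above (stripping the anchor spans, the div and br tags and the 50% column widths, and rewriting the old /web/ context root) lends itself to a small script. The following Python sketch is a hypothetical helper, not a tool mentioned in the thread. Several parts are assumptions: the exact form of the 50% width attribute in the Pandoc output, the /web/ rewrite (a blunt text substitution whose results should be reviewed), and the heading demotion, which reads "reduce by one" as removing one `=` from each side of a MediaWiki heading. Removing the old table of contents and dealing with the lost images still has to be done by hand.

```python
import re

# The span pattern is quoted from the checklist; the others are assumptions
# about what the Pandoc output looks like.
SPAN_OPEN = re.compile(r"<span [^>]*>")
SPAN_CLOSE = re.compile(r"</span>")
DIV_TAG = re.compile(r"</?div[^>]*>")        # opening and closing div tags
BR_TAG = re.compile(r"<br\s*/?>")            # <br>, <br/> and <br />
WIDTH_50 = re.compile(r'\s*width="50%"')     # hypothetical attribute form
HEADING = re.compile(r"^=(=+)(.*?)=\1$", flags=re.MULTILINE)


def clean_page(text: str) -> str:
    """Apply the regex-able subset of the manual fixes to one converted page."""
    for pattern in (SPAN_OPEN, SPAN_CLOSE, DIV_TAG, BR_TAG, WIDTH_50):
        text = pattern.sub("", text)
    # Old context root /web/ -> new /index.php/.
    # Blunt: this also rewrites external URLs that happen to contain /web/.
    return text.replace("/web/", "/index.php/")


def demote_headings(text: str) -> str:
    """Remove one '=' from each side of every heading of level 2 or deeper."""
    # "== A ==" becomes "= A ="; level-1 headings ("= A =") are left alone.
    return HEADING.sub(r"\1\2\1", text)
```

For example, `clean_page('<div class="x"><span id="a">Intro</span><br/>See /web/Foo</div>')` yields `IntroSee /index.php/Foo`. `demote_headings` is only meant for the pages that start at level 2; every result should still go through MediaWiki's preview as described.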
>>>
>>> Thanks very very much, Guus.
>>>
>>> /K
>>>
>>>
>>>
>>
>