[Standards] Shared Editing

Emanuele Aina em at nerd.ocracy.org
Tue Sep 4 13:57:25 UTC 2007

Joonas Govenius inquired:

>> <http://www.selenic.com/mercurial/wiki/index.cgi/UnderstandingMercurial>
> This doesn't seem overly useful because the system is based on pulling
> changes; I think we want to push them. In particular, the merging
> described in
> http://www.selenic.com/mercurial/wiki/index.cgi/UnderstandingMercurial#head-75349837835f08689995a0777011390ee5dfd90d
> seems to work only because one user pulls first.

The main idea behind the Mercurial history model (which is also the Git
one, as both are based on the Monotone one) is to have a DAG of
revisions, each one unambiguously identified by a crypto hash (like
SHA-1) of the data of the revision itself.

To avoid duplicate hashes, every hash is calculated over the
concatenation of the parent hash and the data to be hashed.
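
A minimal sketch of that chaining in Python, assuming SHA-1 and a
made-up all-zero id for the parent of the first revision (the names
revision_id and NULL_ID are mine, and this is not Mercurial's exact
on-disk format):

```python
import hashlib

# Hypothetical placeholder used as the parent of the very first revision.
NULL_ID = "0" * 40


def revision_id(parent_id: str, data: bytes) -> str:
    """Hash the parent id together with the revision data."""
    h = hashlib.sha1()
    h.update(parent_id.encode("ascii"))
    h.update(data)
    return h.hexdigest()


root = revision_id(NULL_ID, b"<doc/>")
child = revision_id(root, b"<doc><p/></doc>")
# Same content as the root, but under a different parent: a different id.
reverted = revision_id(child, b"<doc/>")
```

Because the parent id is part of the input, going back to an earlier
content still yields a new revision id instead of colliding with the
old one.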

Mercurial has no knowledge about handling conflicts in a merge; it just
does a simple three-way merge, letting the user resolve the conflicts.
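
To illustrate the three-way idea, here is a rough sketch that merges
flat key/value mappings (say, the attributes of one element) rather
than text lines as Mercurial does. The function name and the choice of
dicts are my assumptions; a missing key stands for a deleted entry, so
values must never be None.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`; return (merged, conflicts)."""
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o            # both sides agree
        elif o == b:
            value = t            # only "theirs" changed this key
        elif t == b:
            value = o            # only "ours" changed this key
        else:
            conflicts.append(key)  # both changed it differently
            continue
        if value is not None:    # None means the key was deleted
            merged[key] = value
    return merged, conflicts


base = {"fill": "red", "width": "1"}
clean = three_way_merge(base,
                        {"fill": "blue", "width": "1"},
                        {"fill": "red", "width": "2"})
clash = three_way_merge(base,
                        {"fill": "blue", "width": "1"},
                        {"fill": "green", "width": "1"})
```

Conflicting keys are left out of the merged result and only reported,
which is the point where the user (or the application) has to step in.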

Also, in Mercurial pulling and pushing are symmetrical; there isn't a
real difference between the two. In fact, push is implemented
internally as a 'request to pull'. :)

I spoke about Mercurial because Jonathan looked at SVN, and I thought
that a good DSCM would be a more sensible base to take inspiration from.

That said, I don't know how well the Mercurial/Git/Monotone history
model applies to shared editing requirements...

> 1. The underlying protocol is designed for synchronization of XML
> instances between entities that communicate via XMPP.

As those are real SCMs, they handle arbitrary binary data without making
any assumption about the content (other than for visualization purposes,
e.g. hg diff).

XML could be handled as binary data but I think you are aiming at 
something more designed for XML.

For example, if you decide to ignore content-preserving changes (e.g.
changing the quotes around XML attributes from '' to "") you need to
canonicalize the document to compute sensible hashes.
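
As a sketch of what I mean, Python's standard library (3.8 or later)
ships a C14N 2.0 canonicalizer that can be applied before hashing. The
helper name content_hash and the choice of SHA-1 are assumptions for
the example only.

```python
import hashlib
from xml.etree.ElementTree import canonicalize  # available since Python 3.8


def content_hash(xml_text: str) -> str:
    """Hash the canonical form of an XML document, not its raw bytes."""
    canonical = canonicalize(xml_text)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Same element, written with different quotes, attribute order and
# empty-element syntax.
single_quoted = "<rect x='1' y='2'/>"
double_quoted = '<rect y="2" x="1"></rect>'

same = content_hash(single_quoted) == content_hash(double_quoted)
```

Hashing the raw strings would give two different ids for what is the
same document from the application's point of view.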

> 2. By "synchronization" is meant that all parties to a shared XML editing
> session must have the same representation of the XML object after all
> synchronization messages have been processed.

This would be similar to how developers synchronize their repositories
by pulling and pushing between each other.

Using crypto hashes to identify revisions would ensure that one
downloads a revision once and only once, no matter how many other
developers have it in their repository.
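
A toy model of this, with a hypothetical Repo class that stores
revisions keyed by their hash and counts transfers. For brevity it
hashes only the data and ignores parents and the network entirely.

```python
import hashlib


class Repo:
    """Toy repository: revisions are stored under their hash."""

    def __init__(self):
        self.revisions = {}   # revision id -> data
        self.downloads = 0    # how many revisions were actually fetched

    def commit(self, data: bytes) -> str:
        rev_id = hashlib.sha1(data).hexdigest()
        self.revisions[rev_id] = data
        return rev_id

    def pull(self, other: "Repo") -> None:
        # Only fetch the ids we do not already have.
        for rev_id in other.revisions.keys() - self.revisions.keys():
            self.revisions[rev_id] = other.revisions[rev_id]
            self.downloads += 1


alice, bob, carol = Repo(), Repo(), Repo()
rev = alice.commit(b"<doc/>")
bob.pull(alice)
carol.pull(alice)
carol.pull(bob)   # bob has the same revision: nothing new to fetch
```

After the last pull carol has still downloaded the revision only once,
even though two peers offered it.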

> 3. Ideally, it should be possible to use the protocol for multiple
> application types, e.g. SVG whiteboarding, XHTML document editing,
> collaborative data objects (XEP-0204).

I think that knowledge of the document type is only useful for conflict 
resolution and I think that this should be left to the application.

The protocol itself should only be concerned with distributing the
result of the conflict resolution among users.

> 5. It must be possible to synchronize XML object instances either
> between two entities or among more than two entities.

This is why I thought that DSCM models would be a better source of 
inspiration than the SVN one.

> 10. Where possible, all edits should be commutative in order to minimize
> the need for locking of the XML object.

DSCMs have no locking, only branches that need to be merged.

I would leave the decision about how to do this merge to the application 
and focus only on how to distribute the resolution of the merge.

Maybe the merging algorithms can be described in separate XEPs (e.g. one 
for SVG, one for XHTML).

> 12. It must be possible to add new "nodes" (elements and attributes) to
> the XML object.
> 13. It must be possible to edit existing nodes.
> 14. It must be possible to remove existing nodes.
> 15. It should be possible to move a node from one location to another
> within the XML object.

The semantics of those actions are not preserved in the Mercurial model,
as it is essentially snapshot-based.

Maybe the model used by Darcs could be a better fit, even if Darcs
itself has a lot of algorithmic problems in the implementation of its
theory of patches.

> 19. It should be possible to retrieve the current state of the XML
> object from a party to the shared editing session.

In the DSCM model this is done by pulling from a random user and then 
synchronising with the others as they push their new changes.

> 22. It must be possible to discover which application types another
> entity can handle.

This would correspond to asking which document types the user is able
to merge, resolving possible conflicts.

> 25. It must be possible to uniquely identify each node in an XML object,
> both within the scope of a standalone shared XML editing session and if
> the edited object is shared with other applications or embedded in
> other objects.

This could be done using the revision hash and an XPath expression, but
I think it would only be needed in a non-snapshot-based history model.
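
Roughly, a node reference would be the pair (revision id, path). A
small sketch using the limited XPath subset of Python's ElementTree;
the resolve helper, the "a1b2" id and the sample SVG are all made up
for the example.

```python
import xml.etree.ElementTree as ET


def resolve(revisions, rev_id, path):
    """Look up a node by (revision id, XPath-like path); None if absent."""
    root = ET.fromstring(revisions[rev_id])
    return root.find(path)


# "a1b2" stands in for a real revision hash.
revisions = {
    "a1b2": "<svg><g id='layer'><rect id='r1' width='10'/></g></svg>",
}

node = resolve(revisions, "a1b2", "./g[@id='layer']/rect[@id='r1']")
missing = resolve(revisions, "a1b2", "./g/circle")
```

The same path may point to a different node, or to nothing, in another
revision, which is why the revision id has to be part of the
reference.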

> 28. It must be possible to specify the parent of any given node.

This would be solved using the crypto hashes.

> 39. It should be possible to log all synchronization messages and
> associated metadata for archiving and logging purposes.

Well, having a Mercurial repository as a result of a shared editing 
session would be really great! :)

Congratulations on the excellent choice.
