On Sun, 20 Aug 2023 at 09:35, Guus der Kinderen
<guus.der.kinderen(a)gmail.com> wrote:
I can't believe this was in August...
I've been toying with an Openfire-based
implementation for the Reporting Account Affiliations plugin. The plugin has not seen much
real-world use yet (it barely has been smoke tested). So far, I've only implemented
the 'reporting' functionality (the Info-request query mechanism, and the embedding
things). I have not yet implemented a consumer part (eg: using account info to do
something with regards to permissions). For the curious, the code is here:
https://github.com/igniterealtime/openfire-accountaff-plugin
I'm trying to get my own implementation finally finished and published soon.
> When implementing this, I wondered about the following.
> Is embedding worth it?
> A (considerable) benefit of embedding
account info in stanzas is that it saves round-trips: whenever an entity wants to know the
data, it does not necessarily need to perform a lookup. Are there other benefits that
I've not yet identified?
Lookups are a pain. Your comment focuses on the time saved, but
sending queries and waiting for a response generally consumes
resources. The other party might take a long time to response, during
which time we have to keep the stanza and associated state on hold.
Now consider this multiplied 1000x during someone flooding a MUC room
with automated bots, or whatever.
I think there are significant benefits in stanzas bearing this
information up-front if it is likely to be needed. In some cases I
think I would rather fall back to the current behaviour than implement
remote lookup. E.g. if I wanted to automatically grant voice to new
MUC users with sufficient RAA credentials, I am more likely to just
leave users who don't present RAA info without voice than to implement
a lookup and complex async join logic.
The round-trip that's saved comes with a caveat:
prior to usage, a recipient MUST perform a disco/info query to the originating domain, to
see if the provided data can be trusted. This can be optimized (through caching), and the
disco/info result can be re-used for all accounts that originate from the domain - but
still, it's not quite round-trip free.
As you say, I anticipate such lookups only being done infrequently.
Something about adding data to a stanza that is
inherently 'unsafe' (something that MUST be verified before use) does not sit well
with me. It is invitingly easy for an implementation to skip, or have an issue in that
second step, which would lead to misinterpretation of data that has potential to be used
for authorization purposes.
I think it happens all the time that entities send stuff that cannot
be trusted or needs additional verification. Data received from third
parties should be untrusted by default, and that's just a general
principle of most systems.
I expect that the account information data is rarely
used - although I'll readily admit that I've not coded this part yet, and that
there might be future use-cases that I'm not thinking of. Assuming that I'm right,
then the embedding of data on many stanzas can add considerable overhead: not only in
stanza size (which possibly gets persisted in MAM archives - which leads to interesting
questions around the validity of this data over time), but also to compute the value at
the originating server. The XEP addresses this by providing guides on when to embed data
(more on that below), but a significant amount of overhead remains. Is that worth the
savings of a round-trip (especially when a verification request might be needed anyway)?
I think there are only a very small number of stanza types where this
needs to be added, so I'm not too worried about overhead.
The XEP defines three types of stanzas in which to
embed account info. I've found implementing this to be a lot harder than anticipated
(which very well might be a result of Openfire's API). One thing that I struggled with
was to choose when to act on a stanza that was being processed: when processing inbound,
client-originating stanzas, I worry about missing stanzas that are generated by the server
on the user's behalf. When applying this to outbound stanzas, it's not
straight-forward to identify stanzas that originated from local users - and even if you
do, something like MUC can have modified the addressing, making identifying the correct
local account even harder. The complexity adds up, which leads to implementations that are
more error-prone.
You're not alone, it's not "hard", but it's not trivial in Prosody
either. I'm still working on it.
If I understand the XEP correctly, it specifies that a
server must remove account info from client-originating stanzas if the server has support
for embedding account info for that particular type of stanza (to avoid spoofing). Given
the complexity that I tried to describe above, I believe that this is error-prone. Can we
consider mandating that ALL account info from client-originating stanzas is to be removed,
if the server has support for any kind of account reporting? Is there any reason to clean
up this data in some, but not all client-originating data?
Pretty much just to avoid the overhead of scanning all stanzas. As
mentioned above, I think RAA info only needs to be embedded in a very
small subset of stanzas. Paying the cost on *all* stanzas across a
server seems excessive. We have previously discussed this in the past
related to the various elements we have now that have a 'by' attribute
that ought to be filtered.
Given all of the above, I'm cautiously arguing
that all of the 'embedding' should be removed from the XEP. I do not believe that
its benefit (preventing a query, sometimes) outweighs the drawbacks (in added complexity
and overhead). Removing it might lead to implementations that are easier (faster) to build
and are less error-prone.
I think it would be easier to remove embedding for the sending side,
but conversely it will make the receiving side much more complicated.
For that reason I would rather keep it. Of course I reserve the right
to change my mind as I continue implementing it....
As a last remark: the explicit query for information
is made against the bare JID of the entity for which information is seeked. XMPP dictates
that a server MUST answer such IQ queries on behalf of the user, so this works. I do not
like how this implies that another entity is being asked to provide data than the entity
that we require to provide the data: the XEP explicitly does not want a client/user to
provide this data (that's even defined as 'spoofing'). It wants a server to
provide this data. In this light, it would be clearer to address the request to the server
JID (without a node-part) instead of an account JID.
I see what you're saying. I don't feel strongly, but it doesn't feel
right to me. There are a bunch of things that we address to the
account if we want to learn about the account. I don't think this is
any different. Directing queries to the server JID and putting a
username field in the stanza seems worse than what the XEP currently
proposes. For example, blocklists will do the right thing with the
current approach.
I'll continue thinking about some of this stuff as I finish the implementation.
Regards,
Matthew