
Monday, April 13, 2009

Coming to grips with CMIS

I'm slowly but surely coming to grips with CMIS (Content Management Interoperability Services), which will soon be the lingua franca of CRUD in the content management world, and maybe some other worlds as well.

After reading some of the CMIS draft docs and watching a couple of EMC's CMIS videos on YouTube, I'm starting to grok the basic abstractions. Here are a few first impressions. I offer them as constructive criticism, BTW, not pot-shots. I want to see CMIS succeed, which also means I want to see it done right.

The v0.5 draft doc for the Domain Model says there are four top-level ("first class", root) object types: Document, Folder, Relationship, and Policy. (Support for the Policy type is optional. So there are basically three root types.)

Already I question whether there shouldn't perhaps be a top-level object type ("CMISObject") that everything inherits from, rather than four root objects, since presumably all four basic object types will share at least a few characteristics in common. But maybe not.
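To make the question concrete, here is a minimal Python sketch of what a shared root might look like. The CMISObject class, its field names, and the describe helper are all my own assumptions for illustration; none of them appear in the v0.5 draft.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CMISObject:
    """Hypothetical common root type; NOT part of the CMIS draft."""
    object_id: str
    object_type_id: str
    # Assumed: every object carries some bag of named properties.
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Document(CMISObject):
    # Assumed flag, for illustration only.
    versionable: bool = True


@dataclass
class Folder(CMISObject):
    pass


@dataclass
class Relationship(CMISObject):
    # Defaults are required here because the base class has a defaulted field.
    source_id: str = ""
    target_id: str = ""


def describe(obj: CMISObject) -> str:
    """Works on any subtype, which is the point of having a shared root."""
    return f"{type(obj).__name__}:{obj.object_id}"
```

Whether the shared characteristics amount to more than an ID and a type ID is exactly the open question; if they don't, the root type buys little.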

Page 16 of the Part I doc says that Administration is out of scope for CMIS. But later on, we learn that "A policy object represents an administrative policy that can be enforced by a repository." We also find applyPolicy and removePolicy operations, which are clearly administrative in intent.

Remarkably, Policy objects can be manipulated through standard CMIS CRUD operations but do not have a content stream and are not versionable. However, they "may be" fileable, queryable, or controllable. Why are we treating this object as a file ("fileable") but not allowing it to be versionable? And why are we pretending it doesn't have a content stream? And why are we saying "may be"? That is too much fuzziness, it seems to me.

Right now, the way CMIS Part I is worded, a "policy" can be anything. One might as well call it Rules. Or Aspects. Or OtherStuff. The word Policy has a specific connotation, though. Where I come from, it implies things like compliance and governance, things that MAY intersect role constraints, separation of duties, RBAC, and possibly a lot more; and yes, these concepts do come up in content management, in the context of workflow. But it seems to me that policy, by any conventional definition, is rather far afield from where CMIS should be concentrating right now. If "policy" means something else here, let's have a good definition of it and let's hear the argument for why it should be exposed to client apps.

I say drop the Policy object type entirely. It's baggage. Keep the spec light.

I like the idea of having Relationships as a top-level object type. The notion here is that you can specify the designation of a source object and a target object that are related in some way that the two objects don't need to know about. I like it; it feels suitably abstract. And it models a construct that's used in all sorts of ways in content management systems today.
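As a rough illustration of that decoupling, here is a small Python sketch in which a relationship lives outside both of the objects it connects. The field names (source_id, target_id, kind) and the related_to helper are hypothetical; they are not taken from the CMIS draft.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Relationship:
    """A standalone link; neither endpoint stores or knows about it."""
    source_id: str
    target_id: str
    kind: str  # hypothetical label, e.g. "translation-of"


def related_to(relationships: Iterable[Relationship], object_id: str) -> List[str]:
    """Return the IDs of all targets whose source is object_id."""
    return [r.target_id for r in relationships if r.source_id == object_id]
```

Because the link is data in its own right, adding or removing one never touches the source or target object.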

The Folder object type, OTOH, is too concrete for my tastes. We need to stop thinking in terms of "folder" (which is a playful non-geek term for "directory", designed to make file systems understandable by people who know about manila folders), and think more abstractly. What notion are we really trying to encapsulate with the object type currently dubbed "Folder"? At first blush, it would seem that navigability (navigational axes) is the core notion, but the graphs that Folder allows do not match the navigational notions inherent in file-system folders (at least on Windows). In other words, the many-to-many parent-child mappings allowed by CMIS's Folders destroy the conventional "folder" metaphor, unless you're a computer science geek, in which case you don't think in terms of folders anyway.

I think what "Folder" should try to encapsulate is a Collection of Relationships. A navigation hierarchy (whether treelike or not) is just one possible subclass of such a collection. We cheat ourselves by trying to emulate, at the outset, some parochial notion of "folders" based on a particular type of graph. We need Folder to be more general. It is a Collection of Relationships. We already have Relationships, so why not take the opportunity to reuse them here?
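Here is a minimal Python sketch of the idea, under my own assumptions: the RelationshipCollection class and its method names are invented for illustration and do not come from the CMIS draft. A conventional folder tree falls out as the special case in which no child has more than one parent.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Relationship:
    source_id: str  # plays the "parent" role in a navigation hierarchy
    target_id: str  # plays the "child" role


@dataclass
class RelationshipCollection:
    """Hypothetical generalization of Folder: just a named set of links."""
    name: str
    relationships: List[Relationship] = field(default_factory=list)

    def add(self, parent_id: str, child_id: str) -> None:
        self.relationships.append(Relationship(parent_id, child_id))

    def children(self, parent_id: str) -> List[str]:
        return [r.target_id for r in self.relationships if r.source_id == parent_id]

    def parents(self, child_id: str) -> List[str]:
        return [r.source_id for r in self.relationships if r.target_id == child_id]

    def has_single_parents(self) -> bool:
        """True when no object appears as a child more than once,
        i.e. the collection could pass for a file-system style hierarchy."""
        seen: Set[str] = set()
        for r in self.relationships:
            if r.target_id in seen:
                return False
            seen.add(r.target_id)
        return True
```

Filing the same object under a second parent is just one more relationship in the collection; nothing about the model has to change, only the answer from has_single_parents.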

I'd like to see more discussion about Folders, but I fear that the rush to get CMIS blessed by OASIS may have already precluded further discussion of this important issue. I hope I'm wrong.

Interesting stuff, though, this CMIS. And wow, do I still have a lot of grokking to do . . .

5 comments:

  1. I agree with your notion of doing away with folders and looking for something more abstract. I have tried to implement a wiki solution on our operational system in lieu of using a shared network drive with folders.

    One of the biggest things I am fighting is getting people to let go of their precious folders. They think the information they seek is in a specific folder, not realizing that is the very reason they can't keep folders in order.

    By switching to a cloud-based solution with navigable links, the information can become more fluid and linkable. But some people flat out won't use this system, just so they can hold on to their folders, despite how easy it is to link information together, search, and cross-index.

  2. Having a top-level CMISObject type wouldn't bring anything to the model, except for a way to search all types of objects at the same time (folders, documents, relationships), which is something a lot of repositories will have a hard time doing efficiently. I really don't think it's needed.

    Policies are being reworked a lot, and will probably turn into something like ACLs. See the latest ACL proposal at http://www.oasis-open.org/committees/documents.php?wg_abbrev=cmis

    You're the first person I've seen having a problem with Folders :) Really, folders are a fact of life for many, *many* repositories, and won't go away any time soon. There's peril in over-abstracting things... CMIS is not trying to invent new abstractions, it's trying to be a common ground among many repositories -- and folders belong to that common ground.

    Note, however, that a repository is free to have all its documents unfiled, and to disallow the creation of any folder. That's how repositories dealing with records management will probably choose to expose their functionality through CMIS.

  3. My understanding of the spec is that it is geared towards classic document management systems and their requirements. In fact, the spec explicitly rules out *web* content management as a use case. Therefore, it is understandable that abstractions such as "node" in JCR are not found in CMIS - they make no sense for a DMS.

    OTOH I don't get why this DMS focus of CMIS is not discussed more.

    Cheers
    Michael

  4. trackback url: http://scroisier.posterous.com/fmis-or-cmis

  5. "Abstract" is too abstract...

    In the absence of a tool to manage a controlled hierarchy, a many-to-many folder structure is a "good enough" solution. It allows contributors to make collections of their documents in a personal taxonomy, but does not prevent other kinds of groupings based on global taxonomies or folksonomies.

