Concordia: News and Views

http://planet.atlantides.org/concordia

Tom Elliott (tom.elliott@nyu.edu)

June 05, 2008

Sebastian Heath (Mediterranean Ceramics)

More on Barrington Atlas IDs and a Question

Sean Gillies has blogged about deriving unique representations of geographic names from the Barrington Atlas.

He suggests the pattern:
http://pleiades.stoa.org/batlas/{label-normalized}-{map}-{grid}

See the complete post for how to transliterate names containing non-ASCII characters into URL-friendly (near) equivalents. It's mostly simple: 'Ağva' becomes 'agva', but there are more interesting cases.

Quick question: should the host component of the URL be 'pleiades.stoa.org' or 'atlantides.org'? This may not matter from a redirection point of view, but consistency, searchability, and interchange issues might make it desirable to designate one or the other as the preferred form.
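To make the pattern concrete, here is a minimal sketch in Python of filling in the template. The function name batlas_uri, the host parameter, and the lowercasing of the grid square are my own assumptions for illustration; the label is assumed to have been normalized already.

```python
def batlas_uri(label_normalized, map_number, grid, host="pleiades.stoa.org"):
    """Fill the template http://{host}/batlas/{label-normalized}-{map}-{grid}.

    Hypothetical helper: the host is a parameter because the preferred
    host is still an open question.
    """
    # Assumption: grid squares are written in lowercase in the URI,
    # as in the published example 'gaza-70-e2'.
    return "http://{0}/batlas/{1}-{2}-{3}".format(
        host, label_normalized, map_number, grid.lower()
    )


print(batlas_uri("gaza", 70, "E2", host="atlantides.org"))
```

Keeping the host in one place like this would make it cheap to switch once a preferred form is chosen.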

June 04, 2008

Sebastian Heath (Mediterranean Ceramics)

Mapping GRBPIlion

I continue to be interested in using the Atom Syndication Format, GeoRSS, KML and Google's mapping tools to express the geographic component of data related to the ancient Mediterranean. These formats are all simple, well documented and XML-based, so Greek, Roman and Byzantine Pottery at Ilion (Troia), in which we try to use open standards, is a good test bed for trying out ideas.

To cut to the chase, the following URLs show what I'm up to:

Clicking on the Google Maps link shows a browser-embedded map with a short list of sites on the left. It's early yet, but that list will expand. Regardless, clicking on a site name on the left will bring up a text-bubble pointing to the right place on the map. Within the bubble are one or more links to relevant pages on the GRBPIlion website. Imagine more dots and you get the idea.

The implementation is pretty simple but should give me flexibility going forward. There are three basic components. The file "geography.atom" defines geographic entities. If you look inside you'll see that I derive unique IDs from the Barrington Atlas, so Gaza is "http://atlantides.org/batlas/gaza-70-e2". In doing so, I follow the suggestions of Tom Elliott of ISAW. Looking inside "groups.xml" - which instantiates concepts such as "African Red Slip" for rendering into HTML - shows that a few such groups make reference to these geographic entities. Search for 'rel="geographic"' to see what I mean. Finally, I munge those two files into "grbpilion.kml", which can be shown directly in Google Earth or via Google Maps using the URLs listed above.

The XSLT that does the munging is "kml.xsl". It's pretty ugly right now, but it works, so it will do for the short term.

At a more abstract level, I can theoretically put elements such as '<link rel="geographic" href="http://atlantides.org/batlas/gaza-70-e2" />' anywhere in the publication. Right now I only implement this idea in groups.xml but I look forward to extending this system to individual sherd descriptions and to the bibliography.
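As a rough sketch of how such links could be collected from any file in the publication, the following Python uses only the standard library. The sample document structure (a groups element containing group elements) is hypothetical and is not the actual layout of groups.xml; only the link element itself is taken from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real groups.xml may be structured differently.
SAMPLE = """<groups>
  <group id="example-group">
    <title>Example group</title>
    <link rel="geographic" href="http://atlantides.org/batlas/gaza-70-e2" />
    <link rel="related" href="http://example.org/other" />
  </group>
</groups>"""


def geographic_links(xml_text):
    """Return the href of every <link rel="geographic"> in the document."""
    root = ET.fromstring(xml_text)
    return [
        link.get("href")
        for link in root.iter("link")
        if link.get("rel") == "geographic"
    ]


print(geographic_links(SAMPLE))
```

Because the function walks the whole tree, it would not care whether the link sits in a group, a sherd description or a bibliography entry.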

May 21, 2008

Sean Gillies's Weblog: Geography, Python, the Web

Barrington Atlas Feature IDs and Unicode Normalization

Last week I was at NYU's Institute for the Study of the Ancient World to plan our Concordia project, an effort to interlink projects like Pleiades, IRCyr, the ANS database, and APIS, and build a traversable, searchable network of data based on Web architecture. I'll be blogging more about this all through the year -- expect my blog to intersect more with the Mapufacture and FortiusOne blogs if they continue on course. Some may even be interesting to mainstream GIS designers and developers. ESRI's implicit embrace of Web architecture (along with the explicit embrace of Google) was one of the big stories out of Where 2.0, after all.

One thing that cropped up in our sessions was the need of other Ancient World projects to be able to refer to Barrington Atlas features in Pleiades by URIs derived from their atlas labels. Tom came up with a template for these URIs:

http://pleiades.stoa.org/batlas/{label-normalized}-{map}-{grid}

and simple rules for normalization that just so happen to be already implemented in plone.i18n. We've forked (in a friendly way: forking is now cool thanks to git, right?) plone.i18n and removed the Zope utilities and all dependency on the Zope component architecture. The result is pleiades.normalizer, and it reduces Barrington Atlas labels which may contain annotation and non-ASCII characters to ASCII strings suitable for use in the URI template:

>>> from pleiades.normalizer import normalizer

>>> list(normalizer.normalizeN(u'Tetrapyrgia'))
['tetrapyrgia']

>>> list(normalizer.normalizeN(u'Timeles fl. '))
['timeles-fl']

>>> list(normalizer.normalizeN(u'*Tyinda'))
['tyinda']

>>> list(normalizer.normalizeN(u'[Agrai]'))
['agrai']

>>> list(normalizer.normalizeN(u'Kalaba(n)tia'))
['kalabantia']

>>> list(normalizer.normalizeN(u'Tripolis ad Maeandrum/Apollonia ad Maeandrum/Antoniopolis'))
['tripolis-ad-maeandrum', 'apollonia-ad-maeandrum', 'antoniopolis']

>>> list(normalizer.normalizeN(unicode('Ağva', 'utf-8')))
['agva']

>>> list(normalizer.normalizeN(unicode('Çaykenarı', 'utf-8')))
['caykenari']

The algorithm normalizes non-ASCII characters (normal form KD) and discards elements which are not letters or digits:

U+011F -> (g, U+0306) -> g

U+0131, the last character in Çaykenarı, is a bit of a troublemaker. Our ASCII "i" has the diacritical mark relative to its dotless Latin cousin, the inverse of the usual situation, so NFKD has nothing to decompose and we have to make a special exception for it in the code.
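For readers who want to see the idea without installing anything, here is a minimal sketch of the same approach using only the Python 3 standard library. It is not the pleiades.normalizer code: the function name normalize_label, the SPECIAL_CASES table, and the exact rules for splitting on "/" and joining words with hyphens are assumptions inferred from the sample outputs above.

```python
import re
import unicodedata

# Characters that NFKD leaves untouched and that must be mapped by hand.
# Assumption: only the dotless i is listed here; a real table may be longer.
SPECIAL_CASES = {"\u0131": "i"}  # LATIN SMALL LETTER DOTLESS I


def normalize_label(label):
    """Reduce an atlas label to a list of lowercase ASCII names."""
    names = []
    for name in label.split("/"):  # alternative names are separated by "/"
        name = "".join(SPECIAL_CASES.get(ch, ch) for ch in name)
        # NFKD splits e.g. U+011F into "g" followed by combining U+0306.
        decomposed = unicodedata.normalize("NFKD", name)
        # Encoding to ASCII with "ignore" drops the combining marks.
        ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
        # Discard everything that is not a letter, digit or whitespace.
        kept = re.sub(r"[^a-z0-9\s]", "", ascii_only.lower())
        # Collapse runs of whitespace into single hyphens.
        names.append("-".join(kept.split()))
    return names


print(normalize_label("Çaykenarı"))
print(normalize_label("Timeles fl. "))
```

Doing the special-case mapping before the NFKD step keeps the exception table separate from the general rule.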

If you're getting started in web development with Python you might find pleiades.normalizer handy, or at least a starting point for your own normalization code. We won't be publishing it to PyPI, but you can get an egg via:

$ easy_install http://atlantides.org/eggcarton/pleiades.normalizer-0.1.tar.gz

May 08, 2008

Tom Elliott (Horothesia)

Uptime

Now back up and running: all public-facing services hosted on atlantides.org (including the Concordia website, the Pleiades development environment, the Atlantides feed aggregators and the inscriptol Mercurial repository).

May 07, 2008

Tom Elliott (Horothesia)

Downtime

Update (0855 EDT, 13 May): at present the server is down; individual services, beginning with Concordia, will come back up as they are reinstalled over the course of today.

It's time for a server operating system upgrade, so atlantides.org will be down today (6 May 2008), between noon and 8:00 p.m. U.S. Eastern time (EDT, UTC-4). The length of the outage is likely to be shorter than this window. The Atlantides feed aggregators, as well as the Pleiades and Concordia development environments, will be inaccessible during this upgrade. pleiades.stoa.org will remain up and accessible throughout.

April 22, 2008

Tom Elliott (Horothesia)

New Planets: Concordia and Pleiades

Neither Pleiades nor Concordia has its own news blog; and I hope you'll forgive the fact that the team members were reluctant to create same, since some of us already blog in multiple places. The solution? Aggregate and filter our regular blog posts into project-specific streams. So today I have added to the Atlantides system the following:
  • Concordia: News and Views (html | rss)
  • Pleiades: News and Views (html | rss)
Hat-tip to the regexp_sifter.py filter in Sam Ruby's Venus.
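The filtering idea itself is simple. The following is a minimal sketch in Python and is not Venus's code or configuration: the sift function and the dictionary layout of the entries are hypothetical, chosen only to show how a regular expression can route posts into a project-specific stream.

```python
import re


def sift(entries, require):
    """Keep entries whose title or content matches the `require` pattern."""
    pattern = re.compile(require, re.IGNORECASE)
    return [
        entry
        for entry in entries
        if pattern.search(entry["title"]) or pattern.search(entry["content"])
    ]


# Hypothetical entries; the content strings are placeholders.
posts = [
    {"title": "Concordia grant award", "content": "..."},
    {"title": "GeoRSS 2.0?", "content": "..."},
    {"title": "Uptime", "content": "... including the Concordia website ..."},
]

concordia_stream = sift(posts, r"\bconcordia\b")
print([post["title"] for post in concordia_stream])
```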

April 02, 2008

Tom Elliott (Horothesia)

Concordia licensing and openness

Andy Powell hopes "that the conditions of funding in this case mandated that the resulting resources be made open rather than just free" and wonders what licenses will govern the content produced or incorporated by the various projects funded under the joint NEH/JISC Transatlantic Digitization Collaboration grants.

I can only speak for the Concordia project, a collaboration of ISAW and CCH.

In answer to the first question: no, I am not aware of any mandate placed on us in this regard. We did make explicit commitments of our own in our proposal about licensing, and we're now bound to abide by those.

Here is a list of the software and content that we will use, modify or produce, indicating the license that now governs (or will govern) each:

March 27, 2008

Sean Gillies's Weblog: Geography, Python, the Web

GeoRSS 2.0?

I saw several references to "GeoRSS 2.0" recently by people who are attending the OGC TC meeting in St. Louis. Here's my 2 cents on 2.0:

  • Deprecate the simple geometry encoding and just go with GML in a georss:where element already. Yes, this is a complete about-face from my position in 2006. Never mind XML Schema, I'm just saying that the GML encoding is more expressive, more clearly defined, and not significantly more difficult to produce or parse. I've even contributed a GML parsing patch for the Universal Feed Parser. If I can do it, how hard can it be?
  • Specify cardinality: an entry should have 0 or 1 geometry (see following comments about multiple locations).
  • According to the 1.0 spec: "GeoRSS geometry is meant to represent a real feature of the Earth's surface". Why not above the surface, or below? Do you really think that's going to stop Satan from using GeoRSS to geo-tag news from the hellish center of the Earth? I think this language can go in 2.0 and take elev, floor, and radius along with it. We're all better off using 3D coordinate reference systems, and the quick adoption of KML shows that a third dimension isn't a serious stumbling block.
  • In the next paragraph, it is written: "GeoRSS is a way of relating Web content to Earth features". That statement is more dense and chewy than it seems. To me, it seems like specifying nothing more than how to add location to entries is a worthy enough effort. The relationship of entries to the real world is really more of a semantic web concern.
  • The semantics of the featuretype tag and the relationship tag belong to entry entities rather than to location entities. The fact that they are not significantly used is, in my opinion, tacit acknowledgement of the truth of that statement. Atom and RSS already allow you to categorize entries and items, which makes featuretype tags unnecessary. Atom also provides a mechanism to relate entries to other entities on the Web: links obviate the need for relationship tags.
  • I have already blogged about multiple locations. Entries are cheap.
  • The 1.0 spec only considers literal geometries. Referencing external geometries is something else that's being considered for a future version. I'm -0 on the first option (inspired by http://rest.blueoxen.net/cgi-bin/wiki.pl?XMLSemanticWeb) and -1 on the second (inspired by http://tools.ietf.org/html/rfc4287#section-4.1.3.2). I now prefer an approach that first came up on the GeoRSS list here, and which Tom Elliott and I had also been discussing that fall: using Atom links and custom relations. The equivalent of that first option could instead be something like:
<entry>
  ...
  <link
    rel="http://www.georss.org/relations/is-colocated-with"
    href="http://pleiades.stoa.org/places/639166.atom"
    type="application/atom+xml;type=entry"
    />
</entry>
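On the consuming side, a client would only need to look for links with that relation. Here is a minimal sketch in Python with the standard library; the function name location_references and the sample entry's title and alternate link are hypothetical, and the relation URI is the illustrative one from the example above, not a registered relation.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
# Illustrative relation URI taken from the example; not a registered relation.
COLOCATED = "http://www.georss.org/relations/is-colocated-with"

# Hypothetical entry; only the second link comes from the example above.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical entry</title>
  <link rel="alternate" href="http://example.org/post" />
  <link
    rel="http://www.georss.org/relations/is-colocated-with"
    href="http://pleiades.stoa.org/places/639166.atom"
    type="application/atom+xml;type=entry"
    />
</entry>"""


def location_references(entry_xml, relation=COLOCATED):
    """Return hrefs of Atom links that carry the given location relation."""
    entry = ET.fromstring(entry_xml)
    return [
        link.get("href")
        for link in entry.findall(ATOM + "link")
        if link.get("rel") == relation
    ]


print(location_references(ENTRY))
```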

Update (2008-03-28): Notes on my adventures in this direction are at http://atlantides.org/trac/concordia/wiki/AtomLocationByReference.

If you push more semantics from the location up into the entry, using Atom and its extensions, the ground that GeoRSS has to cover shrinks, a good thing for any specification. But what about RSS 1.0 and 2.0? The former can draw on RDF, of course. Maybe the latter needs links from SSE.

March 26, 2008

Tom Elliott (Horothesia)

Concordia grant award

Yesterday I had the pleasure of attending a nice event at the Folger Library during which the Chairman of the National Endowment for the Humanities announced the award of 5 grants under the NEH/JISC joint Transatlantic Digitization Collaboration rubric (press release).

I'm happy to report that Pleiades is part of one of the winning proposals. The award goes jointly to ISAW at NYU and to CCH/Classics at King's College, London for a collaboration we're calling "Concordia" (to reflect its focus on cross-project interoperability). The principal investigators are Roger Bagnall and Charlotte Roueché. Sean Gillies, Gabriel Bodard and I will join them in working on the project. The period of performance is 1 April 2008 - 31 March 2009.

What will we do?
Our advisory board: