Earthli gets OpenGraph and Twitter metadata
Most tools that scrape web pages use the OpenGraph metadata embedded in them. Some fall back to the older, more general metadata, like the description tag or the <title> element, but this leads to a rather limited embedding. Almost no one extracts pictures from pages unless explicitly requested to do so by metadata.
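To make the fallback behavior concrete, here is a minimal sketch of how a scraper might build a link preview. It is not taken from any particular tool: the names PreviewParser and build_preview are made up for illustration, and real scrapers handle many more cases (relative URLs, multiple images, character sets).

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects OpenGraph properties plus the older fallbacks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.og = {}             # og:* properties, keyed without the "og:" prefix
        self.description = None  # content of <meta name="description">
        self.title = None        # text of the <title> element
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = (attrs.get("property") or "").strip()
            content = attrs.get("content")
            if prop.startswith("og:") and content is not None:
                # Keep the first value seen for each property.
                self.og.setdefault(prop[3:], content)
            elif attrs.get("name") == "description":
                self.description = content
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def build_preview(html_text):
    """Prefers OpenGraph values and falls back to the older tags."""
    parser = PreviewParser()
    parser.feed(html_text)
    return {
        "title": parser.og.get("title") or parser.title,
        "description": parser.og.get("description") or parser.description,
        # No fallback for the image: without metadata, there is no picture.
        "image": parser.og.get("image"),
    }
```

For a page without OpenGraph tags, build_preview() returns only the title and description, which is the "rather limited embedding" described above.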
Until recently, earthli didn’t include this metadata, leading to somewhat substandard rendering of any links pasted to social media.
Sample Metadata
As an example, the article NY Times Spelling Bee now includes the following OpenGraph metadata:
<meta name="twitter:image"
content="https://…/forthwith.png" />
<meta property="og:url"
content="https://…view_article.php?id=3974" />
<meta property="og:title" content="NY Times Spelling Bee" />
<meta property="og:type" content="website" />
<meta property="og:description" content="I recently
wrote that Kath and I have a one-year streak going in the
NYT Crossword Puzzle. While that is still ongoing,
we've also recently discovered a little gem
called Spelling Bee. The concept is …" />
<meta property="article:author" content="marco" />
<meta property="article:published_time"
content="2020-05-16 20:39:52" />
<meta property="article:modified_time"
content="2020-05-21 21:15:08" />
<meta property="og:image"
content="https://…/forthwith.png" />
<meta property="og:image:width" content="2562" />
<meta property="og:image:height" content="1566" />
The same article also now has Twitter metadata:
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="" />
<meta name="twitter:creator" content="@mvonballmo" />
<meta name="twitter:title" content="NY Times Spelling Bee" />
<meta name="twitter:description" content="I recently
wrote that Kath and I have a one-year streak going in the
NYT Crossword Puzzle. While that is still ongoing,
we've also recently discovered a little gem
called Spelling Bee. The concept is …" />
Twitter refuses to use any of the OpenGraph information, so you really need to include both copies.
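Since both sets of tags carry the same values, they can be generated from one source. The following is a rough sketch of that idea, not earthli's actual code: the functions meta_tag and social_tags are hypothetical. The one detail taken from the samples above is that OpenGraph tags use the property attribute while Twitter tags use the name attribute.

```python
from html import escape


def meta_tag(attribute, key, value):
    """Renders one <meta> element, escaping the value for use in an attribute."""
    return f'<meta {attribute}="{key}" content="{escape(value, quote=True)}" />'


def social_tags(title, description, image_url, creator):
    """Returns OpenGraph and Twitter tags built from the same shared values."""
    shared = {"title": title, "description": description, "image": image_url}
    tags = []
    # OpenGraph tags use the "property" attribute.
    for key, value in shared.items():
        tags.append(meta_tag("property", f"og:{key}", value))
    # Twitter tags use the "name" attribute.
    tags.append(meta_tag("name", "twitter:card", "summary_large_image"))
    tags.append(meta_tag("name", "twitter:creator", creator))
    for key, value in shared.items():
        tags.append(meta_tag("name", f"twitter:{key}", value))
    return tags
```

Calling print("\n".join(social_tags(...))) with an article's values would emit a block similar to the samples shown above.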
Implementation
Some of the properties aren’t necessarily required, but it was easy enough to generate them all from earthli’s general facilities.
- The AUDITABLE object provides the creator, creation date, and the last-modification date.
- The CONTENT_OBJECT object provides the title and the description.
- The DRAFTABLE_ENTRY updates the creation date to the publication date for articles.
- The ATTACHMENT_HOST sets the image to the first attachment.
- The IMAGE_METRICS provides size information for the default image.
- The MUNGER_STRIPPER extracts and formats text for the description so that it is legal HTML.
- The ALBUM sets the image to the main picture for the album, if available.
- The PICTURE sets the image to the picture’s URL.
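The description step in the list above is the one most likely to go wrong, because the text ends up inside an HTML attribute. The sketch below shows one way to produce such a description. It is an assumption about the general approach, not a port of MUNGER_STRIPPER: the function plain_description and its 200-character limit are invented for illustration, and the regular expression is a crude stand-in for a real markup parser.

```python
import re
from html import escape, unescape


def plain_description(markup, max_length=200):
    """Reduces marked-up text to a short, attribute-safe description."""
    text = re.sub(r"<[^>]+>", " ", markup)  # drop anything that looks like a tag
    text = unescape(text)                    # decode entities such as &amp;
    text = " ".join(text.split())            # collapse runs of whitespace
    if len(text) > max_length:
        # Cut at the last word boundary that fits, then mark the truncation.
        text = text[:max_length].rsplit(" ", 1)[0] + " …"
    return escape(text, quote=True)          # safe inside content="…"
```

The trailing ellipsis mirrors the sample descriptions above, which end in "The concept is …".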
In addition, I added support for SOCIAL_PAGE_OPTIONS and introduced a method on the data hierarchy called set_social_options(), which allows the data objects to enrich the social options before they’re formatted into the metadata area of the page header. A page must enable the social options and explicitly request to generate them, a feature I only enabled from the view_entry.php and view_folder.php pages.
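The pattern of letting each level of the data hierarchy enrich a shared options object can be sketched as follows. This is an illustration in Python, not earthli's code: the class and method names echo the ones mentioned above, but the fields, constructors, and the collect_social_options function are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SocialPageOptions:
    enabled: bool = False  # a page must opt in explicitly
    title: Optional[str] = None
    description: Optional[str] = None
    author: Optional[str] = None
    published_time: Optional[str] = None


class Auditable:
    def __init__(self, creator, time_created):
        self.creator = creator
        self.time_created = time_created

    def set_social_options(self, options):
        options.author = self.creator
        options.published_time = self.time_created


class ContentObject(Auditable):
    def __init__(self, creator, time_created, title, description):
        super().__init__(creator, time_created)
        self.title = title
        self.description = description

    def set_social_options(self, options):
        super().set_social_options(options)  # let the base class go first
        options.title = self.title
        options.description = self.description


class DraftableEntry(ContentObject):
    def __init__(self, creator, time_created, title, description, time_published=None):
        super().__init__(creator, time_created, title, description)
        self.time_published = time_published

    def set_social_options(self, options):
        super().set_social_options(options)
        if self.time_published is not None:
            # Published entries report the publication date instead.
            options.published_time = self.time_published


def collect_social_options(options, obj):
    """Returns enriched options, or None if the page did not enable them."""
    if not options.enabled:
        return None
    obj.set_social_options(options)
    return options
```

Each subclass calls the inherited method first and then overrides only what it knows better, so a new type of object can take part by overriding a single method.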
The results are shown below.
Apple Messages
Apple Messages uses the OpenGraph tags to make nicely formatted previews now.
Facebook
I haven’t actually posted anything to Facebook, but was able to use the Social-graph Testing tools to see how posts would look.
Twitter
I only tested articles with Twitter because I don’t anticipate ever tweeting photos or albums. The tweet is nicely formatted now, with or without an attachment. Previously, Twitter displayed links to earthli as only a simple title, with no description or image.