Monica ([personal profile] cellio) wrote 2020-11-29 09:54 pm

our legacies are not always what we think they will be

In the mid-80s, in my first full-time position after college, I worked for a now-defunct software company doing artificial intelligence, specifically natural-language processing. The most significant project I worked on while there was a text categorization system. I was the tech lead (this was 1987ish). The client was Reuters, who at the time had literal rooms full of people whose job was to skim news stories coming over the wire, attach categories to them, and send them back out quickly. Our job was to automate that -- or, more realistically, to automate the parts that machines could do and send a much smaller set of "don't know" cases to humans. I'm writing this from memory; it's been more than 30 years and details are fuzzy.

I left that company and went on to do other things. I was vaguely aware that, at some point, the corpus of news stories we used for training data had been released publicly, by agreement between Reuters and my then-employer. I wasn't a researcher, wasn't in the NLP business any more, and lost touch. Technology moves on, and I figured our little project had long since faded into obscurity.

Tonight I got email with a question about that data set. My name is in the README file as one of the original compilers, and somebody tracked me down.

Somebody still cares about that data set.

I Googled it. Our data set was popular for close to a decade, during which time people improved the formatting (SGML, baby!) and cleaned up some other things. It spawned a child -- the original either had, or had acquired, some duplicate entries, and the new one removed them. (The question I got was actually about the child data set.) And now I'm curious about the question I was asked too, because I either don't know or don't remember how it got that way.

Neat!


[personal profile] minoanmiss 2020-11-30 03:58 am (UTC)
This is nifty!

[personal profile] madfilkentist 2020-11-30 10:58 am (UTC)
It's nice when things you work on live on and get used. While I was at the Harvard library, I wrote the bulk of the code for JHOVE, software for identifying formats, validating files, and extracting metadata. It's still in wide use by libraries and archives, and it's still being maintained, though no longer at Harvard, over 15 years after I started work on it.

[personal profile] hlinspjalda 2020-11-30 10:08 pm (UTC)
That's great!

Mr. Fixer had a similar experience quite recently with email about the CES DTD -- which he expected would never be touched again this many years after it was produced.

I wonder if covidtime is partially responsible for this sort of experience.

Awesome!

[personal profile] moe37x3 2020-12-01 03:09 pm (UTC)
Like an old friend coming back to visit. It feels like an occasion to say "Blessed be the One who Revives the Dead" (https://www.sefaria.org/Shulchan_Arukh%2C_Orach_Chayim.225?lang=bi).

The closest I've had to this was a few years ago, when a high-school friend of mine unearthed and sent me my hand-written code for producing a "Dragon Curve" fractal on a TI-82. I got a TI-83 emulator for my phone, typed in the code, and was rewarded with a fractal produced by my teenaged self.

[personal profile] magid 2020-12-03 12:04 am (UTC)
That is so cool!

[personal profile] andrewducker 2020-12-04 04:27 pm (UTC)
That is remarkably cool.