Thursday, October 25, 2007

Tagging and Folksonomy reading from ASIST

No comments :
Thanks to the Taxonomy Watch blog for a pointer to the current issue (Oct/Nov 2007) of the Bulletin of the American Society for Information Science and Technology, which has a section about folksonomies and tagging. Glorious and very timely.

Introduction: Folksonomies and Image Tagging: Seeing the Future?
by Diane Neal, Guest Editor
Full Text: HTML | PDF (Size: 502k)

Why Are They Tagging, and Why Do We Want Them To?
by P. Jason Morrison
Full Text: HTML | PDF (Size: 96k)

Trouble in Paradise: Conflict Management and Resolution in Social Classification Environments
by Chris Landbeck
Full Text: HTML | PDF (Size: 95k)

Image Indexing: How Can I Find a Nice Pair of Italian Shoes?
by Elaine Ménard
Full Text: HTML | PDF (Size: 109k)

Flickr Image Tagging: Patterns Made Visible
by Joan Beaudoin
Full Text: HTML | PDF (Size: 82k)



As i recently blogged, this December i will be hosting a round table event to discuss Folksonomies and Taxonomies in the Enterprise, which will also include a tour of our printing plant in Palo Alto. If you are interested, or know someone who is working in this space and would like to attend, please drop me a line; we still have a couple of open seats. daniela{dot}barbosa @dowjones.com

Tuesday, October 23, 2007

Metadata thanks to a crack team of whipsmart librarians

No comments :
Nice catch by Ian Kennedy as usual over on his blog pointing us to this post on The New York Times Open Source blog about the Metadata that they are making available as part of their online open archive. They are making a good portion of the Metadata available for all their electronic Web content from 2001 onwards.

In my last post on this topic i mentioned some of the challenges of adding Metadata, especially when using only machines, so i certainly grinned when i read how the NY Times deals with it:

"Summarization is a particularly tough one. At The Times, our goal is to apply our metadata to describe the essential summary of the story; this is more than mere entity extraction is capable of doing. Instead, we have tackled this problem by developing the most advanced computational text-categorizing system known to mankind: a crack team of whipsmart librarians. Armed with some guidelines and an organizational zeal, they’re able to maintain consistent tagging rules on our daily output. They and their predecessors have been doing this for our material all the way back to 1851."

The NY Times seems to be keeping an eye on those out there hacking different news views using the Metadata that is available, and even goes so far as to ask people to hack something even cooler... nice.

photo credit: emdot

Thursday, October 18, 2007

Value of Common Metadata Schemas

2 comments :
This post's topic first came up on my 'to blog radar' when Ian Kennedy blogged about Mining the New York Times Archives a couple of weeks back, pointing out that online articles from the NY Times (which recently opened its archives to consumers) come with rich Metadata within each article: tags like bylines, the people the article is about, the company, the region etc. that can be used to build applications that display/deliver content.

Dave Winer's smart little script to track NY Times article key words, and his original post in which he basically asks for a standard in news content Metadata (well, he wants to look at the NY Times taxonomy, which he states could be a standard), prompted me again this evening to think about the value of Metadata in dealing with media content overload.

As Ian points out in his post, because he used to work with me at Factiva he knows quite well that the New York Times, along with many of the 10,000 sources we aggregate, sends us content that is rich in this type of Metadata. The Metadata they send is usually based on an established taxonomy that the Information Provider (IP) uses in its own content production process, and many a time it is still hand coded by the editorial staff, although big media companies or consolidated services do also use automatic categorizers.

So before i go any further let me define one thing- Dow Jones produces content, the Factiva division is basically an Aggregator- we take over 150,000 articles per day from thousands of news providers through our source processing process and apply structured Metadata to each article which you can learn more about in detail in this white paper. During that process we normalize all the content into a standard XML format and add additional Metadata to each article as i describe below.

The content goes through various steps to ensure that the Metadata applied is useful downstream. There are Metadata fields like 'author', 'date' and 'source name' etc. that we expect the IPs to send us. These are fairly straightforward, and we have mapping tables to figure out what each IP calls each field. There are also fields that are automatically calculated and cannot be disputed, for example 'wordcount' and 'language'.
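To make the mapping-table idea concrete, here is a minimal sketch in Python. It is not Factiva's actual pipeline: the provider names, the field names in `FIELD_MAPS` and the whitespace-based word count are all assumptions made up for illustration.

```python
# Hypothetical per-provider mapping tables:
# provider's field name -> normalized field name.
FIELD_MAPS = {
    "provider_a": {"byline": "author", "pub_date": "date", "publication": "source_name"},
    "provider_b": {"writer": "author", "published": "date", "source": "source_name"},
}


def normalize(provider, record):
    """Rename provider-specific fields and add a calculated field."""
    mapping = FIELD_MAPS[provider]
    normalized = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Calculated field: a naive whitespace word count of the body text.
    normalized["wordcount"] = len(record.get("body", "").split())
    return normalized


article = normalize(
    "provider_a",
    {
        "byline": "Jane Doe",
        "pub_date": "2007-10-18",
        "publication": "Example Times",
        "body": "Metadata makes content easier to find.",
    },
)
print(article)
```

The point of keeping one table per provider is that adding a new source means adding a table, not changing the normalization code.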

Then there is Metadata that describes the 'aboutness' of the article. Some of the Metadata that addresses the 'aboutness' of the content we map from what the IPs provide us. For example, the NY Times might send us an article that they have tagged as being about Russia- we 'trust' the NY Times coding so we apply our 'Russia' region tag to it.
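A rough sketch of that 'trusted coding' step might look like the following. The provider key, the tag names and the region codes are hypothetical placeholders, not real Factiva codes.

```python
# Hypothetical set of providers whose own coding is trusted enough to map directly.
TRUSTED_PROVIDERS = {"nytimes"}

# Hypothetical mapping from a provider's tag to an internal region code.
REGION_MAP = {
    "nytimes": {"Russia": "REGION_RUSSIA", "France": "REGION_FRANCE"},
}


def map_region_tags(provider, provider_tags):
    """Return internal region codes for tags sent by a trusted provider."""
    if provider not in TRUSTED_PROVIDERS:
        return []  # untrusted coding is left for later indexing steps
    table = REGION_MAP.get(provider, {})
    return [table[tag] for tag in provider_tags if tag in table]


print(map_region_tags("nytimes", ["Russia", "Energy"]))
```

Tags with no entry in the table (like 'Energy' here) are simply skipped, and would have to be covered by whatever indexing is applied afterwards.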

The rest of the 'aboutness' is applied as Factiva Intelligent Indexing (FII) which is Factiva's core taxonomy- that covers Industry, Subject, Region and Companies (more detail on its application here in this white paper). There are also a lot of additional Metadata elements such as people, brands, products, organizations, parts of speech (e.g. a quote) that can potentially be extracted from the content as well.

So as online media providers open their archives i always get pings from friends and family about what that means for the Factiva 'business'- i always point to the value of the aggregation, normalization and Metadata additions to our content that covers 22 languages. In addition we also provide many services to help Enterprises deliver that content to the users that need it including licensing our taxonomy or helping clients build their own.

Winer's call for a taxonomy standard that is maintained by the public specifically for News is an interesting one and one that i thought about as a good use case for Freebase perhaps as more content is made available and more tools are made available for users to create delivery mechanisms through mashups.

Afterthought: After i shut down the computer last night, i kept thinking about this. If you are an ubergeek like Winer or Ian and have access to Factiva services in your enterprise, there is no reason why you can't build things like this or even cooler- get in contact with me if you are interested, i am looking for crazy ideas ;-)

photo attributed to denverjeffrey

Monday, October 15, 2007

Reiser 2.0- How Sun Microsystems is doing what others are not even thinking about yet

1 comment :
Two Fridays ago i got to take part in one of my most memorable thinking and learning sessions, in a small conference room at the Sun Microsystems office in Menlo Park, with a handful of folks including Peter Reiser and Mike Briggs from Sun, Robert Scoble and my colleague Greg Merkle.

Robert, Greg and I had already had some good conversations about a 'new media' white paper (more on that to follow in another post) we are writing together around emerging technologies and behaviors in the enterprise and i had this crazy idea to invite Peter and Mike to join our working session on Friday and they agreed which i was thrilled with because i knew they were up to something cool- and they certainly did not disappoint.

I met Peter and Mike two years ago through some of my other contacts at Sun- and since then we have been keeping in touch through our blogs, twitter, facebook, flickr, eyejot and whatever else we find ourselves in. Peter just renamed his blog Reiser 2.0 because that is what everyone at Sun calls him nowadays, and Mike and his team share much of what they do publicly over at the OneStop Secret Sauce Blog. They are part of the Sun Customer Engineering group that just had their annual Conference in Vegas and we got to see a preview of what they were going to roll out at the conference last week.

We weren't supposed to make it public before they announced it at the conference, and now Peter has finally got a post up on Community Equity in Action over on his blog that provides a nice overview of how they used the conference to launch their home grown "Facebook for the Enterprise" service, which will hopefully become a very successful key part of their CE 2.0 architecture. Peter's thoughts and passion for Community Equity as a way to measure Social Capital for an enterprise are both fascinating and promising.

I personally think that what they are building will eventually become a common tool for collaboration in the Enterprise and look forward to hearing more from them as to how it is adopted and the returns they start seeing from their deployments. I have no doubt that soon enough we will also see Facebook like white label platforms for Enterprises.

In the upcoming weeks i will hopefully be sharing more about the 'new media' white paper we are tasked with writing including some of the things we learned at Sun. Our hope is to also reach out to the community for your participation so... watch this space.

CE 2.0 logo photo credit- Neeraj Mathur, who also works on Reiser's team. We met him last week and he was instrumental to their CE 2.0 launch.

Murdoch at Web 2.0 Summit

1 comment :
I was just looking at the conference details for Web 2.0 Summit this week in San Francisco. Unfortunately i am not going and it is already sold out anyway.

Looks like my soon to be new boss (that would be the one on the left in the photo unfortunately not Homer-d'oh!) will be participating in one of the evening events titled "Dinner & Conversation with Rupert Murdoch & Chris DeWolfe" at the conference. Speculation is that MySpace is getting in on the Platform train which i have been hearing buzz about for a couple of months.
On the same day according to Mashable a launch party for the new San Francisco MySpace office will be taking place and job posts on Craigslist claim that they are looking for people who are knowledgeable with the technology industry in the local and surrounding San Francisco area. D'oh and I haven't received an invite for the party but i am currently available if they act fast that is... but then again i don't even have a MySpace account i use that Facebook stuff instead so i might be disqualified.

All joking aside it will be interesting to see how this plays out at the conference, where many including Microsoft and Oracle plan to make announcements.

note: Click on photo to get to attribution page.

More support for APML this time from NewsGator one step closer to the Enterprise?

No comments :
Lately there has been more and more support for APML (Attention Profile Markup Language) a topic that i have been covering and a standard that i am involved in. Today's announcement by Nick Bradbury that FeedDemon, NetNewsWire and NewsGator Inbox will soon Support APML is another great piece of news for those of us interested in establishing Attention standards. Marshall Kirkpatrick at Read/WriteWeb starts his post on this new announcement asking Web users who are interested in personalization, privacy and increasing sophistication in their applications to take note and goes on to provide yet another good description of what APML is and what the benefits are.

Although i am certainly interested in Attention in the consumer space, it is the Enterprise space that i am the most interested in and as vendors like NewsGator that also have solutions that support enterprise users join in on the fun, things are only getting to be more exciting (although I suspect a long way off). Here is why:

Enterprise users, especially information workers who do a lot of research, typically have access to a handful of premium content tools to ensure comprehensive access to the information they need to do their jobs. If they are lucky (and there really are very few who are) they might have their logins managed by their single sign-on servers; most likely, however, they are logging into multiple ASP services. Note: the very very very few lucky ones might even have federated searching across some of their services.

Once they are logged in however (yes i do believe that OpenId should also be supported by Enterprise information providers but let's leave that for another conversation although an important one) they probably have access to alerting tools across all those services to filter and deliver content exactly as they need it. So for each service they must personalize and tell the system what topics they are interested in so they can be alerted based on their 'attention' needs. Even if they use each service for a specific task- most of their information needs are the same across the set and can be aggregated. Every time the user's 'attention' changes they need to update all the services manually. In addition not every service supports RSS outputs so the user is also receiving the content in multiple formats for example:

Today's announcement by NewsGator, one of the RSS market leaders in the enterprise space that they are supporting APML is a great step towards having Enterprise grade tools support the model that is needed (NewsGator has an Enterprise tool although this APML support announcement doesn't seem to address it). Another player in the market Attensa also has AttentionStream Prioritization that observes and analyzes explicit and implicit behavior as the user reads and processes feeds and articles- but this is post delivery which is great but doesn't completely solve the bigger problem of content ingestion not just consumption.

So obviously tools like, for example, an Enterprise RSS server can act as intermediaries that deliver content based on the user's 'Attention Profile' that is maintained in one central place- but in order for that to be effective in a diverse Enterprise the information pipes from the content providers must be huge, or else the users still need to maintain individual filters on each service. In the graphic above the user sets up and maintains the filters in the multiple services yet only receives content based on their Attention Profile, which is maintained as part of an enterprise tool, in this case an RSS Enterprise Server.
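The intermediary idea described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the profile is a plain Python dictionary of topic weights (not the actual APML format), and the service names, topics and threshold are made up.

```python
# Hypothetical central profile: topic -> interest weight between 0 and 1.
# This is a plain dictionary for illustration, not the APML file format.
ATTENTION_PROFILE = {"taxonomy": 0.9, "rss": 0.7, "sports": 0.1}


def score(item, profile):
    """Score an item by the highest profile weight among its topics."""
    return max((profile.get(topic, 0.0) for topic in item["topics"]), default=0.0)


def deliver(items, profile, threshold=0.5):
    """Keep items that score at or above the threshold, best first."""
    kept = [item for item in items if score(item, profile) >= threshold]
    return sorted(kept, key=lambda item: score(item, profile), reverse=True)


# Items arriving from several (hypothetical) services, filtered in one place.
items = [
    {"service": "service_a", "title": "New tagging tools", "topics": ["taxonomy"]},
    {"service": "service_b", "title": "Weekend scores", "topics": ["sports"]},
    {"service": "service_c", "title": "Feed reader update", "topics": ["rss", "sports"]},
]
delivered = deliver(items, ATTENTION_PROFILE)
for item in delivered:
    print(item["service"], item["title"])
```

When the user's attention changes, only the one profile needs updating; the catch, as noted above, is that the intermediary has to receive enough content from every provider for the central filter to be worth anything.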

What if content information providers supported APML?

The Enterprise User could maintain and 'own' (yes i know that 'ownership' is a big issue in the Enterprise space!) their Attention profile- having it dynamically change as their attention needs change and quickly be applied to new information services their companies might subscribe to.

Sunday, October 14, 2007

Folksonomies and Taxonomies in the Enterprise - Dow Jones Upcoming Round Table Event and Print Plant Tour

No comments :
Over the next few weeks i will be sharing resources and my thoughts on Folksonomies and Taxonomies in the Enterprise, a topic that i have frequently blogged about before as i discuss Enterprise social tagging tools. It continues to be a topic of conversation at many of my clients, and more and more the conversation is moving beyond "should we do it" to "how do we do it" and "how do we tie it in with our existing taxonomy and corporate taxonomy management tools".

On Tuesday December 4th at our Wall Street Journal printing plant location in Palo Alto, CA we will be hosting a round table event for Enterprise practitioners that are looking to deploy or have deployed tools in their enterprise that provide tagging capabilities. The round table portion will be from 2-4pm and then from 4-5pm we will get an exclusive tour of The Wall Street Journal printing plant which I arranged thinking that it would be a nice 'old media' meet 'new media' type of activity.

We still have some spots open for the round table if you or someone you know is working on implementing tagging in the Enterprise, and i am also hoping to open the printing plant tour and cocktail hour to more people who are interested in discussing the topic with other attendees while taking a peek at the printing plant (hey who knows how much longer these will be around for!)- so drop me a line if you are interested daniela[dot]barbosa[@]dowjones.com

My Top Blogged Topics and Shared Items

1 comment :
This has been on my 'to do list' for a while- to create a listing of topics that i Blog about. This will make it easier for new visitors to understand what my main interests are and find relevant posts with little effort.

I use the Blogger platform for this blog- never have felt the need to migrate- and they do have a 'label' widget that comes with the new template features but i really didn't want to update my template that i have customized over the years so i manually created a listing that you can now find on the right sidebar. I will have to manually update the list as i pick up new topics but it will be easy enough now that i have it set.

I also added a Google Reader Shared items widget to my sidebar- no not to clog up my sidebar but to be able to share the top things i find interesting across the hundreds of feeds i read. The editorial process of tagging posts to share is a valuable one that i hope you will find useful.


So welcome if you are a new reader to this blog (according to my stats there are more and more of you!) and here lie my main bloggable interests as of today:

Top Topics this Blog Covers

Tuesday, October 09, 2007

Facebook Flyers - Plus a Bonus- you will get to work with me

1 comment :
The other day i posted some jobs that are open in the Dow Jones San Francisco office in my Facebook Marketplace profile- while i was posting them the option to buy a Facebook Flyer was presented and i wondered if it would be worth it. The costs didn't seem too high but i passed- well today via a twitter message from Jeremiah Owyang i saw this post on Charlene Li's blog about a test she is running with Facebook Flyers so i thought i would give it a try since i had a new posting to put up. Hey for 4 bucks it is definitely worth it- perhaps even better than a virtual Facebook beer for a buck that i will send to anyone that gets the job. Shel Israel also got into the action via the same Twitter message.

A couple of things Facebook can do to make the service better would be to guide the flyer creator (i did it through the marketplace prompts)- for example providing a preview prior to publishing, suggesting a length for the title etc. Unless you go through the main page to create a flyer it doesn't tell you that you can't make any changes to the flyer once you purchase (BOO they should have at least had a warning!).

In the FAQ section (after i purchased) i found good information including the following : There is a 200 character limit for the body of the Flyer (including spaces and returns) and a 25 character limit for the title. In addition, there is a 15 character limit for words in the body of the Flyer.
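Since there is no preview, a quick check before purchasing can save you the 4 bucks. Here is a small sketch in Python that applies the three limits quoted from the FAQ. How Facebook actually counts a 'word' (for example, whether attached punctuation counts) is not stated, so splitting on whitespace is my assumption, and `flyer_problems` is just a name i made up.

```python
TITLE_LIMIT = 25   # characters in the title
BODY_LIMIT = 200   # characters in the body, including spaces and returns
WORD_LIMIT = 15    # characters per word in the body


def flyer_problems(title, body):
    """Return a list of limit violations; an empty list means the flyer fits."""
    problems = []
    if len(title) > TITLE_LIMIT:
        problems.append(f"title is {len(title)} characters (limit {TITLE_LIMIT})")
    if len(body) > BODY_LIMIT:
        problems.append(f"body is {len(body)} characters (limit {BODY_LIMIT})")
    # Assumption: words are separated by whitespace, punctuation included.
    for word in body.split():
        if len(word) > WORD_LIMIT:
            problems.append(f"word '{word}' is {len(word)} characters (limit {WORD_LIMIT})")
    return problems


print(flyer_problems("Taxonomy Consultant", "Join our team in San Francisco."))
```

An empty list means the text fits within all three limits; anything else lists what to trim before you hit the purchase button.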

Facebook won't provide you with a click-through-rate for your Flyer- which would make for a nice 'premium' service i suppose.

Anyway...I hope i have some success and if i do i will share- but i might as well post it here that we are hiring so if you know of anyone that fits any of the positions below, please send them along (the bonus of course is that you will get to work with me ~haha~ seriously contact me with questions, i am tired of doing all these jobs myself and would rather be socializing on Facebook all day ;-) - daniela[dot]barbosa{at}dowjones.com

Full job descriptions by clicking through these links:

Solutions Sales Architect- Financial Services, Client Solutions
Taxonomy Services Consultant - Client Solutions
Engagement Manager- Licensing Services, Client Solutions

Monday, October 08, 2007

Basics of Attention Profiling

No comments :
Marjolein Hoekstra over at the CleverClogs blog which often covers news alerts services has a good post on Attention Profiling for non-techies that describes some of the purposes and benefits of Attention Data and provides details of what an Attention profile can be based on such as :
  • pages you bookmark and tags you assign
  • your favorite videos, music and TV shows
  • hyperlinks you follow and share with your friends
  • things you write about and topics you keep track of
  • items you click on in your feed reader
  • things you buy from a web store
  • places you visit and events you attend

As people are learning and participating in conversations about Attention Data and the capabilities around capturing and sharing one's Attention a lot of the conversations turn to target Advertising or even recommendation engines (see this post on Read/WriteWeb). Although those are certainly two important uses that i guarantee we will be seeing a lot of services popping up around, I also believe that Attention can be a great way to target News to users both within RSS feed readers and destination News sites.

In Marjolein's post she provides an example of how she sees the benefits of a News site that removes sports pages from the home page that she might land on because it is clear that she is not interested in soccer and baseball. News sites today typically offer personalization based on a user setting up a profile, which requires registration and setup and usually is not utilized (especially in the enterprise News portal space). Personalization based on the user's 'real time' Attention Profile that they 'bring' to the site without the need for ongoing explicit action could be very powerful.

The post also takes you through the simple steps to create your personal Attention Profile using the Engagd service. My Attention profile is also publicly available and as i have written before i am a member of the APML Workgroup.

Wednesday, October 03, 2007

Access to Web 2.0 tools behind the Enterprise Firewall

1 comment :
I received a Facebook message today from a friend and former coworker letting me know how her new gig was- she seems to be happy enough but imagine my shock to hear that her new company does not allow Gmail or any sort of chat- seems like Facebook is still ok - well at least for the time being. The other day i met someone who told me that her computer at work (insurance claims) only has access to corporate applications- no web access whatsoever!

I understand issues of compliance and security for certain types of workplaces but for some stupid reason i am always surprised to hear this since my tolerance for not being 'connected' is very very low...

According to this Sophos report- 50% of employees are blocked from access to Facebook at work. I admit i check Facebook a couple times a day, like i do Twitter, RSS feeds and Techmeme. Facebook has an 'online' status feature so i always think about coworkers who are on the East coast after hours on Facebook while it is still working hours on the West coast and they probably think i am slacking off.

Just today however- during work hours- i used Facebook to try to confirm a work related meeting: leaving an old-fashioned voicemail, then, knowing the person was at a conference, dropping them a message in Facebook and an e-mail. Mission was accomplished- 1. E-mail returned 2. Facebook message returned 3. no callback (i don't like phones either so fine by me).

I am pretty sure however that as people walk by my cube at the office- they see me on Facebook or Twitter and think i am goofing off-but it is just another communication tool for me- one that i am expanding from using only for my social 'friends' to some of my savvy clients. Right on- hopefully neither my company or theirs will block these tools and we can continue on.

Want to contribute to my delinquency? :-) Join me on Facebook or Twitter.

~~above photo attributed to beewebhead via flickr

Tuesday, October 02, 2007

Bloglines will you win my heart back? Well at least you now have my attention

3 comments :
Bloglines was one of my first browser based RSS readers- one that treated me well until Google got in the game. If it is true that they are adopting APML then they might win my heart back from Google Reader (although rumors claim that Google is also looking to support APML) - which might turn my RSS reader choice into a battle of the bands or rather a battle for my attention! (i use many RSS aggregators)

I got my info on the Bloglines APML support first from the Particls folks and there have been a handful of other posts on the topic today as well.

APML stands for 'Attention Profile Markup Language' and you can learn more from these APML FAQs. i passionately believe that it is one of the solutions to the information overload we are dealing with- both in the consumer and in the enterprise space.

I am a member of the APML workgroup because from the minute i discussed the concept of Attention with one of the founding members Chris Saad, i knew that this was going to be essential in ensuring that information delivery in the enterprise continues to evolve. The consumer space is simpler (although scary advertising issues always come up) when we have discussions about Attention.

i also participate in the Attention Data Meetups in the Bay Area- we haven't gotten to really discussing enterprise use of Attention. There are many challenges to Attention gathering, ownership and distribution in the enterprise and i hope to explore this topic more over the next few posts.

Just today i had two conversations- one with a colleague and one with a client- where the topic of my 'attention' came up- there are a lot of people that constantly ask me how do i have time to keep up on everything- well i am special (read 'addicted') so i spend a lot of time tracking, tagging, consuming. My Attention Profile is public and as tools become available in the marketplace that help us share our Attention with others that have similar attention needs- i envision a "few in the enterprise being the eyes of many".

BTW- my wooing Bloglines- One big negative that Bloglines NEEDS to fix because the majority of the real world is just not ready for OpenID.....when i requested my password it sent the password in plain text back to me- NOOOOOOO good! I usually unsubscribe from any service that does that immediately since it is such bad practice... but because they are offering OpenID i won't - yet hoping they fix that.