Over New Year, I’ve been playing around with Strigi and Nepomuk, the two technologies around desktop search and “making sense of your data”. Strigi is the underlying library that analyzes all sorts of files and indexes the results. Nepomuk provides a semantic layer on top of this information, and a nice KDE API for easy integration into applications. While Nepomuk has only lately been maturing, it shows some nice ways to interact with information on your desktop. Nepomuk uses RDF and ontologies to make sense of your data. It has the concept of types of data (think of a photo being an image), and of how those types relate to each other. It also stores metadata (for the photo example, that’d be size, camera model, and so on).
Nepomuk started catching my attention when I was looking at a bug in KRunner that cropped up when the desktop search on my machine magically started working. It displayed too many results and kept the KRunner interface very busy, and sometimes even made it crash. I tried some tricks to limit the results in a sensible way, but wasn’t really familiar with the workings of Nepomuk. Fortunately, Sebastian Trueg fixed this bug last week.
There are some caveats when setting up Nepomuk. In order to get reasonably fast search and indexing, make sure your Nepomuk is using the Java JNI based Soprano backend (JNI is an interface to some Java functionality for C++). I usually compile KDE from trunk. Soprano sits in kdesupport and has two possible backends you can use, Redland and Sesame2. The former is C++ based, but it’s very slow. So in order to get reasonable performance out of Nepomuk, make sure the Sesame2 backend gets installed. For compiling it, I had to install the necessary Java packages; I installed both the JDK and the JRE. That apparently wasn’t enough for CMake: when configuring Soprano, it still reported that it couldn’t find JNI, and so wouldn’t compile the Sesame2 backend. I had to tell it explicitly where to find Java by putting the following into my .zshrc file:
on my Kubuntu machine, and
on my OpenSuse machine.
After having it installed, you can configure the search in System Settings. The first indexing run took pretty long, but after that, indexing doesn’t get in the way anymore and doesn’t hurt performance noticeably either (i.e. with nepomuksearch monitoring running). Also, the results I got from Nepomuk and Strigi were quite OK, even when limiting the search to only 5 seconds (which makes the first results display a lot quicker as well). In general, not having Nepomuk do endless searches is a good idea. :-)
Over Christmas, I had been thinking about how to integrate desktop search into Plasma, and started writing a small applet as an interface to the desktop search. I’m not really fond of KRunner’s current interface. It displays too little information about the results, and looks a bit messy to me. We’ve already been thinking about possible improvements on the Plasma mailing list, but with all the bug fixing going on, we didn’t get to tackle KRunner’s UI in a way that makes it suitable for displaying results from a desktop search.
So I started writing an applet for Plasma, since nobody seems to have done this yet, and called it Crystal, since that’s a cool name. Nepomuk actually made it very easy: with 300 lines of code (C++ even), I had a basic Plasma applet, including configuration to tweak the search, that displays results and opens those files.
I committed the applet to Plasma’s playground yesterday so others can have a look at it, too. It’s already quite nifty, but only very basic and not polished at all. Many aspects of the applet might change in the future, but it should be a good starting point to experiment with desktop search on the KDE4 desktop.
The applet can pop up on a shortcut (I actually got this for free with the Plasma configuration dialog) and pre-fill the lineedit with the contents of the clipboard. This way, you just select and copy a word into your clipboard, then hit the shortcut and press enter and dang! it searches for it. When the applet pops up, the text in the query lineedit is selected, so you can also just start typing and thus override the pre-filled query (your clipboard or the default from the config dialogue). There’s no keyboard navigation in the list of results yet, and it doesn’t sort the results by relevance. I actually have a small patch that makes the SearchHitView sort by relevance, but it sometimes causes rendering problems, painting result items on top of each other. That patch essentially returns a rounded score for each item and makes KCategorizedView order the items by it, descending. I’ll have to investigate whether that’s the right way to do it. Rafael (ereslibre) has some promising work on KCategorizedView in the pipeline as well. We’ll see where this all goes…
Codewise, actually searching and displaying results with Nepomuk is dead easy. Most of the important functionality is included in KDE 4.2’s kdelibs. There’s the QueryServiceClient, which is what you use to fire queries into Nepomuk and retrieve the results (Nepomuk::Search::Result). The client is connected to a SearchHitView, which receives the hits from Nepomuk, puts them into its model and then updates the view.
The SearchHitView is from playground’s nepomuk-kde. It gives us models and views for search items, and has plugins for different types of files and result types. Here’s what this looks like in an example:
// Create a new client for the search
m_searchClient = new Nepomuk::Search::QueryServiceClient( this );
// Search when the button is clicked
connect( m_buttonSearch, SIGNAL( clicked() ),
         this, SLOT( slotSearch() ) );
// ... also search on Enter in the search field
connect( m_editSearchText, SIGNAL( returnPressed() ),
         this, SLOT( slotSearch() ) );
// Put new results in the SearchHitView
connect( m_searchClient, SIGNAL( newEntries( const QList<Nepomuk::Search::Result>& ) ),
         m_resultsView, SLOT( updateResources( const QList<Nepomuk::Search::Result>& ) ) );
// Do something when the user clicks on a result
connect( m_resultsView, SIGNAL( resourceActivated( const QUrl& ) ),
         this, SLOT( slotResourceActivated( const QUrl& ) ) );
You’ll need two functions then: slotResourceActivated(QUrl url), which might be as simple as just KRun(url) in order to open the file (or URL), and slotSearch(), which runs the query on the string from the search lineedit.

The plugin system doesn’t quite work for me at the moment. There’s an issue where a factory class always returns the most generic of the visualization plugins, making the results in the view look a bit boring. (The mechanism works fine in knepsearchclient, I dunno what exactly is going on there, deep inside KService …) I hope to get this to work quickly, and also to get some more polish applied to the plugins, so that they display more interesting information based on their type and you can quickly see whether a file is the one you’re looking for. Also, displaying them at different sizes can reveal or hide some information, which is definitely useful and makes the user interface a lot more flexible. The code isn’t stable yet, and it needs some work.
Earlier on Sunday, richmoore blogged about a small class he has written to interact with MediaWikis. I quickly took his code and put it into Crystal. Within an hour or so, I had Crystal display matches from TechBase along with the Nepomuk matches (albeit not correctly yet, since the SearchHitView expects a bit more than a QUrl pointing to a random location (oh noes, our TechBase is not random!)). After a bit more hacking, Crystal now searches Wikipedia, TechBase and UserBase for your search term. Display of the results is a bit better, but still not quite there yet. Something to sort out. :)
Some more ideas I have for the Crystal applet are saving of searches, for example to the desktop, a dropdown, or something similar, so you can easily access pre-defined queries. Rich’s MediaWiki code also got me thinking a bit about the difference between searches on the web and on your desktop (other than “I don’t want everything I type submitted to Google”, of course ;-)).
It would be really good if applications started using Nepomuk more, although we see some interesting stuff coming up there as well: Gwenview has Nepomuk-based tagging and rating in KDE 4.2, for example. What I’d find useful is if applications started tagging files through Nepomuk automatically. It would be really useful to keep track of visited webpages, and have some information about them in the results of the desktop search. And of course email. Tags for attachments from certain persons would be useful. Also, there’s a small problem in the config dialogue for Strigi right now: you can’t enter directories starting with a “.”, and my email is somewhere under ~/.kde4/share/apps/kmail. That should be relatively easy to fix though. Mid-term, Akonadi should be accessible in the same way.
And finally, I think we should put a search box into our file dialogue. During the redesign that has taken place for KDE 4.2, we left some free space in its top center, arguing that it might be the place to put a desktop search box. Sure enough, the desktop search is fast enough (and I’m lazy enough) for me to replace most of the navigation through my filesystem by simply typing the filename and choosing the correct file. The results from Strigi and Nepomuk usually come up fast enough and are correct. The file dialogue feels like a very natural place for file search.