Information workers often face this problem: copying and pasting many text snippets from many positions in unstructured or semi-structured text files. This is usually too much work to do manually, yet there is no easy way to do it automatically. Current solutions include using ancient Unix tools like grep or coding a custom tool just for the one job. The idea is to develop a GUI application which lets users mark sample text positions across multiple files and which generalizes these to the entire collection, going by hints such as common text patterns around the marked sections.
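To make the generalization step a bit more concrete, here is a minimal sketch in Python. It is only one possible approach, not a finished design: it assumes that every marked sample is surrounded by the same literal text, and it turns that shared context into a regular expression. The names `learn_pattern` and `extract` and the `window` parameter are made up for this illustration.

```python
import re
from os.path import commonprefix  # works character-wise on any list of strings


def common_suffix(strings):
    """Longest string that every item in `strings` ends with."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def learn_pattern(samples, window=20):
    """Derive a regex from user-marked samples.

    `samples` is a list of (text, start, end) tuples, where text[start:end]
    is the snippet the user marked. Assumption: the literal text directly
    before and after the snippet is the same for every sample.
    """
    lefts = [text[max(0, start - window):start] for text, start, _ in samples]
    rights = [text[end:end + window] for text, _, end in samples]
    left = common_suffix(lefts)
    right = commonprefix(rights)
    if not left or not right:
        raise ValueError("samples share no common context on one side")
    # Non-greedy capture between the two shared context strings.
    return re.compile(re.escape(left) + r"(.+?)" + re.escape(right), re.DOTALL)


def extract(pattern, texts):
    """Apply the learned pattern to every text in the collection."""
    return [m.group(1) for text in texts for m in pattern.finditer(text)]
```

A real tool would need fuzzier hints than literal context (token classes, line structure, several candidate patterns to choose from), but even this simple version shows how two or three marked examples can be generalized to a whole collection.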
Some time ago, most programming was pretty simple: a program was compiled and run on a single computer, at most interacting with other computers through the network. A lot of today’s programming languages stem from those days. With the advent of the cloud it became possible to deploy and run software automatically in many different ways. In the platform-as-a-service (PaaS) model, a finished program is handed to the cloud operator and executed automatically. But this is still done with the same old programming model of yesteryear.
There are now more open-source projects on the Internet than you can shake a stick at! My go-to search engine for them is still Google, but that is no longer a good option. So I think there should be a dedicated search engine for open-source software projects and, similarly, one for software products such as SaaS offerings and closed-source libraries. There are two main challenges for this to work as a startup project.
In a previous post I introduced the idea of location-based messaging apps. This post is about the tools that can be used to make this work. The main software ingredient for such apps is a “spatial database”, which allows location queries to be run against the stored data. The standard solution for location queries is PostGIS, an open-source extension for the database software PostgreSQL. PostgreSQL itself is a standard open-source SQL database engine and a very popular one, too.
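To show what a location query means in practice, here is a small pure-Python sketch that finds all messages within a given radius of a point. It is a brute-force illustration and not how PostGIS works internally: a spatial database would answer the same question using a spatial index, in PostGIS typically with a function such as `ST_DWithin`. The message layout (dictionaries with `lat`, `lon` and `text`) and the function names are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def messages_near(messages, lat, lon, radius_m):
    """Return the messages within `radius_m` of (lat, lon), nearest first.

    Checks every message, so the cost grows linearly with the number of
    stored messages. Avoiding that full scan is what a spatial index is for.
    """
    hits = [(haversine_m(lat, lon, m["lat"], m["lon"]), m) for m in messages]
    hits.sort(key=lambda hit: hit[0])
    return [m for distance, m in hits if distance <= radius_m]
```

This is fine for a prototype with a few thousand messages. Beyond that, handing the query to a spatial database is the more sensible choice.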
Imagine you could copy any part of a web page in your browser and paste it into a notebook in the cloud. Styles and images would be preserved, so that the clipped part looks just the same as on the original page, and it would be kept as you found it, for as long as you want, annotated with the time and place where it was originally found. You could, for example, keep track of the results from booking a trip across multiple web sites in a single notebook, adding your own comments as required, to remember what you did.
In a previous post I asked “How to bring ML to Search?” In this post I want to discuss which companies could likely bring machine learning to search engines. Some ML techniques are already used for Web search, but search engines lag far behind what is possible today. When I speak of ML in this post, I also mean related NLP techniques. The benefits of bringing ML to search are many: better spam fighting, a replacement for PageRank, higher-quality and more varied results on the first page, and search term disambiguation, among others.