2020-03-20

Search and reinvention

The other day I reinvented the wheel.

I did search for existing solutions first, but every description of the problem I tried used too many common words and generated pages of search hits for other things described with the same words, even when I restricted the search to stackoverflow.com.

It took me almost fifteen minutes to get bored with that, and a bit over five minutes to find a working solution for myself.

I felt rather proud of my solution as it is quite neat. Almost elegant really. Which made me absolutely sure it was well-known,1 so I searched again. I was able to quickly find many existing examples by including ceil (the mathematical and programming name for the function returning the smallest integer not smaller than the argument, which is used in my solution) among the search terms.

But until I had the solution there was no way to know how to search successfully for it.2

Search is so much better but it still has weaknesses

I’m sure many (most?) of you have a story like this of your own. Maybe more than one.

I’m old enough to recall the internet before the web and the web before there was real search. I mean, I remember when Yahoo! was a big advance in finding things.3 When Google came along it was a breath of fresh air.

But they haven’t solved the context problem. There is no way to say to Google “search algorithms and programming advice for these terms”, so I got buried in irrelevant junk. But even that might not have done, because I was getting swamped with irrelevant Stack Overflow links as well as irrelevant links from other domains.

The next big thing?


As I recall it, the deficiencies of search were one of the reasons Jeff & Joel had for starting the project that became Stack Overflow.4 It is evident both from the duplicate problems (including the problems experienced users have finding duplicates that they know are there) on Stack Exchange sites and from this kind of anecdote that the search issue is only partly solved.

How is the AI coming along in the natural language processing domain? Any chance of context sensitive search anytime soon?

That wheel I reinvented?


Put in the form I encountered it
How many rows and columns should be used to automatically place $N$ plots on a page?
The requirements for solutions being
  • The plots will be arranged in a grid
  • Each plot will be the same size
  • All the plots should be displayed
  • The number of empty cells should be no bigger than necessary
  • The aspect ratio of the plots should be reasonable
As my display mechanism puts the plots on a page in landscape orientation, and a typical page has a good aspect ratio for plots, the aspect ratio requirement comes down to the grid having the same number of divisions horizontally and vertically where possible; when they differ, there should be more across than down.

Of course for $N$ a perfect square you’d just want the grid to be $\sqrt{N}$ on a side, and a little thinking about how sqrt() works with numbers that aren’t perfect squares and the need to never be too small can lead you to the algorithm. In pseudo-code (all two lines of it):
ncols := ceil(sqrt(N)) 
nrows := ceil(N/ncols)
The first use of ceil here ensures that if we have an uneven grid the number of columns will be larger than the number of rows, and that if the grid comes out even it will be large enough. The second use of ceil ensures the grid is large enough if it is uneven.
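
For anyone who wants to try it, here is a minimal Python sketch of those two lines. The function name grid_shape and the check for N < 1 are my own additions for illustration; the arithmetic is just the pseudo-code above.

```python
import math


def grid_shape(n):
    """Return (nrows, ncols) for laying out n equal-sized plots in a grid.

    Hypothetical helper name; the two assignments mirror the pseudo-code.
    """
    if n < 1:
        raise ValueError("need at least one plot")
    ncols = math.ceil(math.sqrt(n))  # wide enough that a square grid would fit
    nrows = math.ceil(n / ncols)     # just enough rows to hold all n plots
    return nrows, ncols


if __name__ == "__main__":
    for n in (1, 2, 5, 9, 10):
        print(n, grid_shape(n))
```

Because ncols is at least sqrt(N), N/ncols is at most sqrt(N), so the row count can never exceed the column count.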

The results are pretty good, but a human might choose differently. For instance the code generates a 3x3 grid for $N$ equal to 7 or 8 where a human might try out 2x4 to get fewer empty spaces and see how it looks. Something similar happens for 13-15.
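
One possible way to flag those cases, sketched in Python: when the formula gives a square grid, check whether dropping a row and adding a column still holds every plot. The names grid_shape and flatter_alternative are made up for this sketch, and whether the flatter grid actually looks better is still a judgment call.

```python
import math


def grid_shape(n):
    """(nrows, ncols) from the two-line ceil formula."""
    ncols = math.ceil(math.sqrt(n))
    nrows = math.ceil(n / ncols)
    return nrows, ncols


def flatter_alternative(n):
    """Return a (nrows - 1, ncols + 1) grid if the formula's grid is square
    and the flatter grid still has room for all n plots; otherwise None.

    Hypothetical helper, not part of the original two-line solution.
    """
    nrows, ncols = grid_shape(n)
    if nrows != ncols:
        return None  # only second-guess the square grids
    alt_rows, alt_cols = nrows - 1, ncols + 1
    if alt_rows * alt_cols >= n:
        # k*k - 1 cells instead of k*k, so exactly one fewer empty cell
        return alt_rows, alt_cols
    return None


if __name__ == "__main__":
    for n in range(1, 17):
        alt = flatter_alternative(n)
        if alt is not None:
            print(n, grid_shape(n), "->", alt)
```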




1 Very little that is as short and sweet as this little trick is new. The more so when it can be expressed in a domain with a lot of traffic.

2 Many times on Stack Exchange sites I’ve seen users describe a well-known idea or problem in a lot of words because they didn’t know the name or phrase usually used for it. Finding good existing answers and other good resources on the internet would generally be easy if they knew the nomenclature. In this case though, there doesn’t seem to be a name batted about. Anyone know if the problem described in the last section has a conventional name?

3 Who else remembers WWWW? Don’t you wish you could forget?

4 In addition to the way existing models (forums, blogs, Wikipedia, Reddit, and so on…) were generally ill-suited to serving as a repository of good answers to good questions.

Written with StackEdit.
