2025-12-14

The anagram problem (part 1/N)

Programmatically finding the anagrams of a given input is an easy problem if you have a list of valid words. Easy enough that it is often used as an example problem or assignment in introductory programming classes. The key1 observation is that the order of the letters doesn't matter so you can safely place them in a deterministic order for comparison. You decide if two words are anagrams of one another by sorting the letter in each and comparing the resulting strings. In psudocode it looks roughly like skey := sorted(input) for (word in words) if (sorted(word) = skey) println(word) Easy peasy.

Except that I'm writing in C++, so some things are wordier than in other langauges and there some boilerplate to wrap around it all. But, hey, at least it isn't Java. Oh, and I usually work on more substantial projects and in a pinch you code like you practice, so perhaps I should employ some techniques that I don't actually need for a toy example like this. Yeah, that's the ticket. Let's make the key a strong type.

The point is that the sorted nature of the contents of the key are the very core of it's nature. There is no point in comparing a key against an unsorted word: it won't tell us what we need to know. So make ourselves a type that wraps up a string with the content being sorted as a class invariant and provide comparison support between keys but not between keys and plain strings. That way the compiler will tell us if we make a mistake like that. Not really needed in a toy project like this—debugging those kinds of issues isn't too painful in a tiny code base like this—but good practice.

class key
{
public:
  template <typename Iter>
  explicit key(Iter first, Iter last)
      : val(first, last)
  {
    make_invariant(val);
  }
  explicit key(const std::string &input)
    : key(input.begin(), input.end()) {}
  // Move and copy c'tors and assignments operator

  store_type::size_type length() const;
  store_type::size_type size() const;

  // Provide iterators for use in the algorithms header

  auto operator<=>(const key &rhs) const = default;

private:
  void make_invariant(store_type &);

  std::vector<std::string::value_type> val;
  };

So, put into almost up-to-date C++ idiom (i.e. C++20, as I'm not yet ready to deal with the ranges and views library), the core of the anagram program looks like:

  std::ifstream in(argv[2]);
  if (!in) {
    std::cerr << "Failed to open '" << argv[2] << "' for reading."
              << std::endl;
    return EXIT_FAILURE;
  }
  std::vector<std::string> dictionary = anagram::get_dict(in);
  const anagram::key search_key = anagram::key(std::string(argv[1]));

  std::vector<std::string> alternatives;
  std::copy_if(dictionary.begin(), dictionary.end(), std::back_inserter(alternatives),
	       [&search_key](const std::string &word){
		 return anagram::key(word) == search_key;
	       });

  for (const auto &word : alternatives)
      std::cout << word << std::endl;

Digression

Perhaps you're wondering why I would care about an introductory exercise? Afterall, the problem describe above had no surprises. That is a fair question, and the answer is simple: I've recently started doing more puzzles involving letter and words. Cryptics, wordle and things like that. I've been a fan of plain crosswords for a long time, but new things are good for the brain at my age.

Anyway. Anagraming is a common sub-puzzle in cryptics, and I'm only so good at it. So I wrote a tool to use when I'm struggling. But it turns out the puzzles present cases that extend the basic anagram problem...

Back to the main story arc

Sometimes you know how long the desired word is but only some of the input letter. Say in cryptics you have an anagram idicator and you're confident about what fodder goes with it, but you have more spaces than characters in the existing source. For that we need a little more input (the range of lengths to consider, and to change the condition from equality of keys, from strict equality to the search key being a strict subset of the word key. We'll need to user to indicate the total size as well as the known source characters. As a minor generalization we can allow a range of sizes. Then we replace the main loop above with

  std::vector<std::string> anagram::find_single(const anagram::key &search_key, size_t minl,
					      size_t maxl,
					      const std::vector<std::string> &words)
{
  std::vector<std::string> alternatives;
  std::copy_if(std::execution::par_unseq,
	       words.begin(), words.end(), std::back_inserter(alternatives),
	       [&search_key, minl, maxl](const std::string &word){
		 const key word_key(word);
		 const auto word_len = word_key.length();
		 return (word_len > 0 && word_len >= minl && word_len <= maxl &&
			 is_superset_of(word_key, search_key));
	       });
    return alternatives;
      }
which calls
bool anagram::is_superset_of(const key ∓candidate, const key ∓search_key)
{
  std::string unmatched;
  std::set_difference(search_key.begin(), search_key.end(), candidate.begin(),
		      candidate.end(), std::back_inserter(unmatched));
  return unmatched.size() == 0;
}
I call that "partialgram".

Multi-word results remain outstanding

One more problem. Maybe you have a set of source character and want to create more then one word from them That will be the subject of the next part.


1 Pun intended.

No comments:

Post a Comment