Programmatically finding the anagrams of a given input is an easy problem if
you have a list of valid words. Easy enough that it is often used as an example
problem or assignment in introductory programming classes. The key1
observation is that the order of the letters doesn't matter so you can safely
place them in a deterministic order for comparison. You decide if two words are
anagrams of one another by sorting the letter in each and comparing the
resulting strings. In psudocode it looks roughly like
skey := sorted(input)
for (word in words)
if (sorted(word) = skey)
println(word)
Easy peasy.
Except that I'm writing in C++, so some things are wordier than in other
langauges and there some boilerplate to wrap around it all. But, hey, at least
it isn't Java. Oh, and I usually work on more substantial projects and in a
pinch you code like you practice, so perhaps I should employ some techniques
that I don't actually need for a toy example like this. Yeah, that's the
ticket. Let's make the key a strong type.
The point is that the sorted nature of the contents of the key are the very
core of it's nature. There is no point in comparing a key against an unsorted
word: it won't tell us what we need to know. So make ourselves a type that wraps
up a string with the content being sorted as a class invariant and provide
comparison support between keys but not between keys and plain strings. That way
the compiler will tell us if we make a mistake like that. Not really needed in a
toy project like this—debugging those kinds of issues isn't too painful in
a tiny code base like this—but good practice.
class key
{
public:
template <typename Iter>
explicit key(Iter first, Iter last)
: val(first, last)
{
make_invariant(val);
}
explicit key(const std::string &input)
: key(input.begin(), input.end()) {}
// Move and copy c'tors and assignments operator
store_type::size_type length() const;
store_type::size_type size() const;
// Provide iterators for use in the algorithms header
auto operator<=>(const key &rhs) const = default;
private:
void make_invariant(store_type &);
std::vector<std::string::value_type> val;
};
So, put into almost up-to-date C++ idiom (i.e. C++20, as I'm not yet ready
to deal with the ranges and views library), the core of the anagram program
looks like:
std::ifstream in(argv[2]);
if (!in) {
std::cerr << "Failed to open '" << argv[2] << "' for reading."
<< std::endl;
return EXIT_FAILURE;
}
std::vector<std::string> dictionary = anagram::get_dict(in);
const anagram::key search_key = anagram::key(std::string(argv[1]));
std::vector<std::string> alternatives;
std::copy_if(dictionary.begin(), dictionary.end(), std::back_inserter(alternatives),
[&search_key](const std::string &word){
return anagram::key(word) == search_key;
});
for (const auto &word : alternatives)
std::cout << word << std::endl;
Digression
Perhaps you're wondering why I would care about an introductory
exercise? Afterall, the problem describe above had no surprises. That is a fair
question, and the answer is simple: I've recently started doing more puzzles
involving letter and
words. Cryptics, wordle
and things like that. I've been a fan of plain crosswords for a long time, but
new things are good for the brain at my age.
Anyway. Anagraming is a common sub-puzzle in cryptics, and I'm only so good
at it. So I wrote a tool to use when I'm struggling. But it turns out the
puzzles present cases that extend the basic anagram problem...
Back to the main story arc
Sometimes you know how long the desired word is but only some of the
input letter. Say in cryptics you have an anagram idicator and you're confident
about what fodder goes with it, but you have more spaces than characters in the
existing source. For that we need a little more input (the range of lengths to
consider, and to change the condition from equality of keys, from strict
equality to the search key being a strict subset of the word key. We'll need to
user to indicate the total size as well as the known source characters. As a
minor generalization we can allow a range of sizes. Then we replace the main
loop above with
std::vector<std::string> anagram::find_single(const anagram::key &search_key, size_t minl,
size_t maxl,
const std::vector<std::string> &words)
{
std::vector<std::string> alternatives;
std::copy_if(std::execution::par_unseq,
words.begin(), words.end(), std::back_inserter(alternatives),
[&search_key, minl, maxl](const std::string &word){
const key word_key(word);
const auto word_len = word_key.length();
return (word_len > 0 && word_len >= minl && word_len <= maxl &&
is_superset_of(word_key, search_key));
});
return alternatives;
}
which calls
bool anagram::is_superset_of(const key ∓candidate, const key ∓search_key)
{
std::string unmatched;
std::set_difference(search_key.begin(), search_key.end(), candidate.begin(),
candidate.end(), std::back_inserter(unmatched));
return unmatched.size() == 0;
}
I call that "partialgram".
Multi-word results remain outstanding
One more problem. Maybe you have a set of source character and want to
create more then one word from them That will be the subject of the next
part.
1 Pun intended.