I implemented an efficient algorithm for Anagram assignment, which finds anagrams faster than Odersky wanted by applying SICP coin change algorithm. It should be faster because it does not look up for all permutations since it is obvious that order of coins does not change the sum. The algorthm therefore produces single anagram "aab" or "aba" or "baa" for any of them entered, which means that I want to permute the results. It makes sense first to produce small amount of combinations and then permute them since anagram search space is huge -- it is a dictionary of all english words!
Here is the scala code. Pleas appreciate how multiests, which we actually need, generalize both permutations with and without replacements. Multisets allow you to specify how many different items you have (how many replacements can you make for that item) whereas with replacement permutation hardcodes infinite supply of replacements (so that you can fill all the output string with a single item). Without replacement means that you can use every item only one time in the output string.
Here is the scala code. Pleas appreciate how multiests, which we actually need, generalize both permutations with and without replacements. Multisets allow you to specify how many different items you have (how many replacements can you make for that item) whereas with replacement permutation hardcodes infinite supply of replacements (so that you can fill all the output string with a single item). Without replacement means that you can use every item only one time in the output string.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
// Inspired by Oderski's anagram and the idea that SICP count change algorithm can be used | |
// for anagram lookup and more efficiently than odersky wanted. Anagrams are however permu | |
// tated "changes" and, thus, we need an algorithm for permutation. There is none in stack | |
// overflow, at least I could not find out whether scala and other programmers provide per | |
// mutations with replacements or not whereas anagrams requre special kind of permutations | |
// It should be that "aab", "aba" and "baa" would be permutations. Note multiplicity 2 of | |
// 'a'. With replacements produces aaa since a is not exhausted by two repetitions wheras | |
// without repetitions cannot produce sequences longer than 2 since picked two items we a | |
// re out of items in the dictionary. If we put two a items in the dictionary, with/wo rep | |
// etitions will produce idential sequences, aab and aab can be produced two times. | |
// The generic algorithm produced enables emulating standard permutations with/without rep | |
// eititions. Take multiplicity = 1 for every item and you will generate permutations with | |
// out repetitions. Take multiplicity exceeding the generated seq length and result will | |
// match the permutations with repetitions. | |
object Permutations { | |
def printResult[A](msg: String, r: A): A = {println(msg + r) ; r} | |
//> printResult: [A](msg: String, r: A)A | |
// PART I | |
def withReplacements(chars: String, n: Int) = { | |
def internal(path: String, acc: List[String]): List[String] = { | |
if (path.length == n) path :: acc else | |
chars.toList.flatMap {c => internal(path + c, acc)} | |
} | |
val res = internal("", Nil) | |
printResult("there are " + res.length + " " + n + "-permutations with replacement for " + chars + " = ", res) | |
} //> withReplacements: (chars: String, n: Int)List[String] | |
//def indent[A](path: List[A]) = path map (_ => " ") | |
import scala.collection.immutable.Queue | |
def noReplacements(chars: String, n: Int/* = chars.length*/) = { // unfortunately Scala cannot refer other args in the list | |
type Set = Queue[Char] | |
type Result = Queue[String] | |
// The idea is that recursions will scan the set with one element excluded. | |
// Queue was chosen to implement the dict to enable excluded element to bubble through it. | |
def internal(dict: Set, path: String, acc: Result): Result = { | |
if (path.length == n) acc.enqueue(path) | |
else | |
dict.foldLeft(acc, dict.dequeue){case ((acc, (consumed_el, q)), e) => | |
(internal(q, consumed_el + path, acc), q.enqueue(consumed_el).dequeue) | |
}. _1 | |
} | |
val dict = Queue[Char](chars.toList: _*) | |
val res = internal(dict, "", Queue.empty) | |
printResult("there are " + res.length + " " + n + "-permutations without replacement for " + dict + " = ", res) | |
} //> noReplacements: (chars: String, n: Int)Result | |
withReplacements("abc", 2) //> there are 9 2-permutations with replacement for abc = List(aa, ab, ac, ba, | |
//| bb, bc, ca, cb, cc) | |
//| res0: List[String] = List(aa, ab, ac, ba, bb, bc, ca, cb, cc) | |
noReplacements("abc", 2) //> there are 6 2-permutations without replacement for Queue(a, b, c) = Queue(b | |
//| a, ca, cb, ab, ac, bc) | |
//| res1: Result = Queue(ba, ca, cb, ab, ac, bc) | |
//http://stackoverflow.com/a/20528637/1083704 | |
def permute(xs:List[Int]):List[List[Int]] = xs match { | |
case Nil => List(List()) | |
case head::tail => { | |
val tps = (0 until xs.length).map(xs.splitAt(_)).toList.filter(tp => !tp._1.contains(tp._2.head)) | |
tps.map(tp => permute(tp._1:::tp._2.tail).map(tp._2.head :: _)).flatten | |
} | |
} //> permute: (xs: List[Int])List[List[Int]] | |
// check function | |
def ===[A](a: Seq[A], b: Seq[A]) { | |
assert(a.toSet == b.toSet, a + " != " + b) | |
} //> === : [A](a: Seq[A], b: Seq[A])Unit | |
val ref = permute(List(0,1,2)) map (_ map (i => (i + 'a') . toChar) mkString "") | |
//> ref : List[String] = List(abc, acb, bac, bca, cab, cba) | |
===(ref, noReplacements("abc", 3)) //> there are 6 3-permutations without replacement for Queue(a, b, c) = Queue(c | |
//| ba, bca, acb, cab, bac, abc) | |
withReplacements("abc", 3) //> there are 27 3-permutations with replacement for abc = List(aaa, aab, aac, | |
//| aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, | |
//| caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc) | |
//| res2: List[String] = List(aaa, aab, aac, aba, abb, abc, aca, acb, acc, baa, | |
//| bab, bac, bba, bbb, bbc, bca, bcb, bcc, caa, cab, cac, cba, cbb, cbc, cca, | |
//| ccb, ccc) | |
withReplacements("abc", 4) //> there are 81 4-permutations with replacement for abc = List(aaaa, aaab, aaa | |
//| c, aaba, aabb, aabc, aaca, aacb, aacc, abaa, abab, abac, abba, abbb, abbc, | |
//| abca, abcb, abcc, acaa, acab, acac, acba, acbb, acbc, acca, accb, accc, baa | |
//| a, baab, baac, baba, babb, babc, baca, bacb, bacc, bbaa, bbab, bbac, bbba, | |
//| bbbb, bbbc, bbca, bbcb, bbcc, bcaa, bcab, bcac, bcba, bcbb, bcbc, bcca, bcc | |
//| b, bccc, caaa, caab, caac, caba, cabb, cabc, caca, cacb, cacc, cbaa, cbab, | |
//| cbac, cbba, cbbb, cbbc, cbca, cbcb, cbcc, ccaa, ccab, ccac, ccba, ccbb, ccb | |
//| c, ccca, cccb, cccc) | |
//| res3: List[String] = List(aaaa, aaab, aaac, aaba, aabb, aabc, aaca, aacb, a | |
//| acc, abaa, abab, abac, abba, abbb, abbc, abca, abcb, abcc, acaa, acab, acac | |
//| , acba, acbb, acbc, acca, accb, accc, baaa, baab, baac, baba, babb, babc, b | |
//| aca, bacb, bacc, bbaa, bbab, bbac, bbba, bbbb, bbbc, bbca, bbcb, bbcc, bcaa | |
//| , bcab, bcac, bcba, bcbb, bcbc, bcca, bccb, bccc, caaa, caab, caac, caba, c | |
// Since replacements mean that we have infinite supply of denominations rather than concrete | |
// coins, they can generate forever, the sequences of infinite lengths. The maximum length of | |
// permutation without replacements on other hand is equal to the numebr of coins in the dictionary. | |
// Dictionary length is the natural length of permutation and | |
//noReplacements("abc", 4)} // throws "out of elements exception", as expected. | |
// You may have notice that duplicating some coin in the dictionary, produces duplicate permutations | |
(1 to 3) foreach (u => noReplacements("aab", u)) | |
//> there are 3 1-permutations without replacement for Queue(a, a, b) = Queue(a | |
//| , a, b) | |
//| there are 6 2-permutations without replacement for Queue(a, a, b) = Queue(a | |
//| a, ba, ba, aa, ab, ab) | |
//| there are 6 3-permutations without replacement for Queue(a, a, b) = Queue(b | |
//| aa, aba, aba, baa, aab, aab) | |
// because different entries of a are considered as different elements. What we want in fact is to | |
// have coins of multiplicity n here, aka multisets https://en.wikipedia.org/wiki/Permutation#k-permutations_of_n | |
// We want coin 'a' present in two different places when we say that our dictionary is "aab". | |
// Using the latter algoringm, we will reduce coin availalbility, every time it is picked. | |
// PART II: Multiplicity | |
def multisets(chars: String, n: Int) = { // elaborating norep to account multipicity | |
type Set = Queue[Bubble] | |
case class Bubble (val char: Char, val avail: Int) | |
type Result = Queue[String] | |
def internal(dict: Set, path: String, acc: Result): Result = { | |
if (path.length == n) acc.enqueue(path) | |
else { | |
dict.foldLeft(acc, dict.dequeue) { | |
case ((acc, (consumed_el, q)), e) => consumed_el match { | |
case Bubble(char, avail) => { | |
//children collect premutations in other positions | |
val chd = if (avail == 0) acc else internal(q.enqueue(new Bubble(char, avail-1)), path + char, acc) | |
// proceed collecting from siblings, which try different chars in the same position | |
(chd, q.enqueue(consumed_el).dequeue) | |
} | |
} | |
}. _1 | |
} | |
} | |
val dict = Queue[Bubble]((chars.toList groupBy (c => c) map {case (k, v) => new Bubble(k, v.length)}).toList: _*) | |
val res = internal(dict, "", Queue.empty) | |
printResult("there are " + res.length + " multiset " + n + "-permutations for " + dict + " = ", res) | |
} //> multisets: (chars: String, n: Int)Result | |
//take one element of each kind to emulate premutations without replacements | |
===(multisets("abc", 2), noReplacements("abc", 2)) | |
//> there are 6 multiset 2-permutations for Queue(Bubble(b,1), Bubble(a,1), Bub | |
//| ble(c,1)) = Queue(ba, bc, ac, ab, cb, ca) | |
//| there are 6 2-permutations without replacement for Queue(a, b, c) = Queue(b | |
//| a, ca, cb, ab, ac, bc) | |
===(multisets("abc", 3), noReplacements("abc", 3)) | |
//> there are 6 multiset 3-permutations for Queue(Bubble(b,1), Bubble(a,1), Bub | |
//| ble(c,1)) = Queue(bac, bca, acb, abc, cba, cab) | |
//| there are 6 3-permutations without replacement for Queue(a, b, c) = Queue(c | |
//| ba, bca, acb, cab, bac, abc) | |
multisets("aab", 2) //> there are 3 multiset 2-permutations for Queue(Bubble(b,1), Bubble(a,2)) = Q | |
//| ueue(ba, ab, aa) | |
//| res4: Result = Queue(ba, ab, aa) | |
multisets("aab", 3) //> there are 3 multiset 3-permutations for Queue(Bubble(b,1), Bubble(a,2)) = Q | |
//| ueue(baa, aba, aab) | |
//| res5: Result = Queue(baa, aba, aab) | |
noReplacements("aab", 3) //> there are 6 3-permutations without replacement for Queue(a, a, b) = Queue(b | |
//| aa, aba, aba, baa, aab, aab) | |
//| res6: Result = Queue(baa, aba, aba, baa, aab, aab) | |
//take far more letters than resulting permutation length to emulate withReplacements | |
===(multisets("aaaaabbbbbccccc", 3), withReplacements("abc", 3)) | |
//> there are 27 multiset 3-permutations for Queue(Bubble(b,5), Bubble(a,5), Bu | |
//| bble(c,5)) = Queue(bac, bab, baa, bcb, bca, bcc, bba, bbc, bbb, acb, aca, a | |
//| cc, aba, abc, abb, aac, aab, aaa, cba, cbc, cbb, cac, cab, caa, ccb, cca, c | |
//| cc) | |
//| there are 27 3-permutations with replacement for abc = List(aaa, aab, aac, | |
//| aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc, | |
//| caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc) | |
// PART 3: For fun, foldLeft iteration over siblings is replaced by recursion | |
def recursiveNoRepl(chars: String, n: Int) = { | |
type Set = Queue[Char] | |
val set = Queue[Char](chars.toList: _*) | |
type Result = Queue[String] | |
def siblings(set: (Char, Set), offset: Int, path: String, acc: Result): Result = set match {case (bubble, queue) => | |
val children = descend(queue, path + bubble, acc) // children that will produce combinations in other positions | |
if (offset == 0) children else siblings(queue.enqueue(bubble).dequeue, offset - 1, path, children) // siblings will produce different chars at the same position, fetch next char for them | |
} | |
def descend(set: Set, path: String, acc: Result): Result = { | |
if (path.length == n) acc.enqueue(path) else siblings(set.dequeue, set.size-1, path, acc) | |
} | |
val res = descend(set, "", Queue.empty) | |
printResult("there are " + res.length + " " + n + "-permutations without replacement for " + set + " = ", res) | |
} //> recursiveNoRepl: (chars: String, n: Int)Result | |
def recursiveMultisets(chars: String, n: Int) = { | |
type MultiSet = Queue[Multiplicity] | |
type Multiplicity = (Char, Int) | |
type Result = Queue[String] | |
def siblings(set: (Multiplicity, MultiSet), offset: Int, path: String, acc: Result): Result = set match {case ((char, avail), queue) => | |
val children = descend(if (avail - 1 == 0) queue else queue.enqueue(char -> {avail-1}), path + char, acc) // childern can reuse the symbol while if it is available | |
if (offset == 0) children else siblings(queue.enqueue((char, avail)).dequeue, offset - 1, path, children) | |
} | |
def descend(set: MultiSet, path: String, acc: Result): Result = { | |
if (path.length == n) acc.enqueue(path) else siblings(set.dequeue, set.size-1, path, acc) | |
} | |
val multiset = Queue[Multiplicity]((chars.toList groupBy (c => c) map {case (k, v) => (k, v.length)}).toList: _*) | |
val res = descend(multiset, "", Queue.empty) | |
printResult("there are " + res.length + " multiset " + n + "-permutations for " + multiset + " = ", res) | |
} //> recursiveMultisets: (chars: String, n: Int)Result | |
// PART 4: Genericized versions | |
def noreplGeneric[A](chars: List[A], n: Int) = { | |
type Set = Queue[A] | |
val set = Queue[A](chars.toList: _*) | |
type Result = List[List[A]] | |
def siblings(set: (A, Set), offset: Int, path: List[A], acc: Result): Result = set match {case (bubble, queue) => | |
val children = descend(queue, bubble :: path, acc) // bubble was used, it is not available for children that will produce combinations in other positions | |
if (offset == 0) children else siblings(queue.enqueue(bubble).dequeue, offset - 1, path, children) // siblings will produce different chars at the same position, fetch next char for them | |
} | |
def descend(set: Set, path: List[A], acc: Result): Result = { | |
if (path.length == n) path :: acc else siblings(set.dequeue, set.size-1, path, acc) | |
} | |
val res = descend(set, List.empty, List.empty) | |
printResult("there are " + res.length + " " + n + "-permutations without replacement for " + set + " = ", res) | |
} //> noreplGeneric: [A](chars: List[A], n: Int)Result | |
===(noreplGeneric("aab".toList, 2).map(_ mkString ""), noReplacements("aab", 2)) | |
//> there are 6 2-permutations without replacement for Queue(a, a, b) = List(Li | |
//| st(a, b), List(a, b), List(a, a), List(b, a), List(b, a), List(a, a)) | |
//| there are 6 2-permutations without replacement for Queue(a, a, b) = Queue(a | |
//| a, ba, ba, aa, ab, ab) | |
// generic multisets | |
def genericPermutation[A](chars: List[A], n: Int) = { | |
type Multiset = Queue[Multiplicity] | |
type Multiplicity = (A, Int) | |
type Result = List[List[A]] | |
def siblings(set: (Multiplicity, Multiset), offset: Int, path: List[A], acc: Result): Result = set match {case ((char, avail), queue) => | |
val children = descend(if (avail - 1 == 0) queue else queue.enqueue(char -> {avail-1}), char :: path, acc) // childern can reuse the symbol while if it is available | |
if (offset == 0) children else siblings(queue.enqueue((char, avail)).dequeue, offset - 1, path, children) | |
} | |
def descend(set: Multiset, path: List[A], acc: Result): Result = { | |
if (path.length == n) path :: acc else siblings(set.dequeue, set.size-1, path, acc) | |
} | |
val multiset = Queue[Multiplicity]((chars.toList groupBy (c => c) map {case (k, v) => (k, v.length)}).toList: _*) | |
val res = descend(multiset, List.empty, List.empty) | |
printResult("there are " + res.length + " multiset " + n + "-permutations for " + multiset + " = ", res) | |
} //> genericPermutation: [A](chars: List[A], n: Int)Result | |
def sgp(dict: String, len: Int) = genericPermutation(dict.toList, len) map (_ mkString "") | |
//> sgp: (dict: String, len: Int)List[String] | |
=== (sgp("aaab", 2), multisets("aaab", 2))//> there are 3 multiset 2-permutations for Queue((b,1), (a,3)) = List(List(a, | |
//| a), List(b, a), List(a, b)) | |
//| there are 3 multiset 2-permutations for Queue(Bubble(b,1), Bubble(a,3)) = | |
//| Queue(ba, ab, aa) | |
//take one letter of each to emulate withoutReplacements | |
===(sgp("aaaaabbbbbccccc", 2), withReplacements("abc", 2)) | |
//> there are 9 multiset 2-permutations for Queue((b,5), (a,5), (c,5)) = List( | |
//| List(c, c), List(a, c), List(b, c), List(a, a), List(b, a), List(c, a), Li | |
//| st(b, b), List(c, b), List(a, b)) | |
//| there are 9 2-permutations with replacement for abc = List(aa, ab, ac, ba, | |
//| bb, bc, ca, cb, cc) | |
===(sgp("aaaaabbbbbccccc", 3), withReplacements("abc", 3)) | |
//> there are 27 multiset 3-permutations for Queue((b,5), (a,5), (c,5)) = List | |
//| (List(c, c, c), List(a, c, c), List(b, c, c), List(a, a, c), List(b, a, c) | |
//| , List(c, a, c), List(b, b, c), List(c, b, c), List(a, b, c), List(a, a, a | |
//| ), List(b, a, a), List(c, a, a), List(b, b, a), List(c, b, a), List(a, b, | |
//| a), List(c, c, a), List(a, c, a), List(b, c, a), List(b, b, b), List(c, b, | |
//| b), List(a, b, b), List(c, c, b), List(a, c, b), List(b, c, b), List(a, a | |
//| , b), List(b, a, b), List(c, a, b)) | |
//| there are 27 3-permutations with replacement for abc = List(aaa, aab, aac, | |
//| aba, abb, abc, aca, acb, acc, baa, bab, bac, bba, bbb, bbc, bca, bcb, bcc | |
//| , caa, cab, cac, cba, cbb, cbc, cca, ccb, ccc) | |
===(sgp("abc", 2), noReplacements("abc", 2)) | |
//> there are 6 multiset 2-permutations for Queue((b,1), (a,1), (c,1)) = List( | |
//| List(a, c), List(b, c), List(b, a), List(c, a), List(c, b), List(a, b)) | |
//| there are 6 2-permutations without replacement for Queue(a, b, c) = Queue( | |
//| ba, ca, cb, ab, ac, bc) | |
===(sgp("abc", 3), noReplacements("abc", 3)) | |
//> there are 6 multiset 3-permutations for Queue((b,1), (a,1), (c,1)) = List( | |
//| List(b, a, c), List(a, b, c), List(c, b, a), List(b, c, a), List(a, c, b), | |
//| List(c, a, b)) | |
//| there are 6 3-permutations without replacement for Queue(a, b, c) = Queue( | |
//| cba, bca, acb, cab, bac, abc) | |
// Published at https://gist.github.com/valtih1978/99d8b320e59fcde634ad | |
} |
No comments:
Post a Comment