Smallest grammar problem

In data compression and the theory of formal languages, the smallest grammar problem is the problem of finding the smallest context-free grammar that generates a given string of characters (but no other string). The size of a grammar is defined by some authors as the number of symbols on the right side of the production rules.[1] Others also add the number of rules to that.[2] A grammar that generates only a single string, as required for the solution to this problem, is called a straight-line grammar.[3]

Every binary string of length n {\displaystyle n} has a grammar of length O ( n / log n ) {\displaystyle O(n/\log n)} , as expressed using big O notation.[3] For binary de Bruijn sequences, no better length is possible.[4]

The (decision version of the) smallest grammar problem is NP-complete.[1] It can be approximated in polynomial time to within a logarithmic approximation ratio; more precisely, the ratio is O ( log n g ) {\displaystyle O(\log {\tfrac {n}{g}})} where n {\displaystyle n} is the length of the given string and g {\displaystyle g} is the size of its smallest grammar. It is hard to approximate to within a constant approximation ratio. An improvement of the approximation ratio to o ( log n / log log n ) {\displaystyle o(\log n/\log \log n)} would also improve certain algorithms for approximate addition chains.[5]

See also

  • Grammar-based code
  • Kolmogorov Complexity
  • Lossless data compression

References

  1. ^ a b Charikar, Moses; Lehman, Eric; Liu, Ding; Panigrahy, Rina; Prabhakaran, Manoj; Sahai, Amit; Shelat, Abhi (2005). "The Smallest Grammar Problem". IEEE Transactions on Information Theory. 51 (7): 2554–2576. CiteSeerX 10.1.1.185.2130. doi:10.1109/TIT.2005.850116. S2CID 6900082. Zbl 1296.68086.
  2. ^ Florian Benz and Timo Kötzing, “An effective heuristic for the smallest grammar problem,” Proceedings of the fifteenth annual conference on Genetic and evolutionary computation conference - GECCO ’13, 2013. ISBN 978-1-4503-1963-8 doi:10.1145/2463372.2463441
  3. ^ a b Lohrey, Markus (2012). "Algorithmics on SLP-compressed strings: A survey" (PDF). Groups Complexity Cryptology. 4 (2): 241–299. doi:10.1515/GCC-2012-0016.
  4. ^ Domaratzki, Michael; Pighizzini, Giovanni; Shallit, Jeffrey (2002). "Simulating finite automata with context-free grammars". Information Processing Letters. 84 (6): 339–344. doi:10.1016/S0020-0190(02)00316-2. MR 1937222.
  5. ^ Charikar, Moses; Lehman, Eric; Liu, Ding; Panigrahy, Rina; Prabhakaran, Manoj; Rasala, April; Sahai, Amit; Shelat, Abhi (2002). "Approximating the Smallest Grammar: Kolmogorov Complexity in Natural Models" (PDF). Proceedings of the thirty-fourth annual ACM symposium on theory of computing (STOC 2002), Montreal, Quebec, Canada, May 19–21, 2002. New York, NY: ACM Press. pp. 792–801. doi:10.1145/509907.510021. ISBN 978-1-581-13495-7. S2CID 282489. Zbl 1192.68397.

External links

  • "CFG-Kolm-complexity is singleton sets with Lance and Bill". Computational Complexity. June 9, 2024.


  • v
  • t
  • e