In computer science, the Cocke–Younger–Kasami algorithm (alternatively called CYK, or CKY) is a parsing algorithm for context-free grammars published by Itiroo Sakai in 1961. The algorithm is named after some of its rediscoverers: John Cocke, Daniel Younger, Tadao Kasami, and Jacob T. Schwartz. It employs bottom-up parsing and dynamic programming.
The standard version of CYK operates only on context-free grammars given in Chomsky normal form (CNF). However any context-free grammar may be algorithmically transformed into a CNF grammar expressing the same language .
The importance of the CYK algorithm stems from its high efficiency in certain situations. Using big O notation, the worst case running time of CYK is , where is the length of the parsed string and is the size of the CNF grammar . This makes it one of the most efficient parsing algorithms in terms of worst-case asymptotic complexity, although other algorithms exist with better average running time in many practical scenarios.
The dynamic programming algorithm requires the context-free grammar to be rendered into Chomsky normal form (CNF), because it tests for possibilities to split the current sequence into two smaller sequences. Any context-free grammar that does not generate the empty string can be represented in CNF using only production rules of the forms , , and where is the start symbol.
The algorithm in pseudocode is as follows:
let the input be a string I consisting of n characters: a1 ... an.
let the grammar contain r nonterminal symbols R1 ... Rr, with start symbol R1.
let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
let back[n,n,r] be an array of lists of backpointing triples. Initialize all elements of back to the empty list.
for each s = 1 to n
for each unit production Rv → as
set P[1,s,v] = true
for each l = 2 to n -- Length of span
for each s = 1 to n-l+1 -- Start of span
for each p = 1 to l-1 -- Partition of span
for each production Ra → Rb Rc
if P[p,s,b] and P[l-p,s+p,c] then
set P[l,s,a] = true,
append to back[l,s,a]
if P[n,1,1] is true then
I is member of language
return back -- by retracing the steps through back, one can easily construct all possible parse trees of the string.