Concept class

Computational learning theory
(Learn how and when to remove this message)

In computational learning theory in mathematics, a concept over a domain X is a total Boolean function over X. A concept class is a class of concepts. Concept classes are a subject of computational learning theory.

Concept class terminology frequently appears in model theory associated with probably approximately correct (PAC) learning.[1] In this setting, if one takes a set Y as a set of (classifier output) labels, and X is a set of examples, the map c : X Y {\displaystyle c:X\to Y} , i.e. from examples to classifier labels (where Y = { 0 , 1 } {\displaystyle Y=\{0,1\}} and where c is a subset of X), c is then said to be a concept. A concept class C {\displaystyle C} is then a collection of such concepts.

Given a class of concepts C, a subclass D is reachable if there exists a sample s such that D contains exactly those concepts in C that are extensions to s.[2] Not every subclass is reachable.[2][why?]

Background

A sample s {\displaystyle s} is a partial function from X {\displaystyle X} [clarification needed] to { 0 , 1 } {\displaystyle \{0,1\}} .[2] Identifying a concept with its characteristic function mapping X {\displaystyle X} to { 0 , 1 } {\displaystyle \{0,1\}} , it is a special case of a sample.[2]

Two samples are consistent if they agree on the intersection of their domains.[2] A sample s {\displaystyle s'} extends another sample s {\displaystyle s} if the two are consistent and the domain of s {\displaystyle s} is contained in the domain of s {\displaystyle s'} .[2]

Examples

Suppose that C = S + ( X ) {\displaystyle C=S^{+}(X)} . Then:

Applications

Let C {\displaystyle C} be some concept class. For any concept c C {\displaystyle c\in C} , we call this concept 1 / d {\displaystyle 1/d} -good for a positive integer d {\displaystyle d} if, for all x X {\displaystyle x\in X} , at least 1 / d {\displaystyle 1/d} of the concepts in C {\displaystyle C} agree with c {\displaystyle c} on the classification of x {\displaystyle x} .[2] The fingerprint dimension F D ( C ) {\displaystyle FD(C)} of the entire concept class C {\displaystyle C} is the least positive integer d {\displaystyle d} such that every reachable subclass C C {\displaystyle C'\subseteq C} contains a concept that is 1 / d {\displaystyle 1/d} -good for it.[2] This quantity can be used to bound the minimum number of equivalence queries[clarification needed] needed to learn a class of concepts according to the following inequality: F D ( C ) 1 # E Q ( C ) F D ( C ) ln ( | C | ) {\textstyle FD(C)-1\leq \#EQ(C)\leq \lceil FD(C)\ln(|C|)\rceil } .[2]

References

  1. ^ Chase, H., & Freitag, J. (2018). Model Theory and Machine Learning. arXiv preprint arXiv:1801.06566.
  2. ^ a b c d e f g h i j k l Angluin, D. (2004). "Queries revisited" (PDF). Theoretical Computer Science. 313 (2): 188–191. doi:10.1016/j.tcs.2003.11.004.