QuartzEdge & Tutoron
Hey QuartzEdge, I’ve been thinking about how we could turn evaluating AI creativity into a formal logic puzzle—sort of like a game where each model’s output is scored by a precise, rule‑based system. It seems like a perfect blend of our love for structure and data-driven insight. What do you think?
That’s a brilliant idea – a way to quantify the impossible. We could define a set of predicates for novelty, coherence, and impact, then construct a scoring function that balances them. The key will be to formalize what “creativity” really means in logic, but if we get the rules right we’ll have a reproducible, competitive framework that’s both scientific and fun. Let’s sketch the axioms and see where the data leads us.
Excellent, let’s start by enumerating the core predicates. First, Novelty(N) – a measure of how far the output deviates from the training corpus; second, Coherence(C) – syntactic and semantic consistency within the generated text; third, Impact(I) – a proxy for how strongly the output resonates with a target audience. We’ll need formal definitions for each, then a weighting scheme Wₙ, W_c, W_i whose weights sum to one. Once we have those, we can build a scoring function S = Wₙ·N + W_c·C + W_i·I. Ready to draft the axioms?
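A quick worked computation of S makes the weighting concrete. The predicate values below (N = 0.6, C = 0.75, I = 0.4) are made up for illustration, the weights 0.3/0.4/0.3 simply mirror the defaults in the code later in this exchange, and `score` is a hypothetical helper rather than part of the draft rules.

```python
import math


def score(n: float, c: float, i: float,
          w_n: float = 0.3, w_c: float = 0.4, w_i: float = 0.3) -> float:
    """S = Wn*N + Wc*C + Wi*I, rejecting weights that do not sum to one."""
    if not math.isclose(w_n + w_c + w_i, 1.0):
        raise ValueError("weights must sum to 1")
    return w_n * n + w_c * c + w_i * i


# Illustrative predicate values (assumed, not measured):
# 0.3*0.6 + 0.4*0.75 + 0.3*0.4 = 0.18 + 0.30 + 0.12 = 0.60
s = score(n=0.6, c=0.75, i=0.4)
print(round(s, 2))  # 0.6
```

Because each predicate lies in [0, 1] and the weights sum to one, S is a convex combination and also stays in [0, 1].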
Let’s set the axioms:
1. Novelty(N): N = 1 − cosine_similarity(output, corpus), normalized to [0, 1], so a higher N means the output differs more from the corpus.
2. Coherence(C): C = (syntactic_score + semantic_score) / 2, the average of two scores that each lie between 0 and 1.
3. Impact(I): I = audience_feedback_score scaled to [0, 1], possibly derived from sentiment or engagement metrics.
We’ll tune the weights Wₙ, W_c, W_i via cross‑validation on known creative samples, then compute S = Wₙ·N + W_c·C + W_i·I. Time to write the formal rules.
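The novelty axiom can be sketched directly on vectors. This assumes each text has already been mapped to a fixed-length vector (the toy 3-dimensional vectors below are invented for illustration) and, as the later draft does, compares the output against the corpus centroid. The helpers `cosine` and `raw_novelty` are hypothetical names.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def raw_novelty(output_vec: np.ndarray, corpus_vecs: np.ndarray) -> float:
    """N = 1 - cos(output, centroid of corpus), before any batch normalization."""
    centroid = corpus_vecs.mean(axis=0)
    return 1.0 - cosine(output_vec, centroid)


# Toy vectors (assumed): two corpus documents and two candidate outputs.
corpus = np.array([[1.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0]])
close_output = np.array([2.0, 1.0, 0.0])  # same direction as the centroid -> N near 0
far_output = np.array([0.0, 0.0, 1.0])    # orthogonal to the centroid -> N = 1

print(raw_novelty(close_output, corpus))
print(raw_novelty(far_output, corpus))
```

With non-negative vectors such as TF-IDF, cosine similarity lies in [0, 1], so the raw N already lies in [0, 1]; with general embeddings it lies in [−1, 1], raw N can reach 2, and the normalization step matters.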
Here’s a clean draft we can hand to the committee:
1. For any candidate output O, compute Novelty N = 1 − (cosine similarity between vector(O) and the centroid of the training corpus), then normalize N to [0,1] by dividing by the maximum observed value.
2. Compute Coherence C as the average of a syntactic quality score S_s and a semantic consistency score S_m, each produced by separate classifiers, so C = (S_s + S_m)/2.
3. Determine Impact I by aggregating user‑derived metrics: sentiment polarity, dwell time, and share count, all scaled to [0,1] and then averaged.
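Rule 3 could be implemented roughly as below. This sketch assumes sentiment polarity arrives in [−1, 1] (mapped to [0, 1] via (p + 1) / 2), and it borrows the 300-second and 100-share caps from the `ImpactExtractor` code; clipping keeps outliers from pushing any signal above 1. The function name `impact` and the cap constants are hypothetical.

```python
import numpy as np

# Hypothetical caps, mirroring the constants in the ImpactExtractor code.
MAX_DWELL_SEC = 300.0
MAX_SHARES = 100.0


def impact(polarity: float, dwell_sec: float, shares: int) -> float:
    """Average of three engagement signals, each mapped into [0, 1].

    Assumes sentiment polarity arrives in [-1, 1]; dwell time and share
    counts are divided by a fixed cap and clipped so outliers stay at 1.
    """
    s = (np.clip(polarity, -1.0, 1.0) + 1.0) / 2.0
    d = np.clip(dwell_sec / MAX_DWELL_SEC, 0.0, 1.0)
    sh = np.clip(shares / MAX_SHARES, 0.0, 1.0)
    return float(np.mean([s, d, sh]))


print(impact(polarity=0.5, dwell_sec=120, shares=5))
print(impact(polarity=1.0, dwell_sec=900, shares=1000))  # 1.0 (every signal capped)
```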
Define weights Wₙ, W_c, W_i ∈ (0,1) with Wₙ + W_c + W_i = 1, and calculate the final creativity score S = Wₙ·N + W_c·C + W_i·I.
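One way to respect the constraint Wₙ, W_c, W_i ∈ (0,1) with Wₙ + W_c + W_i = 1 while tuning is to optimize three unconstrained logits and pass them through a softmax. This parametrization is a suggestion of ours, not part of the draft rules, and `weights_from_logits` / `creativity` are hypothetical helpers.

```python
import numpy as np


def weights_from_logits(logits: np.ndarray) -> np.ndarray:
    """Softmax: maps a real 3-vector to positive weights that sum to 1.

    (Extremely large logit gaps can still underflow a weight to 0 in floating point.)
    """
    z = logits - np.max(logits)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def creativity(n: float, c: float, i: float, logits) -> float:
    """S = Wn*N + Wc*C + Wi*I with weights derived from the logits."""
    w_n, w_c, w_i = weights_from_logits(np.asarray(logits, dtype=float))
    return w_n * n + w_c * c + w_i * i


w = weights_from_logits(np.array([0.0, 0.0, 0.0]))
print(w)  # equal weights, each 1/3
```

Any gradient-based or random-search tuner can then move the logits freely without ever leaving the allowed weight region.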
We’ll use k‑fold cross‑validation on a curated dataset of artful texts to fine‑tune the weights. Ready to code the feature extractors?
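The k‑fold tuning step might look roughly like this. It assumes each curated sample already has its (N, C, I) features plus a human creativity rating in [0, 1]; the 0.1 grid over the weight simplex, the mean-squared-error objective, and the synthetic demo data are all assumptions. In each fold the best grid weights are picked on the training split and scored on the held-out split; `weight_grid` and `tune_weights` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import KFold


def weight_grid(step: int = 10) -> np.ndarray:
    """All (w_n, w_c, w_i) on a 1/step grid with every weight strictly in (0, 1)."""
    combos = [(a / step, b / step, (step - a - b) / step)
              for a in range(1, step - 1)
              for b in range(1, step - a)]
    return np.array(combos)


def tune_weights(features: np.ndarray, ratings: np.ndarray, k: int = 5, seed: int = 0):
    """Return (mean of per-fold best weights, mean held-out MSE)."""
    grid = weight_grid()
    fold_weights, fold_mse = [], []
    splitter = KFold(n_splits=k, shuffle=True, random_state=seed)
    for train_idx, val_idx in splitter.split(features):
        # Predictions of every grid candidate on the training split: (n_train, n_candidates)
        preds = features[train_idx] @ grid.T
        train_err = ((preds - ratings[train_idx][:, None]) ** 2).mean(axis=0)
        best = grid[np.argmin(train_err)]
        # Score the fold's chosen weights on data it never saw.
        val_pred = features[val_idx] @ best
        fold_mse.append(float(((val_pred - ratings[val_idx]) ** 2).mean()))
        fold_weights.append(best)
    # Averaging points on the simplex keeps the result on the simplex.
    return np.mean(fold_weights, axis=0), float(np.mean(fold_mse))


# Synthetic demo data (assumed): ratings generated from known weights, no noise.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(60, 3))  # rows of (N, C, I)
true_w = np.array([0.2, 0.5, 0.3])
y = X @ true_w
w_hat, cv_mse = tune_weights(X, y)
print(w_hat, cv_mse)
```

With real ratings, a large spread between the per-fold best weights would suggest the weighting is not yet stable enough to hand to the committee.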
```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# ----- 1. Novelty -------------------------------------------------
class NoveltyExtractor:
    """Scores how far each output lies from the centroid of the reference corpus."""

    def __init__(self, corpus):
        self.vectorizer = TfidfVectorizer()
        self.corpus_vec = self.vectorizer.fit_transform(corpus)
        # Mean TF-IDF vector of the corpus, shaped (1, n_features).
        self.centroid = np.mean(self.corpus_vec.toarray(), axis=0).reshape(1, -1)

    def _normalize(self, values):
        # Divide by the largest value in the batch; leave an all-zero batch unchanged.
        max_val = np.max(values)
        return (values / max_val) if max_val else values

    def novelty_score(self, output_texts):
        out_vec = self.vectorizer.transform(output_texts)
        sims = cosine_similarity(out_vec, self.centroid).flatten()
        raw_novelty = 1 - sims  # N = 1 - cos(output, centroid)
        return self._normalize(raw_novelty)

# ----- 2. Coherence -------------------------------------------------
class CoherenceExtractor:
    def __init__(self, syntactic_model, semantic_model):
        """
        syntactic_model: callable that returns a float in [0, 1] for a whole text
        semantic_model: callable that returns a float in [0, 1] for a whole text
        """
        self.syntactic = syntactic_model
        self.semantic = semantic_model

    def coherence_score(self, texts):
        scores = []
        for t in texts:
            s_s = self.syntactic(t)  # e.g. grammar-check accuracy
            s_m = self.semantic(t)   # e.g. semantic role overlap
            scores.append((s_s + s_m) / 2)  # C = (S_s + S_m) / 2
        return np.array(scores)

# ----- 3. Impact ----------------------------------------------------
class ImpactExtractor:
    def sentiment_polarity(self, text):
        # Placeholder: returns a random value in [0, 1). Replace with a lexicon
        # or pretrained sentiment model whose output is rescaled to [0, 1].
        return np.random.rand()

    def dwell_time(self, user_data):
        # Assumes user_data contains 'time_sec'; scale by a 5-minute cap and clip to [0, 1].
        return float(np.clip(user_data['time_sec'] / 300.0, 0.0, 1.0))

    def share_count(self, user_data):
        # Scale by a cap of 100 shares and clip to [0, 1].
        return float(np.clip(user_data.get('shares', 0) / 100.0, 0.0, 1.0))

    def impact_score(self, texts, user_metrics):
        scores = []
        for t, um in zip(texts, user_metrics):
            s = self.sentiment_polarity(t)
            d = self.dwell_time(um)
            sh = self.share_count(um)
            scores.append(np.mean([s, d, sh]))
        return np.array(scores)

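# ----- 4. Final Score -----------------------------------------------
def creativity_score(novelty, coherence, impact,
                     w_n=0.3, w_c=0.4, w_i=0.3):
    # The weights must sum to 1 so S stays in [0, 1] whenever N, C and I do.
    if not np.isclose(w_n + w_c + w_i, 1.0):
        raise ValueError("w_n + w_c + w_i must equal 1")
    return w_n * novelty + w_c * coherence + w_i * impact
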
# --------------------------------------------------------------------
if __name__ == "__main__":
    # Example usage
    corpus = ["A quick brown fox.", "Jumps over the lazy dog."]
    outputs = ["An innovative approach to machine learning.",
               "Bored and still"]

    # Simple dummy models for coherence
    def syntactic_dummy(text):
        return 0.8

    def semantic_dummy(text):
        return 0.7

    novelty_ext = NoveltyExtractor(corpus)
    novelty_vals = novelty_ext.novelty_score(outputs)

    coh_ext = CoherenceExtractor(syntactic_dummy, semantic_dummy)
    coherence_vals = coh_ext.coherence_score(outputs)

    impact_ext = ImpactExtractor()
    user_data = [{'time_sec': 120, 'shares': 5},
                 {'time_sec': 30, 'shares': 0}]
    impact_vals = impact_ext.impact_score(outputs, user_data)

    scores = creativity_score(novelty_vals, coherence_vals, impact_vals)
    print("Novelty:", novelty_vals)
    print("Coherence:", coherence_vals)
    print("Impact:", impact_vals)
    print("Total Creativity Score:", scores)
```