The Francis Crick Institute

Engineering AI for provable retention of objectives over time

journal contribution
posted on 2024-06-20, 09:53, authored by Adeniyi Fasoro
I argue that ensuring artificial intelligence (AI) retains alignment with human values over time is critical yet understudied. Most research focuses on static alignment, neglecting the retention dynamics that keep a system stable as it learns and acts autonomously. This paper examines the limitations that currently constrain provable retention; the key gaps include formalizing retention dynamics, achieving transparency in advanced systems, scaling participatory oversight, and managing the risks of uncontrolled recursive self-improvement. I synthesize technical and ethical perspectives into a conceptual framework, grounded in control theory and philosophy, for analyzing these dynamics. Priorities should shift toward capability modulation, participatory design, and advanced modeling capable of verifying enduring alignment. Overall, realizing AI that remains safely aligned throughout its lifetime requires translating principles into formal methods, working demonstrations, and systems that integrate technical and humanistic rigor.
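As a purely illustrative sketch (not taken from the paper), the control-theoretic framing can be made concrete by modelling an agent's value representation as a state vector that drifts under learning updates, then checking a Lyapunov-style retention condition. Every name and parameter below (v_human, regulated_update, the drift scale, the bound) is a hypothetical assumption introduced for illustration only.

```python
import numpy as np

# Hypothetical toy model (illustrative, not the paper's method):
# the agent's value representation v_t drifts under learning updates,
# and a regulator pulls it back toward a reference vector v_human.
# Retention is checked via a Lyapunov candidate V(v) = ||v - v_human||^2.

rng = np.random.default_rng(0)

v_human = np.array([1.0, 0.0, 0.5])          # assumed reference human-value vector
v = v_human + rng.normal(scale=0.1, size=3)  # initial, slightly misaligned values

def misalignment(v):
    """Lyapunov candidate: squared distance from the reference values."""
    return float(np.sum((v - v_human) ** 2))

def regulated_update(v, lr=0.05, drift_scale=0.02):
    """One learning step: random drift plus a corrective pull toward
    v_human, standing in for an alignment-preserving regulator."""
    drift = rng.normal(scale=drift_scale, size=v.shape)
    correction = -lr * 2.0 * (v - v_human)  # gradient descent on V(v)
    return v + drift + correction

history = [misalignment(v)]
for _ in range(200):
    v = regulated_update(v)
    history.append(misalignment(v))

# Retention check: misalignment should stay within a bound over the run.
bound = 2.0 * history[0] + 1e-2
print(f"initial V = {history[0]:.4f}, final V = {history[-1]:.4f}, "
      f"max V = {max(history):.4f}, bound respected: {max(history) <= bound}")
```

The design choice in this toy is to treat misalignment as a Lyapunov candidate: if a regulator keeps V(v_t) bounded across updates, retention holds in the simulation. Proving the analogous property for real, self-modifying learning systems is the open problem the abstract points to.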
