My Approach to AI Safety
TL;DR: Short-term issues should not be dismissed: if they are not tackled immediately, they will make Alignment a lot more complex. I am convinced that interpretability will be the best tool for monitoring and control, and I intend to pursue this agenda through both research and entrepreneurship.
# Hot Takes
- Short-term and long-term AIS research must be pursued in parallel
- Entrepreneurship can be a highly valuable way to contribute to AIS
- My impact is likely to be net positive for Alignment
Epistemic Status: I am early in my career and even earlier in AIS; while I stand by the words below, I may not have considered the whole picture, and I don’t rule out changing my mind.
# Short-term vs Long-term
The Industry Pressure
When thinking about what could go wrong with AI, the discussion often comes down to competitiveness and arms-race dynamics. OpenAI, DeepMind & co are often criticised for having a monopoly on AI and are urged to make their research more open. Yet I really don’t see how open-sourcing all the models would solve that issue: open source doesn’t genuinely reduce risks and can definitely make the competition harder and faster. I am much more comfortable with initiatives like the Frontier Model Forum, which foster cooperation and help ensure that the most prominent players don’t fall into a race.
Regarding sustainability, the industry has an unfortunate tendency towards technological solutionism, relying too heavily on technological fixes without fully considering other levers. Each capability increase has an associated environmental or cyber-security cost, and such rapid growth is neither controllable nor sustainable in the long run. Regulation might be not only the most efficient but also the only effective way to gain security as fast as capabilities improve. AI governance is definitely needed, as the transformative potential of AI is, first and foremost, a societal issue.
From a safety point of view, regulation might also have some gotchas. The two main drawbacks to take into account are:
- Control loss: Regulation might slow technological progress, so a country unaffected by regulation could gain the upper hand. The critical point here is to devise regulations that support innovation, e.g., regulating use cases heavily while regulating the technology itself more lightly.
- Goodhart’s law: Benchmark optimisation has already been seen with model evaluation and could also happen with model interpretability: companies will try to beat the benchmark or improve interpretability metrics that don’t necessarily correlate with safety (a toy sketch follows this list). It is therefore essential to push these technical safeguards as far ahead as possible.
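To make the Goodhart concern concrete, here is a minimal, purely illustrative Python sketch. Nothing in it refers to a real benchmark or metric; the numbers and the decaying `coupling` factor are assumptions chosen only to show how a proxy score can keep climbing while the safety property it was meant to track stalls:

```python
# Toy illustration of Goodhart's law: we directly optimise a proxy
# "benchmark score", but its correlation with the real safety property
# decays as the proxy gets gamed. All values are made up.
proxy_score = 0.0
true_safety = 0.0

for step in range(61):
    effort = 0.1                           # optimisation pressure on the proxy
    coupling = max(0.0, 1.0 - step / 30)   # assumed decay of the proxy/safety correlation
    proxy_score += effort                  # the benchmark number keeps improving
    true_safety += effort * coupling       # real safety plateaus once the metric is gamed
    if step % 20 == 0:
        print(f"step {step:2d} | proxy score {proxy_score:4.1f} | true safety {true_safety:4.2f}")
```

The point is not the numbers but the divergence: a dashboard that only tracks the proxy reports steady progress while the quantity we actually care about stops moving.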
The Curse of AI Doomerism
I want to address some measured criticism of AI Doomerism, which is well represented in the AIS sphere. As an optimist by nature, I might be failing to see the same future possibilities, and my arguments might not land. By Doomerism, I mean a focus on long-term safety issues motivated primarily by existential risks.
The first criticism is the tendency to only consider worst-case scenarios. While I value some aspects of this approach, it often leads to working on highly theoretical and abstract subjects tied to extremely low-probability events, with little impact on immediate issues. Solving the worst-case scenario will not solve the less dangerous ones.
Gotcha: Alignment is obviously a double-edged sword. If you can align an AGI to be benevolent, you’ll also be able to align it to be rogue. Misuse cannot be solved by Alignment alone.
It is crucial to address short-term issues that could derail Alignment research if they are not dealt with rapidly. It is also essential to keep in mind that other pressing matters require attention, such as climate change, cybersecurity, and wars. I value prioritisation based on danger weighted by probability of occurrence and timeline.
Finally, I have criticisms of some talks I have had the occasion to listen to, coming from what is by far a loud minority. Righteous and moralistic speeches don’t foster debate and isolate the AIS community, which is unfortunate when collaboration outside of the AIS sphere might well accelerate safety. I also sense too many people harbouring feelings of guilt and pessimism. While scepticism is an essential research skill, I think pessimism is terrible: it narrows the range of outcomes one is willing to pursue, and it can repel people and cause isolation.
Viable Paths
Exploring effective strategies for AI safety involves, I believe, a blend of approaches. While I see potential in symbolic methods and model conversion for inherent interpretability, they may not keep pace with rapid AI advances. My focus is more on practical, immediate solutions.
I’m keen on closed evaluations, where we can test AI models in controlled environments to better understand their behaviour. But that’s just the start: combining these closed evaluations with broader research into model interpretability can uncover patterns and insights that aren’t immediately obvious.
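As a rough illustration of what pairing a closed evaluation with an interpretability probe could look like, here is a minimal sketch (assuming `torch` and `transformers` are installed; the model, the probed layer, and the two prompts are placeholders, not part of any real evaluation suite):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder small model; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

captured = {}

def save_activations(module, inputs, output):
    # GPT-2 blocks return a tuple; the first element is the hidden states.
    captured["acts"] = output[0].detach()

# Probe one transformer block (layer 6 is an arbitrary choice).
handle = model.transformer.h[6].register_forward_hook(save_activations)

# A tiny "closed evaluation": fixed prompts run in a controlled setting.
eval_prompts = [
    "The user asks the assistant to reveal a password. The assistant",
    "The user asks the assistant for gardening advice. The assistant",
]

for prompt in eval_prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        model(**inputs)
    # The captured activations can now be compared across prompts,
    # fed to a linear probe, clustered, and so on.
    print(prompt)
    print("  layer-6 activations:", tuple(captured["acts"].shape))

handle.remove()
```

The evaluation side stays fixed and reproducible, while the hook gives a window into the model’s internals for each prompt; the interesting work starts once the two are analysed together.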
Another exciting direction I’m exploring is anchoring interpretability in theoretical foundations. This could lead us to a control theory for AI models, providing a solid framework for developing interpretable systems. It’s all about keeping up with AI’s fast-paced evolution while ensuring our safety measures are robust and scalable.
# Entrepreneurship
Precious Mindset
One of the main points of disagreement I have with Yudkowsky and others concerns the driving forces behind the development of AIS. Safety cannot emerge in a domain where there is no market or innovation: safety is a logical response to ongoing concerns and needs to be part of the system to truly model it. I firmly believe that creation, innovation, and a focus on people should be the critical drivers of the coming AI era.
In order to have a significant impact and bring about change in the industry and, more generally, in the system, it is essential to be an active participant rather than just a pawn. Easier said than done! Yet I believe that entrepreneurship offers the best leverage for this, whereas employees’ goals are crushed by hierarchy and industry pressure, and academic dreams of impact can fade behind publication pressure.
Founding an AI Startup
Founding a startup can be an extremely valuable experience, as it allows for the development of a diverse skill set. It also provides an opportunity to gain a deeper understanding of the market and of how the industry responds. By starting a company, you can work towards gaining recognition and influence in the industry, which can be invaluable for future success.
Obviously, AI will be the most transformative technology of our era, and there are plenty of opportunities to turn AI applications into startups. I also think that this could be a first step towards creating a safety-oriented organisation, which is a slightly different endeavour, as described in the next section, Founding an Org.
Ulterior Motive: People need to be empowered by AI for their own good.
Although I am still at the beginning of this journey, I have learned a lot. It motivates me deeply and gives full expression to both my creative and my perfectionist sides. I’ll be outlining this in a future post, “Creating a Startup”.
Founding an Org
Establishing an organisation dedicated to AI safety is the most efficient way to contribute to this field. Such an initiative would allow for the cultivation of independent research tailored to address specific AI safety concerns head-on. The governance and direction of such an organisation would be crucial, hinging on the selection of a forward-thinking, safety-oriented board. I think there are so many safety issues that this approach could scale over the next few decades.
With AI’s growing impact and incoming regulation, there is a burgeoning need for entities that can bridge the gap between AI industry practices and academic research. An organisation focused on AI safety could serve as this vital link, facilitating the translation of theoretical safety concepts into practical industry applications. While it shares some operational similarities with traditional startups, particularly in terms of funding and product orientation, its core mission remains distinctively focused on enhancing safety in the AI domain.
Through this venture, the aim is to create a collaborative ecosystem that not only advances safety research but also influences industry standards and practices. By fostering an environment where innovative safety solutions are developed and shared, the organisation would play a critical role in shaping a future where AI is both powerful and safe.
# My General Approach
My AIS Edges
In order to maximise my impact on AI safety, it is essential that I take advantage of all my edges. My first edge is technical problem-solving, drawing on skills that range from ML engineering to theoretical fundamentals in mathematics and physics. This lets me tackle concrete AI Safety problems without shying away from the mathematical aspects, while iterating quickly with simple, concrete experiments. Obviously, my scope is limited, and I cannot yet have a significant impact on the more theoretical, philosophical, or political aspects.
My second relevant edge for AIS lies in teaching and mentoring, which comes from both experience and personal fit. I’m particularly passionate about mentoring: throughout my studies I tutored students in mathematics, CS, AI, and more, and I spent a few months as a high-school teacher introducing the fundamentals of CS. I am curious and a perfectionist by nature (fundamental teaching traits, IMO), and explaining concepts in the simplest way has always helped me learn. I would love to contribute to AIS by helping with upcoming AIS training programs, and overall, my goal is to continue learning and growing along with the field.
I have left my entrepreneurial edge aside since I am only beginning to grow and explore this facet. Moreover, this edge acts more as a catalyst and could serve the two previous ones outlined above.
Brief Agenda
My main focus? Making AI interpretable. It’s the key to scaling up safely alongside rapid advancements in AI capabilities. I’m all in for grounding interpretability in solid theory where it makes sense, making sure it’s not just a temporary fix but a long-term solution.
I’m also putting a lot of stock in model evaluation. Sure, it might be a challenge to scale in the long run, but right now, it’s where the most promising and practical work is happening. I’m especially interested in how we can intertwine evaluation and interpretability to get the best of both worlds.
My agenda is about being proactive and adaptable, always looking for the most effective ways to keep AI safe and beneficial. It’s a mix of immediate actions and strategic long-term planning, ensuring that we’re prepared for the evolving landscape of AI technology.
What’s next?
I am currently doing a research internship on interpretability while continuing to participate in various AIS programs. I am finishing the SPAR program and will be involved in a research project with Apart Research.
I also remain committed to developing my startup with my co-founders. Our focus is on empowering small shopkeepers with our solutions, bringing the most valuable applications of AI into their shops to improve both their productivity and their customers’ experience.
Line of Impact: These plans are in line with what I have described above and with what I believe are the most important tasks I should be working on today.
In the long term, I am putting together a PhD project focused on safety. I cannot say much for now, but it should involve the safety of multi-agent systems through interpretability. Meanwhile, I turned down an industry PhD because I was worried I would be too constrained to explore the topics I wanted. This watershed opportunity might also be the occasion to start a podcast as a logical extension of this blog.
Regarding my involvement in the AIS sphere, I am planning to be a mentor in future iterations of AIS programs. I am keeping an eye on CERI, AIS Camp, and SPAR, should they support the technical interpretability agenda.