Interested in contributing to CryptPad, I have been reading through the General Information page of the developer documentation, and quickly became confused.
Here are some notes I gathered while reading the document. Hopefully they can serve as feedback to improve the documentation and smooth the journey of would-be contributors (like me) into CryptPad.
"Sandbox" section
https://docs.cryptpad.org/en/dev_guide/general.html#sandbox
The second (safe) is usually not shown to users, it's only used internally within an iframe.
This isolation system has been implemented by putting the entire user interface in an iframe that takes up the whole window and that shows the safe URL.
These two sentences seem to contradict each other. Is the safe URL shown or not?
What does "internally" mean in the first sentence?
CryptPad uses a sandboxing system
Feels like this should be the leading sentence of the section, before talking about URLs.
that isolates the user interface from the in-memory content
In-memory where?
- On the server? Then it seems that it would be "the rendering of the user interface from the in-memory content".
- In the frontend? Then replacing "in-memory content" with "browser storage content" would be less confusing.
- Both?
The interface, which is the exposed part of the structure, could contain vulnerabilities
Well, the backend could also contain vulnerabilities... What is the threat model here? I.e. what is the sandboxing supposed to protect against?
Also, what do "interface" and "exposed" mean?
I guess "interface" means the user interface. What about the APIs (websockets, HTTP endpoints)?
Also it seems that both unsafe and safe domains are "exposed", because technically, one could try to communicate with both...
contains the sensitive data
Again, it's unclear what the threat model is: Which data is sensitive? Document content, metadata, user information, etc.?
"5-level structure" section
https://docs.cryptpad.org/en/dev_guide/general.html#level-structure
A style remark: "unsafe" and "safe" should not be quoted.
Also, terminology introduced in this section should be in bold for consistency with the previous section.
- Server side
- The "server" which contains the code launched in the main process. It manages the websocket connections and all calls to the server go through this level.
- The "workers" that manage all database connections and scripts that require more CPU resources. The main "server" calls them when it receives certain commands from users. They are launched in separate sub-processes in order to be able to make the most of the available CPU cores.
Wait, are both the safe and unsafe domains served by the same server!?
The main "server" calls them when it receives certain commands from users.
Nit: this seems to imply that users can pass commands directly to the server... I guess it is rather that the main server calls the workers to perform actions requested by users.
The base level, called "outer" in the code. This level is loaded with the "unsafe" URL (the one visible in the browser address bar) because it has access to sensitive data, including user account encryption keys.
What is a "level"? It seems that it is the content coming from the unsafe URL, loaded by the browser, so I guess it's an HTML document.
This level is loaded with the "unsafe" URL (the one visible in the browser address bar) because it has access to sensitive data, including user account encryption keys.
Unsafely loaded because it contains sensitive data, including encryption keys!? This logic sounds inappropriate...
The upper level, called "worker", which manages the connection to the server and keeps all the user account data in memory. This level is loaded in a SharedWorker when the browser supports it (Firefox, Chrome, Edge) with the "unsafe" URL [...]
Continuation of the previous point: unsafe domain loaded to contain all the user account data!? Really, what is the threat model?
https://docs.cryptpad.org/en/dev_guide/general.html#id3 diagram
It could be helpful to make the distinction between backend (running on a server) and frontend (running in the user's browser) components.
Also, where is the iframe loading its content from!? This doesn't appear on the diagram.
"Encryption" section
User accounts, including their associated cryptographic keys, drive, contacts and teams, are stored in the database in the same way as any CryptPad collaborative document.
This seems to contradict https://docs.cryptpad.org/en/dev_guide/database.html , which says that "CryptPad takes an unusual approach to storing documents on the server. User data is simply stored on the file system rather than a database."...
"Registration, login and block" section
I found this section hard to read: it mixes high level explanation with implementation details.
In my opinion, it would be nicer to have a general overview of what makes a user, what information it consists of, etc. before explaining the registration and login process.
"Block" in the title first made me believe that it was possible to block users. It's weird to see an implementation detail alongside actions.
I was hoping to find more information about cryptographic protocols used for the block encryption in the "Encryption" section, but couldn't find anything.
Also, I feel that the Scrypt explanation (in particular, the "expensive CPU" explanation) could be moved over there to simplify the explanation in this section.
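To make the "expensive CPU" point concrete, here is a minimal Python sketch of password-based key derivation with scrypt. It is not CryptPad's actual code: the function name `derive_key`, the cost parameters, and the use of the username as salt are all assumptions made for illustration.

```python
import hashlib


def derive_key(password: str, username: str,
               n: int = 2**14, r: int = 8, p: int = 1,
               dklen: int = 32) -> bytes:
    """Derive a fixed-length key from a password with scrypt.

    Hypothetical helper: the salt choice (username) and the cost
    parameters n, r, p are illustrative assumptions, not CryptPad's.
    """
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=username.encode("utf-8"),
        n=n,   # CPU/memory cost; must be a power of two
        r=r,   # block size
        p=p,   # parallelisation factor
        dklen=dklen,
    )


if __name__ == "__main__":
    key = derive_key("correct horse battery staple", "alice")
    # The same inputs always yield the same key, so it can be
    # re-derived at login instead of being stored anywhere.
    print(len(key), key.hex())
```

Raising `n` makes every password guess more costly for an attacker, at the price of a slower login for the legitimate user. An "Encryption" section could explain this trade-off once and let the registration section simply refer to it.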
"Client-server communication (Netflux)" section
Style: "The important points" shouldn't be bold, as in the rest of the document, bold words are introducing specific terminology.
"Server side" subsection
Is the history keeper unique? Or is it created per channel?
"Client side" subsection
a "bower" module manages the Netflux protocol with simple APIs
More accurately, it's a JavaScript module, although it is distributed via Bower.
End of document thoughts
I was hoping to find information about how the code is structured: which folders hold frontend or backend code, which folders contain libraries used in which context, etc.