Multi+refactor: persistent peer manager (p2p)

https://github.com/lightningnetwork/lnd/pull/5700

Host: guggero  -  PR author: ellemouton

Notes

  • This is a medium sized pure refactor PR.
  • There should be no behaviour change from this PR.

Questions

  • What is the purpose of the server struct in server.go? How does it differ from rpcServer in rpcserver.go?
  • What does refactoring mean in this context?
  • What are the advantages of refactoring code in this way (extracting into packages/structs)?
  • What are the risks of such refactor changes?
  • Where is the affected code tested? Before and after the PR?
  • What’s the biggest challenge for reviewers when reviewing a refactor PR?
  • Why is a clean commit structure especially important for a refactor PR?
  • How would you structure such a refactor PR?
  • How can a reviewer test the code changes?

Meeting Log

guggero Hi all! Excited to host the first lnd review club! Thank you very much @lucasdcf for setting everything up! Let’s start with the first, more general question: Did you review the PR? Concept ACK, approach ACK, tested ACK or NACK? (Please reply in thread)

bitromortac
only had a quick look today
Oscar Lafarga
concept ack
evan
Concept ACK
ziggie
concept ACK
Tony
concept ack
Priyansh
Concept ACK
elle
concept ack, approach nack :nerd_face:
Thomas Volk
concept ack
bhandras
concept ack
TKChattoraj
concept ack
coco
Concept ACK, but I’ve not reviewed the code. Ive been following these peer manager-related things since last year, I need to refresh my memory. :sweat_smile:
guggero
concept ack from me as well
ziggie
related to the discussions in the PR, did this change also fix a bug in the lnd code ?
Edward Hollander
concept ACK
guggero
I think there is a very small bug fix in there, but don’t remember exactly. @elle can you correct me if I’m wrong?
elle
it originally did but that was then handled by another PR. So no, this is now purely a refactor

guggero
Cool cool, answers seem to be flowing in. Feel free to answer all questions in order, that’s what’s great about the threads.
Moving on, second question: What is the purpose of the server struct in server.go? How does it differ from rpcServer in rpcserver.go?

Tony
server is the main entrypoint for all server interactions and bringing subcomponents together, rpcserver is for grpc specific interactions, which it itself is a subcomponent of server
guggero
Great answer! Couldn’t have described it better myself.
ziggie
is the server.go the main “scheduler” of the lnd daemon ?
guggero
yes, you could say that’ it’s the main thing that starts (and later stops) everything.
guggero
the only slightly larger task that is executed before the server is started is the configuration parsing and logging setup

guggero
Question 3: What does refactoring mean in this context?

Tony
Moving functionality into its own dedicated place.
guggero
Is it just about moving? or are there other aspects? If yes, what aspects are there?
coco
Wasn’t there different types of peers (in different arrays) that are now consolidated?
Oliver Offing
Organizing a.k.a. SOLID. Grouping code by the measure of when it changes—i.e. changes to the persistent peer manager do not affect the whole of server.go hence make s sense to separate them.
Thomas Volk
As the server code grew, it took on more and more responsibilities. At a certain point, a separation of concerns is needed. bitromortac Looks like testing is improved as well. TKChattoraj That functionality is the PersistePeerManager-which consists of managing a peerKey, address and configuration parameters for that peer. What constitutes a Persistent Peer though?
Tony
I wasn’t able to review deeply enough to see if there’s other changes, though I agree @bitromortac, now testing looks like it improves
guggero
All good and valid points! So to summarize: move code for better readability (separation of concern), better encapsulation, improved control over visibility (encapsulation, e.g. what is exported across package boundaries) and testability

guggero
Question 4: What are the advantages of refactoring code in this way (extracting into packages/structs)?

guggero
I guess that has already partly been answered by folks in the last question.
Thomas Volk
Better testability. Separation of concerns. Easier to read due to higher-level abstractions that clearly define and separate behavior.
Priyansh
It helps to navigate through the code easily.
Jordi Montes
Easier to read and easier to reason about it too*
guggero
Y’all are on fire! Great answers
Tony
I like what you said in the last thread @guggero “what is exported across package boundaries”. Making things private so that server isn’t able to access everything willy nilly
bitromortac
reusability could be a plus, although here it’s a very specific purpose
ziggie
does lnd follow a special software model how it organizes its code ?
guggero
Hmm, not that I’m aware of. But there are many “idiomatic” principles in the Golang world that many projects follow (intentionally or just by way of example in the Golang source) that we try to also implement
guggero
Or I mean you could say we try to follow all the “classical” software development models where they apply
guggero
But with that many different developers (internal and external contributors) it’s also hard to settle on common styles sometimes
Thomas Volk
Also, allows for more people to contribute to the codebase simultaneously without nasty merge conflicts.

guggero
Moving on, question 5: What are the risks of such refactor changes?

Priyansh
Imprecise refactoring could introduce new bugs and errors into the code
Thomas Volk
I think this is true of refactors generally, but the fact that this refactor increased test coverage means that our coverage before wasn’t as good, and therefore could easily introduce behavior changes due to implicit and untested contracts/invariants being broken.
Tony
Also, premature or “incorrect” refactoring could cause maintainability issues in the future. Though this one looks like a great improvement.
Oliver Offing
Adding on to what’s being said already, there’s the risk of introducing the wrong abstraction layer, which might make it difficult later on to change and evolve the code-base further.

guggero
Where is the affected code tested? Before and after the PR?

Tony
It does not look like the affected code was being tested in server_test.go before, now is being tested in peer/persistentpeer_mgr_test.go.
elle
well it isnt tested directly before but is tested indirectly through the itests
Oliver Offing
itests = integration tests?
guggero
yes
elle
specifically, see itest testReconnectAfterIPChange
guggero
Anyone have ideas why (probably) the code wasn’t unit tested before?
evan
too monolithic?
Oliver Offing
My guess is that it’d require mocking a whole bunch of things since the code was entangled with other code.
Tony
Probably because there was just too much going on to easily separate or mock everything else?
vyom
it wasn’t refactored.
bitromortac
organic growth?
elle
would need sooooo many mocks :fearful:
guggero
right, everything was in the main server struct, which basically has references to each component running in lnd. so mocking that would mean mocking all of lnd.
Tony
maybe a follow up question, is there anything else that should also be refactored out of server ?
guggero
I would need to look at the current code, but I’d say everything that is business logic and not pure start/stop logic should probably eventually be moved out. But before you start creating PRs for that, please follow the next questions carefully :sweat_smile:

guggero
Awesome answers and follow-up questions across the board. Very nice, I love the energy!
Let’s continue with question 7: What’s the biggest challenge for reviewers when reviewing a refactor PR?

McDouchebag
To guarantee feature parity
Tony
It’s sometimes hard to tell why certain refactors are needed and to tracking big diffs to make sure nothing changes
Oliver Offing
git has no concept of patches, so the diffs are always terrible (pijul can’t come soon enough).
guggero
interesting, have never heard of pijul, going to google that after the review club Oscar Lafarga
To justify a separation of concerns, a broad and high-level understanding of many different parts of the codebase is often needed to review refactors properly
McDouchebag
To guarantee feature parity
You have test to do so, but if there are no tests before a large refactor, it is almost impossible to guarantee that nothing of importance changed
Oliver Offing
difftastic helps a bit (semantic diffing)
guggero
personal follow-up question: how do you review PRs? purely on GitHub (side-by-side or merged view) or with other tools? (will take a look at difftastic, any other suggestions?)
Oscar Lafarga
I recently started trying out the Github extension on VSCode and it is convenient but needs some tweaking
McDouchebag
how do you review PRs?
For refactor PRs i usually try to follow the reconstruction via git commits to understand the thought process. This sometimes means re-coding the whole thing to understand
Tony
if it is simple, side by side in github and pulling down the code to test. More complex, I like two copies of the repo locally, one on main and one on the PR to dive into both, especially on code bases I don’t fully understand yet
Jordi Montes
Usually github UI. If there are to many things going on in a part of the code I use github ui to open the new/old version of the file open the other on in my editor and try to make sure I catch everything
guggero
Wow, re-coding a PR sounds like a lot of work! But definitely the most thorough way I can think of.
guggero
I use the GitHub side-by-side view for more simple things. And I really like the diff view of GoLand (yeah, I’m still not a cool not a vim hacker :disappointed: )
McDouchebag
I cant review something i dont understand. So sometimes it requires me to retrace all the steps.
McDouchebag
It however does not guarantee i will understand the PR and instead my time is wasted :)

guggero
Next question (8): Why is a clean commit structure especially important for a refactor PR?

McDouchebag
First, a clean commit structure is important an all PRs
Second, a clean commit structure should make it easier to follow the thought process of the person who made the PR. It should break the refactor into sizable chunks that are easy to review in separation which in turn makes the refactor easier to understand
Oliver Offing
There’s usually tons of changes
Often files get both moved/renamed AND edited/changed so it’s useful to keep track of exactly what happened to get confidence
Always useful to have a record to come back to eventually (costs nothing) Allows one to reference specific changes (specific commits) when providing feedback/review
Priyansh
It helps the reviewer to get a better understanding of the approach.
Tony
Makes it easier to follow changes when big diffs are involved. Maps smaller sections of the moved code one bit at a time.
Jordi Montes
one changes at a time =>
Allows you to see what exactly is changing
Allows you to reason about its implications

guggero Okay, second-to-last (prepared) question: How would you structure such a refactor PR?

guggero
This is a hard one, I agree. And perhaps not very easy to explain in text
guggero
Don’t be shy though, there are no wrong answers!
Oliver Offing
Yeah I don’t think I understand the question.
Tony
I thought the commit structure was good in the PR. Create the new parent struct, start moving things over slowly one at a time.
elle
but dont you think the first commit is a bit hard to follow?
Tony
I see @elle’s comment on changing the structure, curious how it’ll change
Oscar Lafarga
Maybe a 1st commit of purely comments describing the planned approach could be useful?
Tony
ah yeah, maybe it’s too big initially
guggero
To those struggling with the question itself: There are different ways to approach a code move, different patterns to follow. What patterns can you think of?
Oscar Lafarga
Identifying all dependencies of any variables or functions you intend to move should be an early step
Thomas Volk
I might have a first commit that just rips the code out and moves it into another file, moving any clearly related tests. Then add follow-up commits that refactor the code into new structs, and better abstractions/tests.
Oliver Offing
I guess I’d add that usually you’ll want to refrain from changing code that isn’t tested. If the code you want to refactor doesn’t have tests, you’ll want to write tests first and then refactor. Of course, your mileage may vary.

guggero
Okay last question from my end. feel free to post your own questions afterwards (please try to answer in threads to keep things organized): How can a reviewer test the code changes?

Oliver Offing
Run tests locally
Run multiple nodes locally and see if anything breaks — à la plebnet-playground-docker
Run it on mainnet at your own risk :)
Tony
If there’s integration tests, run those before and after, and throughout each and every commit (sometimes some commits in the middle break but go unnoticed). With harder PR’s, sometimes I just gotta run the program before and after, testing some related functionality to make sure it behaves correctly.
Bjarne
on a related note there was this comment by joostjager who spotted a concurrency issues, which can be really hard sometimes. Apart from using the –race flag on running tests, what are your approaches to spotting such concurrency issues. Purely from logical reasoning?
https://github.com/lightningnetwork/lnd/pull/5700/files#r806552891
guggero
cool, haven’t heard of the plebnet-playground-docker before. definitely going to take a look
Tony
build the docker image of the PR and throw it into polar :slightly_smiling_face:
guggero
Yeah, concurrency issues are very hard to find. Logical reasoning definitely helps. But having unit tests and running them with the race detector can help a lot as well
Priyansh
Other than testing the functionality locally, the reviewer can try improving the tests file by adding more edge cases.
roasbeef
+1 for the race condition detector, it’s a great tool: https://go.dev/blog/race-detector
Oscar Lafarga
checkout PR, build and run on mainnet with live channels #reckless (jk)
Oliver Offing
YOLO
Tony
i like test coverage tools as well to see if I can spot any possible bugs being missed where new code might not be covered by unit tests.
Edward Hollander
Indeed the Plebnet Playground Docker is great! We also test in at SRROF with a docker LND on testnet: https://github.com/ringtools/docker-ringtools-testnet
Then testing with multiple other test users who are willing to participate in the same manner.

roasbeef
:zap::rotating_light:BONUS ROUND:rotating_light::zap:

roasbeef
@elle you mentioned in the main PR that you started to re-work your approach somewhat, can you tell us more about that?

elle
yeah! i think the first commit is super hard to follow. So i think a better approach would be to move across one thing at a time. Here is a sneak peak to how i would do it now: https://github.com/ellemouton/lnd/pull/26
elle
so not adding any component until it is used
elle
it makes for more commits but they are more manageable review i think
elle
so with this new structure, in the same commit you can see the new logic being added AND the old logic being removed
elle
so easier to check for correctness
elle
cause with the current structure, the first commit is like: wtf is happening

roasbeef One thing I think is worth mentioning is that I view this PR as a complement to this issue I made earlier this year: https://github.com/lightningnetwork/lnd/issues/6294
The high level idea in the issue above, is to introduce a few more layers of abstractions to stop sub-systems from directly using the main *channel.DB “god pointer”, in favor of directly using the existing graph access abstractions we have in place (like for path finding, and also autopilot)
Combined with elle’s PR, this means that we can eventually instantiate the discovery.AuthenticatedGossiper outside of lnd in a stand alone process This is really cool, as it then enables entirely new architectures for lnd, like:

  • Gossiper in a stand alone process, it spins up these new managed peer instances to connect out to the p2p network.
  • It use a custom graph interface implementation that writes the graph updates and processes then to a stand alone DB we add a new SendGraphUpdate RPC call to the new peersrpc server
  • New lnd instances then don’t actually need to process gossip data, instead, they have a read-only replica/mirror of the main graph DB, and use SendGraphUpdate to send out updates
  • This means any new lnd nodes effectively get instant graph sync, as they just rely on that read-only copy, and then store their own channels outside the graph and add them as hop hints

lucasdcf This was awesome!!! Thank you everyone that participated (even the lurkers), I hope you’ve learned new things!
I encourage all of you to review the PR on Github. If you’re excited about contributing to LND, read our guide (https://github.com/lightningnetwork/lnd/blob/master/docs/code_contribution_guidelines.md) and check our list of good first issues here: https://github.com/lightningnetwork/lnd/labels/good%20first%20issue
roasbeef
worth mentioning that rn we’re wrapping up 0.15, has a number of yuuge updates like full on taproot support (prob the most advanced taproot onchain wallet when it drops?)
there’re some things we’re looking to push to a 0.15.1 that aren’t quite ready, but not large enough to block the next major release
we’re a bit off our usual schedule re major releases, but 0.16 has a number of exciting things on the horizon like: taproot private channels, sqlite3 support, a native SQL invoice registry, and some other goodies
guggero
A special thank you to @elle for agreeing to let us review her PR in the first review club. We’ll all take a look at the updated PR once you push the changes you mentioned in the question above.
Tony
Thank you all! Enjoyed this.
Edward Hollander
Thanks everyone! Great to follow and read everything! :+1:
Oliver Offing
Thank you @guggero! You’ve been a most gracious host. See y’all next week.